
    A sparklyr extension for analyzing geospatial data

    By big tee tech hub, January 31, 2026


    sparklyr.sedona is now available as the sparklyr-based R interface for Apache Sedona.

    To install sparklyr.sedona from GitHub using the remotes package, run

    remotes::install_github(repo = "apache/incubator-sedona", subdir = "R/sparklyr.sedona")

    In this blog post, we will provide a quick introduction to sparklyr.sedona, outlining the motivation behind
    this sparklyr extension, and presenting some example sparklyr.sedona use cases involving Spark spatial RDDs,
    Spark dataframes, and visualizations.

    Motivation for sparklyr.sedona

    A suggestion from the
    mlverse survey results earlier
    this year mentioned the need for up-to-date R interfaces for Spark-based GIS frameworks.
    While looking into this suggestion, we learned about
    Apache Sedona, a geospatial data system powered by Spark
    that is modern, efficient, and easy to use. We also realized that while our friends from the
    Spark open-source community had developed a
    sparklyr extension for GeoSpark, the
    predecessor of Apache Sedona, there was no similar extension making more recent Sedona
    functionalities easily accessible from R yet.
    We therefore decided to work on sparklyr.sedona, which aims to bridge the gap between
    Sedona and R.

    The lay of the land

    We hope you are ready for a quick tour through some of the RDD-based and
    Spark-dataframe-based functionalities in sparklyr.sedona, and also, some bedazzling
    visualizations derived from geospatial data in Spark.

    In Apache Sedona, Spatial Resilient Distributed Datasets (SRDDs)
    are the basic building blocks of distributed spatial data, encapsulating
    "vanilla" RDDs of geometrical objects and indexes. SRDDs support low-level operations such as Coordinate Reference System (CRS)
    transformations, spatial partitioning, and spatial indexing. For example, SRDD-based operations we can perform with sparklyr.sedona include the following:

    • Importing some external data source into a SRDD:
    library(sparklyr)
    library(sparklyr.sedona)
    
    sedona_git_repo <- normalizePath("~/incubator-sedona")
    data_dir <- file.path(sedona_git_repo, "core", "src", "test", "resources")
    
    sc <- spark_connect(master = "local")
    
    pt_rdd <- sedona_read_dsv_to_typed_rdd(
      sc,
      location = file.path(data_dir, "arealm.csv"),
      type = "point"
    )
    • Applying spatial partitioning to all data points:
    sedona_apply_spatial_partitioner(pt_rdd, partitioner = "kdbtree")
    • Building a spatial index on each partition:
    sedona_build_index(pt_rdd, type = "quadtree")
    • Joining one spatial data set with another using “contain” or “overlap” as the join predicate:
    polygon_rdd <- sedona_read_dsv_to_typed_rdd(
      sc,
      location = file.path(data_dir, "primaryroads-polygon.csv"),
      type = "polygon"
    )
    
    pts_per_region_rdd <- sedona_spatial_join_count_by_key(
      pt_rdd,
      polygon_rdd,
      join_type = "contain",
      partitioner = "kdbtree"
    )
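To make the semantics of the count-by-key join above more concrete, here is a minimal, single-machine sketch in plain R of what it computes under a "contain" predicate: for each polygon, the number of points lying inside it. This is not how Sedona executes the join (Sedona distributes the work using the spatial partitioning and indexing shown above), and `point_in_polygon()` and `count_points_per_polygon()` are hypothetical helpers introduced only for illustration. The containment test is the classic ray-casting algorithm; points lying exactly on a polygon edge may be classified differently than Sedona would classify them.

```r
# Hypothetical helper (not part of sparklyr.sedona): ray-casting test for
# whether point (px, py) lies inside the simple polygon with vertex
# coordinates xs, ys (listed in order, without repeating the first vertex).
point_in_polygon <- function(px, py, xs, ys) {
  inside <- FALSE
  j <- length(xs)
  for (i in seq_along(xs)) {
    # Does the edge from vertex j to vertex i straddle the line y = py?
    if ((ys[i] > py) != (ys[j] > py)) {
      # x coordinate at which that edge crosses y = py
      x_cross <- xs[i] + (py - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])
      # Each crossing to the right of the point flips inside/outside
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical helper: for each polygon (a data frame with columns x and y),
# count how many points in `pts` (a data frame with columns x and y) it contains.
count_points_per_polygon <- function(pts, polygons) {
  vapply(polygons, function(poly) {
    hits <- mapply(point_in_polygon, pts$x, pts$y,
                   MoreArgs = list(xs = poly$x, ys = poly$y))
    sum(hits)
  }, integer(1))
}

# Example: two adjacent unit squares and four query points
polygons <- list(
  a = data.frame(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  b = data.frame(x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)
pts <- data.frame(x = c(0.2, 0.7, 1.5, 3), y = c(0.5, 0.5, 0.5, 0.5))
count_points_per_polygon(pts, polygons)
# a = 2, b = 1 (the point at x = 3 falls outside both squares)
```

Checking every point against every polygon like this costs time proportional to the product of the two dataset sizes, which is exactly the cost that spatial partitioning and indexing are meant to cut down at scale.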

    It is worth mentioning that sedona_spatial_join() will perform spatial partitioning
    and indexing on the inputs using the partitioner and index_type only if the inputs
    are not partitioned or indexed as specified already.

    From the examples above, one can see that SRDDs are great for spatial operations requiring
    fine-grained control, e.g., for ensuring a spatial join query is executed as efficiently
    as possible with the right types of spatial partitioning and indexing.
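For some intuition about why spatial partitioning helps, the sketch below assigns points to spatially compact partitions by recursively splitting at the median x or y coordinate, alternating axes at each level. It is a simplified, in-memory stand-in loosely modeled on k-d tree splitting, not Sedona's KDB-tree partitioner, and `kd_partition()` and its `max_depth` argument are hypothetical names.

```r
# Hypothetical helper (not part of sparklyr.sedona): assign each point a
# partition id by recursively splitting at the median coordinate, alternating
# between the x axis (even depths) and the y axis (odd depths).
kd_partition <- function(x, y, max_depth = 2) {
  ids <- integer(length(x))  # partition id for each point
  next_id <- 0L

  split_rec <- function(idx, depth) {
    if (length(idx) == 0) return(invisible(NULL))  # nothing to assign
    if (depth == max_depth || length(idx) == 1) {
      # Leaf: every point in idx receives a fresh partition id
      next_id <<- next_id + 1L
      ids[idx] <<- next_id
      return(invisible(NULL))
    }
    coord <- if (depth %% 2 == 0) x[idx] else y[idx]
    threshold <- median(coord)
    split_rec(idx[coord <= threshold], depth + 1)
    split_rec(idx[coord > threshold], depth + 1)
  }

  split_rec(seq_along(x), 0)
  ids
}

# Example: 8 points split into (at most) 2^2 = 4 spatially compact partitions
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(1, 8, 2, 7, 3, 6, 4, 5)
kd_partition(x, y, max_depth = 2)
# 1 2 1 2 3 4 3 4
```

Because nearby points end up sharing a partition, a distributed join over such partitions mostly needs to compare geometries that fall into the same partition rather than every possible pair, which is roughly the saving that picking a suitable partitioner is after.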

    Finally, we can visualize the join result above using a choropleth map:

    sedona_render_choropleth_map(
      pts_per_region_rdd,
      resolution_x = 1000,
      resolution_y = 600,
      output_location = tempfile("choropleth-map-"),
      boundary = c(-126.790180, -64.630926, 24.863836, 50.000),
      base_color = c(63, 127, 255)
    )

    which gives us the following:

    Example choropleth map output

    Wait, but something seems amiss. To make the visualization above look nicer, we can
    overlay it with the contour of each polygonal region:

    contours <- sedona_render_scatter_plot(
      polygon_rdd,
      resolution_x = 1000,
      resolution_y = 600,
      output_location = tempfile("scatter-plot-"),
      boundary = c(-126.790180, -64.630926, 24.863836, 50.000),
      base_color = c(255, 0, 0),
      browse = FALSE
    )
    
    sedona_render_choropleth_map(
      pts_per_region_rdd,
      resolution_x = 1000,
      resolution_y = 600,
      output_location = tempfile("choropleth-map-"),
      boundary = c(-126.790180, -64.630926, 24.863836, 50.000),
      base_color = c(63, 127, 255),
      overlay = contours
    )

    which gives us the following:

    Choropleth map with overlay

    With some low-level spatial operations taken care of using the SRDD API and
    the right spatial partitioning and indexing data structures, we can then
    import the results from SRDDs to Spark dataframes. When working with spatial
    objects within Spark dataframes, we can write high-level, declarative queries
    on these objects using dplyr verbs in conjunction with Sedona
    spatial UDFs. For example, the following query tells us whether each of the
    8 nearest polygons to the query point contains that point, and also computes
    the convex hull of each polygon.

    tbl <- DBI::dbGetQuery(
      sc, "SELECT ST_GeomFromText(\"POINT(-66.3 18)\") AS `pt`"
    )
    pt <- tbl$pt[[1]]
    knn_rdd <- sedona_knn_query(
      polygon_rdd, x = pt, k = 8, index_type = "rtree"
    )
    
    knn_sdf <- knn_rdd %>%
      sdf_register() %>%
      dplyr::mutate(
        contains_pt = ST_contains(geometry, ST_Point(-66.3, 18)),
        convex_hull = ST_ConvexHull(geometry)
      )
    
    knn_sdf %>% print()
    # Source: spark<?> [?? x 3]
    #   geometry                         contains_pt convex_hull
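As a rough single-machine analogue of the query above, the following plain R sketch finds the k polygons nearest to a query point, then reports for each one whether it contains the point and what its convex hull is (via `grDevices::chull()`). The distance used here is the planar Euclidean point-to-polygon distance (zero when the point is inside), which is our own assumption and not necessarily the exact metric Sedona's kNN query uses; all helper names are hypothetical.

```r
# Hypothetical helpers (not part of sparklyr.sedona) that mimic, on a single
# machine and with planar geometry, what the kNN query above computes.

# Ray-casting point-in-polygon test (vertices xs, ys in order, not closed)
point_in_polygon <- function(px, py, xs, ys) {
  inside <- FALSE
  j <- length(xs)
  for (i in seq_along(xs)) {
    if ((ys[i] > py) != (ys[j] > py)) {
      x_cross <- xs[i] + (py - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Distances from (px, py) to each segment (x1, y1)-(x2, y2); vectorized
segment_distance <- function(px, py, x1, y1, x2, y2) {
  dx <- x2 - x1
  dy <- y2 - y1
  len2 <- dx^2 + dy^2
  # Position of the projection onto each segment, clamped to [0, 1]
  t <- ifelse(len2 == 0, 0, ((px - x1) * dx + (py - y1) * dy) / len2)
  t <- pmin(1, pmax(0, t))
  sqrt((px - (x1 + t * dx))^2 + (py - (y1 + t * dy))^2)
}

# Point-to-polygon distance: 0 if the point is inside, else nearest edge
polygon_distance <- function(px, py, poly) {
  if (point_in_polygon(px, py, poly$x, poly$y)) return(0)
  nxt <- c(seq_len(nrow(poly))[-1], 1)  # next vertex index, wrapping around
  min(segment_distance(px, py, poly$x, poly$y, poly$x[nxt], poly$y[nxt]))
}

# k nearest polygons, each with a containment flag and its convex hull
knn_with_hulls <- function(px, py, polygons, k) {
  d <- vapply(polygons, function(p) polygon_distance(px, py, p), numeric(1))
  nearest <- order(d)[seq_len(min(k, length(polygons)))]
  lapply(polygons[nearest], function(p) {
    list(
      contains_pt = point_in_polygon(px, py, p$x, p$y),
      convex_hull = p[grDevices::chull(p$x, p$y), ]  # hull vertices only
    )
  })
}

# Example: a concave L-shaped polygon containing the query point, plus two squares
polygons <- list(
  far = data.frame(x = c(10, 11, 11, 10), y = c(10, 10, 11, 11)),
  near = data.frame(x = c(3, 4, 4, 3), y = c(0, 0, 1, 1)),
  l_shape = data.frame(x = c(0, 2, 2, 1, 1, 0), y = c(0, 0, 1, 1, 2, 2))
)
res <- knn_with_hulls(0.5, 0.5, polygons, k = 2)
sapply(res, `[[`, "contains_pt")
# l_shape: TRUE, near: FALSE; the hull of l_shape drops the concave vertex (1, 1)
```

The L-shaped polygon is included on purpose: it is the only one whose convex hull differs from the polygon itself, which is what makes the `convex_hull` column interesting in the query above.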

    Acknowledgements

    The author of this blog post would like to thank Jia Yu,
    the creator of Apache Sedona, and Lorenz Walthert for
    their suggestion to contribute sparklyr.sedona to the upstream
    incubator-sedona repository. Jia has provided
    extensive code-review feedback to ensure sparklyr.sedona complies with coding standards
    and best practices of the Apache Sedona project, and has also been very helpful in the
    instrumentation of CI workflows verifying sparklyr.sedona works as expected with snapshot
    versions of Sedona libraries from development branches.

    The author is also grateful to his colleague Sigrid Keydana
    for valuable editorial suggestions on this blog post.

    That’s all. Thank you for reading!

    Photo by NASA on Unsplash


    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.

    Citation

    For attribution, please cite this work as

    Li (2021, July 7). Posit AI Blog: sparklyr.sedona: A sparklyr extension for analyzing geospatial data. Retrieved from 

    BibTeX citation

    @misc{sparklyr-sedona,
      author = {Li, Yitao},
      title = {Posit AI Blog: sparklyr.sedona: A sparklyr extension for analyzing geospatial data},
      url = {},
      year = {2021}
    }



