| Title: | Bindings for Apache SedonaDB |
|---|---|
| Description: | Provides bindings for Apache SedonaDB, a lightweight query engine optimized for spatial workflows. |
| Authors: | Dewey Dunnington [aut, cre] |
| Maintainer: | Dewey Dunnington <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 0.3.0 |
| Built: | 2026-03-09 16:22:30 UTC |
| Source: | https://github.com/apache/sedona-db |
This object is an escape hatch for calling SedonaDB/DataFusion functions directly for translations that are not yet registered or are otherwise misbehaving.
.fns
An object of class sedonadb_fns of length 0.
Convert an object to a DataFrame
as_sedonadb_dataframe(x, ..., schema = NULL, ctx = NULL)
x |
An object to convert |
... |
Extra arguments passed to/from methods |
schema |
The requested schema |
ctx |
A SedonaDB context. This should always be passed to inner calls to SedonaDB functions; NULL implies the global context. |
A sedonadb_dataframe
as_sedonadb_dataframe(data.frame(x = 1:3))
This generic provides the opportunity for objects to register a mechanism
to be understood as literals in the context of a SedonaDB expression.
Users constructing expressions directly should use sd_expr_literal().
as_sedonadb_literal(x, ..., type = NULL, factory = NULL)
x |
An object to convert to a SedonaDB literal |
... |
Passed to/from methods |
type |
An optional data type to request for the output |
factory |
An |
An object of class SedonaDBExpr
as_sedonadb_literal("abcd")
Order rows of a SedonaDB data frame using column values
sd_arrange(.data, ...)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
... |
Unnamed expressions for arrange expressions. These are evaluated
in the same way as |
An object of class sedonadb_dataframe
data.frame(x = c(10:1, NA)) |> sd_arrange(x)
data.frame(x = c(1:10, NA)) |> sd_arrange(desc(x))
Use sd_compute() to collect and return the result as a DataFrame;
use sd_collect() to collect and return the result as an R data.frame.
sd_compute(.data)
sd_collect(.data, ptype = NULL)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
ptype |
The target R object. See nanoarrow::convert_array_stream. |
sd_compute() returns a sedonadb_dataframe; sd_collect() returns
a data.frame (or subclass according to ptype).
sd_sql("SELECT 1 as one") |> sd_compute()
sd_sql("SELECT 1 as one") |> sd_collect()
Performs a runtime configuration of PROJ, which can be used in place of a build-time linked version of PROJ or to add in support if PROJ was not linked at build time.
sd_configure_proj(
  preset = NULL,
  shared_library = NULL,
  database_path = NULL,
  search_path = NULL
)
preset |
One of:
|
shared_library |
An absolute or relative path to a shared library valid for the platform. |
database_path |
A path to proj.db |
search_path |
A path to the data files required by PROJ for some transforms. |
NULL, invisibly
sd_configure_proj("auto")
Runtime options configure the execution environment. Use
global = TRUE to configure the global context or use the
returned object as a scoped context. A scoped context is
recommended for programmatic usage as it prevents named
views from interfering with each other.
sd_connect(
  ...,
  global = FALSE,
  memory_limit = NULL,
  temp_dir = NULL,
  memory_pool_type = NULL,
  unspillable_reserve_ratio = NULL
)
... |
Reserved for future options |
global |
Use TRUE to set options on the global context. |
memory_limit |
Maximum memory for query execution, as a
human-readable string (e.g., |
temp_dir |
Directory for temporary/spill files, or |
memory_pool_type |
Memory pool type: |
unspillable_reserve_ratio |
Fraction of memory (0–1) reserved for
unspillable consumers. Only applies when |
The constructed context, invisibly.
sd_connect(memory_limit = "100mb", memory_pool_type = "fair")
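The scoped usage described above can be sketched as follows. This is a minimal illustration, assuming that the value returned by sd_connect() can be passed as the ctx argument of the sd_ctx_*() functions and of sd_to_view() documented in this manual; the view name scoped_view is hypothetical.

```r
library(sedonadb)

# Create a scoped context (global = FALSE is the default)
ctx <- sd_connect(memory_limit = "100mb", memory_pool_type = "fair")

# Register a named view on the scoped context
# ("scoped_view" is a hypothetical name chosen for this sketch)
sd_ctx_sql(ctx, "SELECT 1 as one") |>
  sd_to_view("scoped_view", ctx = ctx)

# Query the view through the same context and count its rows
n_rows <- sd_ctx_sql(ctx, "SELECT * FROM scoped_view") |> sd_count()

# Remove the view when it is no longer needed
sd_ctx_drop_view(ctx, "scoped_view")
```

Keeping the view on its own context is what prevents a name such as scoped_view from colliding with a view of the same name registered elsewhere.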
Count rows in a DataFrame
sd_count(.data)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
The number of rows after executing the query
sd_sql("SELECT 1 as one") |> sd_count()
Remove a view created with sd_to_view() from the context.
sd_drop_view(table_ref)
sd_ctx_drop_view(ctx, table_ref)
sd_view(table_ref)
sd_ctx_view(ctx, table_ref)
table_ref |
The name of the view reference |
ctx |
A SedonaDB context. |
The context, invisibly
sd_sql("SELECT 1 as one") |> sd_to_view("foofy")
sd_view("foofy")
sd_drop_view("foofy")
try(sd_view("foofy"))
Create SedonaDB logical expressions
sd_expr_column(column_name, qualifier = NULL, factory = sd_expr_factory())
sd_expr_literal(x, type = NULL, factory = sd_expr_factory())
sd_expr_binary(op, lhs, rhs, factory = sd_expr_factory())
sd_expr_negative(expr, factory = sd_expr_factory())
sd_expr_any_function(
  function_name,
  args,
  ...,
  na.rm = NULL,
  factory = sd_expr_factory()
)
sd_expr_scalar_function(function_name, args, factory = sd_expr_factory())
sd_expr_aggregate_function(
  function_name,
  args,
  ...,
  na.rm = FALSE,
  distinct = FALSE,
  factory = sd_expr_factory()
)
sd_expr_cast(expr, type, factory = sd_expr_factory())
sd_expr_alias(expr, alias, factory = sd_expr_factory())
as_sd_expr(x, factory = sd_expr_factory())
is_sd_expr(x)
sd_expr_factory(ctx = NULL)
column_name |
A column name |
qualifier |
An optional qualifier (e.g., table reference) that may be used to disambiguate a specific reference |
factory |
A |
x |
An object to convert to a SedonaDB literal (constant). |
type |
A destination type into which |
op |
Operator name for a binary expression. In general these follow
R function names (e.g., |
lhs, rhs |
Arguments to a binary expression |
expr |
A SedonaDBExpr or object coercible to one with |
function_name |
The name of the function to call. This name is resolved
from the context associated with |
args |
A list of SedonaDBExpr or object coercible to one with
|
... |
Reserved for future use |
na.rm |
For aggregate expressions, should nulls be ignored? The R
idiom is to respect null; however, the SQL idiom is to drop them. The
default value follows the R idiom ( |
distinct |
For aggregate expressions, use only distinct values. |
alias |
An alias to apply to |
ctx |
A SedonaDB context or NULL to use the default context. |
An object of class SedonaDBExpr
sd_expr_column("foofy")
sd_expr_literal(1L)
sd_expr_scalar_function("abs", list(1L))
sd_expr_cast(1L, nanoarrow::na_int64())
sd_expr_alias(1L, "foofy")
Keep rows of a SedonaDB DataFrame that match a condition
sd_filter(.data, ...)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
... |
Unnamed expressions for filter conditions. These are evaluated
in the same way as |
An object of class sedonadb_dataframe
data.frame(x = 1:10) |> sd_filter(x > 5)
Note that unlike dplyr::group_by(), these groups are dropped after
any transformations.
sd_group_by(.data, ...)
sd_ungroup(.data)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
... |
Named expressions whose unique combination will be used as
groups to potentially compute a future aggregate expression. These are
evaluated in the same way as |
An object of class sedonadb_dataframe
data.frame(letter = c(rep("a", 3), rep("b", 4), rep("c", 3)), x = 1:10) |>
  sd_group_by(letter) |>
  sd_summarise(x = sum(x))
This is used to implement print() for the sedonadb_dataframe or can
be used to explicitly preview if options(sedonadb.interactive = FALSE).
sd_preview(.data, n = NULL, ascii = NULL, width = NULL)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
n |
The number of rows to preview. Use |
ascii |
Use |
width |
The character width of the output. Defaults to
|
.data, invisibly
sd_sql("SELECT 1 as one") |> sd_preview()
The query will only be executed when requested.
sd_read_parquet(path)
sd_ctx_read_parquet(ctx, path)
path |
One or more paths or URIs to Parquet files |
ctx |
A SedonaDB context. |
A sedonadb_dataframe
path <- system.file("files/natural-earth_cities_geo.parquet", package = "sedonadb")
sd_read_parquet(path) |> head(5) |> sd_preview()
Uses the ArrowArrayStream interface to GDAL exposed via the sf package to read GDAL/OGR-based data sources.
sd_read_sf(
  dsn,
  layer = NULL,
  ...,
  query = NA,
  options = NULL,
  drivers = NULL,
  filter = NULL,
  fid_column_name = NULL,
  lazy = FALSE
)
sd_ctx_read_sf(
  ctx,
  dsn,
  layer = NULL,
  ...,
  query = NA,
  options = NULL,
  drivers = NULL,
  filter = NULL,
  fid_column_name = NULL,
  lazy = FALSE
)
dsn, layer |
Description of datasource and layer. See |
... |
Currently unused and must be empty |
query |
A SQL query to pass on to GDAL/OGR. |
options |
A character vector with layer open options in the form "KEY=VALUE". |
drivers |
A list of drivers to try if the dsn cannot be guessed. |
filter |
A spatial object that may be used to filter while reading.
In the future SedonaDB will automatically calculate this value based on
the query. May be any spatial object that can be converted to WKT via
|
fid_column_name |
An optional name for the feature id (FID) column. |
lazy |
Use |
ctx |
A SedonaDB context created using |
A SedonaDB DataFrame.
nc_gpkg <- system.file("gpkg/nc.gpkg", package = "sf")
sd_read_sf(nc_gpkg)
Several types of user-defined functions can be registered into a session
context. Currently, the only implemented variety is an external pointer
to a Rust FFI_ScalarUDF, an example of which is available from the
DataFusion Python documentation.
sd_register_udf(udf)
sd_ctx_register_udf(ctx, udf)
udf |
An object of class 'datafusion_scalar_udf' |
ctx |
A SedonaDB context. |
NULL, invisibly
Keep or drop columns of a SedonaDB DataFrame
sd_select(.data, ...)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
... |
One or more bare names. Evaluated like |
An object of class sedonadb_dataframe
data.frame(x = 1:10, y = letters[1:10]) |> sd_select(x)
The query will only be executed when requested.
sd_sql(sql, ..., params = NULL)
sd_ctx_sql(ctx, sql, ..., params = NULL)
sql |
A SQL string to execute |
... |
These dots are for future extensions and currently must be empty. |
params |
A list of parameters to fill placeholders in the query. |
ctx |
A SedonaDB context. |
A sedonadb_dataframe
sd_sql("SELECT ST_Point(0, 1) as geom")
sd_sql("SELECT ST_Point($1, $2) as geom", params = list(1, 2))
sd_sql("SELECT ST_Point($x, $y) as geom", params = list(x = 1, y = 2))
Aggregate SedonaDB DataFrames to a single row per group
sd_summarise(.data, ...)
sd_summarize(.data, ...)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
... |
Aggregate expressions. These are evaluated in the same way as
|
An object of class sedonadb_dataframe
data.frame(x = c(10:1, NA)) |> sd_summarise(x = sum(x, na.rm = TRUE))
This is useful for creating a view that can be referenced in a SQL
statement. Use sd_drop_view() to remove it.
sd_to_view(.data, table_ref, overwrite = FALSE, ctx = NULL)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
table_ref |
The name of the view reference |
overwrite |
Use TRUE to overwrite a view with the same name (if it exists) |
ctx |
A SedonaDB context. |
.data, invisibly
sd_sql("SELECT 1 as one") |> sd_to_view("foofy")
sd_sql("SELECT * FROM foofy")
Create, modify, and delete columns of a SedonaDB DataFrame
sd_transmute(.data, ...)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
... |
Named expressions for new columns to create. These are evaluated
in the same way as |
An object of class sedonadb_dataframe
data.frame(x = 1:10) |> sd_transmute(y = x + 1L)
This is a slightly more verbose form of sd_sql() with params that is
useful if a data frame is to be repeatedly queried.
sd_with_params(.data, ...)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
... |
Named or unnamed parameters that will be coerced to literals
with |
A sedonadb_dataframe with the provided parameters filled into the query
sd_sql("SELECT ST_Point($1, $2) as pt") |> sd_with_params(11, 12)
sd_sql("SELECT ST_Point($x, $y) as pt") |> sd_with_params(x = 11, y = 12)
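The repeated-query use case mentioned above might look like the following sketch. It assumes that a data frame with unfilled placeholders can be stored in a variable and reused, as the description suggests; the names template, coords, and counts, and the coordinate values, are hypothetical.

```r
library(sedonadb)

# Define the parameterized query once; $x and $y are placeholders
template <- sd_sql("SELECT ST_Point($x, $y) as pt")

# Hypothetical coordinate pairs to fill into the query
coords <- list(c(11, 12), c(21, 22), c(31, 32))

# Fill the placeholders for each pair and execute by counting rows
counts <- lapply(coords, function(xy) {
  template |>
    sd_with_params(x = xy[1], y = xy[2]) |>
    sd_count()
})
```

Each call to sd_with_params() returns a new sedonadb_dataframe, so template itself can be reused for the next set of values.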
Write this DataFrame to one or more (Geo)Parquet files. For input that contains geometry columns, GeoParquet metadata is written such that suitable readers can recreate Geometry/Geography types when reading the output and potentially read fewer row groups when only a subset of the file is needed for a given query.
sd_write_parquet(
  .data,
  path,
  options = NULL,
  partition_by = character(0),
  sort_by = character(0),
  single_file_output = NULL,
  geoparquet_version = "1.0",
  overwrite_bbox_columns = FALSE,
  max_row_group_size = NULL,
  compression = NULL
)
.data |
A sedonadb_dataframe or an object that can be coerced to one. |
path |
A filename or directory to which parquet file(s) should be written |
options |
A named list of key/value options to be used when constructing
a parquet writer. Common options are exposed as other arguments to
|
partition_by |
A character vector of column names to partition by. If non-empty, applies hive-style partitioning to the output |
sort_by |
A character vector of column names to sort by. Currently only ascending sort is supported |
single_file_output |
Use TRUE or FALSE to force writing a single Parquet
file vs. writing one file per partition to a directory. By default,
a single file is written if |
geoparquet_version |
GeoParquet metadata version to write if the output contains one or more geometry columns. The default ("1.0") is the most widely supported and will result in geometry columns being recognized in many readers; however, it only includes statistics at the file level. Use "1.1" to compute an additional bounding box column for every geometry column in the output: some readers can use these columns to prune row groups when files contain an effective spatial ordering. The extra columns will appear just before their geometry column and will be named "[geom_col_name]_bbox" for all geometry columns except "geometry", whose bounding box column name is just "bbox" |
overwrite_bbox_columns |
Use TRUE to overwrite any bounding box columns that already exist in the input. This is useful in a read -> modify -> write scenario to ensure these columns are up-to-date. If FALSE (the default), an error will be raised if a bbox column already exists |
max_row_group_size |
Target maximum number of rows in each row group. Defaults to the global configuration value (1M rows). |
compression |
Sets the Parquet compression codec. Valid values are: uncompressed, snappy, gzip(level), brotli(level), lz4, zstd(level), and lz4_raw. Defaults to the global configuration value (zstd(3)). |
The input, invisibly
tmp_parquet <- tempfile(fileext = ".parquet")
sd_sql("SELECT ST_Point(1, 2, 4326) as geom") |>
  sd_write_parquet(tmp_parquet)
sd_read_parquet(tmp_parquet)
unlink(tmp_parquet)
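As a variation on the example above, the following sketch requests GeoParquet 1.1 metadata. It uses only the arguments documented here; based on the geoparquet_version description, the output would be expected to gain a bounding box column named geom_bbox just before geom, although this sketch only previews the result rather than asserting it. The variable name written is hypothetical.

```r
library(sedonadb)

tmp_parquet <- tempfile(fileext = ".parquet")

# Write with GeoParquet 1.1 metadata, which adds a bounding box
# column for each geometry column according to the argument description
sd_sql("SELECT ST_Point(1, 2, 4326) as geom") |>
  sd_write_parquet(tmp_parquet, geoparquet_version = "1.1")

# Record whether the file was created, then preview what was written
written <- file.exists(tmp_parquet)
sd_read_parquet(tmp_parquet) |> sd_preview()

unlink(tmp_parquet)
```

If such a file is later read, modified, and written again with geoparquet_version = "1.1", overwrite_bbox_columns = TRUE would be needed to avoid the error described for existing bbox columns.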
SedonaDB ADBC Driver
sedonadb_adbc()
An adbcdrivermanager::adbc_driver() of class
'sedonadb_driver_sedonadb'
library(adbcdrivermanager)
con <- sedonadb_adbc() |>
  adbc_database_init() |>
  adbc_connection_init()
con |>
  read_adbc("SELECT ST_Point(0, 1) as geometry") |>
  as.data.frame()