| Title: | Open-Access Computational Biology Datasets |
|---|---|
| Description: | Efficiently access the 'Bedrock Bio' library of open-access computational biology datasets. Lazily query datasets backed by 'DuckDB' and 'Apache Iceberg', with support for predicate pushdown and column projection to the cloud storage backend. This enables quick, iterative access to otherwise massive, unwieldy datasets without downloading them in full. See <https://bedrock.bio> for available datasets and documentation. |
| Authors: | Liam Abbott [aut, cre, cph] |
| Maintainer: | Liam Abbott <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.2.1 |
| Built: | 2026-03-26 20:42:38 UTC |
| Source: | https://github.com/bedrock-bio/bedrock-bio-client |
Describe a dataset's metadata, citation, and columns
describe_dataset(name)
| name | Dataset identifier (e.g., "ukb_ppp.pqtls") |
|---|---|
A named list with name, description, citation, source_url, license, and columns.

    library(bedrockbio)

    info <- describe_dataset("ukb_ppp.pqtls")
    info$name

List available datasets in the Bedrock Bio library
list_datasets()
A character vector of dataset identifiers

    library(bedrockbio)

    list_datasets()

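Because `list_datasets()` returns a plain character vector, the catalogue can be narrowed with base R string tools. The sketch below is only an illustration: `find_datasets()` is a hypothetical helper, not part of bedrockbio, and the offline call uses just the two identifiers that appear in this manual rather than the live catalogue.

```r
# Hypothetical helper (not part of bedrockbio): keep identifiers matching a regex.
# `ids` defaults to the live catalogue. R evaluates default arguments lazily,
# so bedrockbio is only needed when `ids` is not supplied.
find_datasets <- function(pattern, ids = bedrockbio::list_datasets()) {
  grep(pattern, ids, value = TRUE)
}

# Offline illustration using the two identifiers mentioned in this manual.
known_ids <- c("ukb_ppp.pqtls", "dbsnp.vcf")
matches <- find_datasets("^ukb_ppp\\.", ids = known_ids)
print(matches)
```

Calling `find_datasets("^ukb_ppp\\.")` without `ids` would query the live catalogue instead, assuming the package is installed and the service is reachable.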
Lazily query a dataset
load_dataset(name, ...)
| name | Dataset identifier (e.g., "ukb_ppp.pqtls") |
|---|---|
| ... | Required partition filters (e.g., ancestry = "EUR", protein_id = "A0FGR8") |
A lazy tbl backed by DuckDB, compatible with dplyr verbs.

    library(bedrockbio)
    library(dplyr)

    df <- load_dataset(
      "dbsnp.vcf",
      build = "b157",
      assembly = "GRCh38",
      chromosome = "22"
    ) |>
      select(rsid, position, ref_allele, alt_allele) |>
      head(5) |>
      collect()

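To make the "lazy tbl" behaviour concrete, here is a minimal sketch that uses a small in-memory DuckDB table as a stand-in for what `load_dataset()` returns. It does not touch the Bedrock Bio library: the table name `variants` and its toy rows are assumptions made for illustration, with column names borrowed from the example above. It assumes the DBI, duckdb, dplyr, and dbplyr packages are installed.

```r
library(dplyr)

# Stand-in for a remote dataset: a tiny in-memory DuckDB table.
# Column names mirror the dbsnp.vcf example; the rows are toy values.
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbWriteTable(con, "variants", data.frame(
  rsid = c("toy1", "toy2", "toy3"),
  position = c(100L, 200L, 300L),
  ref_allele = c("A", "C", "G"),
  alt_allele = c("G", "T", "A")
))

# Building the pipeline only composes a query; nothing is executed yet.
lazy <- tbl(con, "variants") |>
  filter(position > 150L) |>
  select(rsid, position)

# Print the SQL that dplyr generated for the pipeline.
show_query(lazy)

# collect() runs the query and returns an ordinary tibble.
result <- collect(lazy)

DBI::dbDisconnect(con)
```

The same pattern applies to a tbl returned by `load_dataset()`: adding `filter()` and `select()` steps before `collect()` keeps the filtering and column selection in the query rather than in local memory.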