Package 'bedrockbio'

Title: Open-Access Computational Biology Datasets
Description: Efficiently access the 'Bedrock Bio' library of open-access computational biology datasets. Lazily query datasets backed by 'DuckDB' and 'Apache Iceberg', with support for predicate pushdown and column projection to the cloud storage backend. This enables quick, iterative access to otherwise massive, unwieldy datasets without downloading them in full. See <https://bedrock.bio> for available datasets and documentation.
Authors: Liam Abbott [aut, cre, cph]
Maintainer: Liam Abbott
License: GPL (>= 3)
Version: 1.2.1
Built: 2026-03-26 20:42:38 UTC
Source: https://github.com/bedrock-bio/bedrock-bio-client

Help Index


Describe a dataset's metadata, citation, and columns

Description

Retrieve a dataset's metadata: its name, description, citation, source URL, license, and column definitions.

Usage

describe_dataset(name)

Arguments

name

Dataset identifier, as returned by list_datasets() (e.g., "ukb_ppp.pqtls")

Value

A named list with name, description, citation, source_url, license, and columns.

Examples

library(bedrockbio)
info <- describe_dataset("ukb_ppp.pqtls")
info$name
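The list returned by describe_dataset() can be inspected element by element. The sketch below relies only on the element names documented under Value. The exact type of each element is not documented here (for example, whether columns is a data frame or a list), so it uses str() and print() rather than assuming a particular structure.

```r
library(bedrockbio)

info <- describe_dataset("ukb_ppp.pqtls")

# Element names documented under Value:
# name, description, citation, source_url, license, columns
names(info)

# Citation to use when publishing results derived from this dataset
print(info$citation)

# Compact view of the column definitions; their structure is not
# specified in this manual, so str() is used instead of assuming a type
str(info$columns)
```

Check the license and citation elements before redistributing derived data.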

List available datasets in the Bedrock Bio library

Description

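Return the identifiers of all datasets currently available in the Bedrock Bio library. Any of these identifiers can be passed to describe_dataset() or load_dataset().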

Usage

list_datasets()

Value

A character vector of dataset identifiers.

Examples

library(bedrockbio)
list_datasets()
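Because list_datasets() returns an ordinary character vector, base R string tools can narrow it down. This sketch assumes that identifiers follow the "<prefix>.<suffix>" pattern seen in the examples in this manual (e.g., "ukb_ppp.pqtls", "dbsnp.vcf"). That naming convention is an assumption and is not documented here.

```r
library(bedrockbio)

ids <- list_datasets()

# Keep only identifiers that share the "ukb_ppp." prefix
# (assumes the prefix.suffix naming seen in this manual's examples)
ukb_ids <- grep("^ukb_ppp\\.", ids, value = TRUE)
ukb_ids

# Check that a dataset exists before describing or loading it
"dbsnp.vcf" %in% ids
```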

Lazily query a dataset

Description

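Open a dataset as a lazy table without downloading it in full. Rows are read only when the query is executed (e.g., with collect()). Row filters and column selections are pushed down to the cloud storage backend where possible.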

Usage

load_dataset(name, ...)

Arguments

name

Dataset identifier, as returned by list_datasets() (e.g., "ukb_ppp.pqtls")

...

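Required partition filters, supplied as name = value pairs (e.g., ancestry = "EUR", protein_id = "A0FGR8"). The required keys vary by dataset. The example below uses build, assembly, and chromosome for "dbsnp.vcf".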

Value

A lazy tbl backed by DuckDB, compatible with dplyr verbs.

Examples

library(bedrockbio)
library(dplyr)

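# Read the first five rows of four columns from dbSNP build 157
# (GRCh38), chromosome 22; only the selected columns are fetched
df <- load_dataset(
  "dbsnp.vcf",
  build = "b157",
  assembly = "GRCh38",
  chromosome = "22"
) |>
  select(rsid, position, ref_allele, alt_allele) |>
  head(5) |>
  collect()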
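Predicate pushdown and column projection come from building the whole query before calling collect(). The sketch below reuses the partition filters and column names from the example above and adds a position range filter. The range bounds are arbitrary illustrations. The call to show_query() assumes the lazy table is dbplyr-backed, which "lazy tbl backed by DuckDB" suggests but this manual does not state.

```r
library(bedrockbio)
library(dplyr)

# Build the query lazily; nothing is downloaded at this point
snps <- load_dataset(
  "dbsnp.vcf",
  build = "b157",
  assembly = "GRCh38",
  chromosome = "22"
) |>
  # Row filter: a candidate for predicate pushdown (illustrative range)
  filter(position >= 20000000, position < 20100000) |>
  # Column projection: only these columns need to be read
  select(rsid, position, ref_allele, alt_allele)

# Print the SQL that DuckDB will run (assumes a dbplyr-backed lazy tbl)
show_query(snps)

# Execute the query and bring the matching rows into R
result <- collect(snps)
```

filter() and select() appear before collect(), so they become part of the SQL that DuckDB executes instead of being applied in R after a full download. Whether a given predicate actually reaches the Iceberg scan is decided by DuckDB's query planner, not by dplyr.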