
These functions define custom data aggregators for processing data extracted from raw files. An aggregator is run on each imported file. It collects the data users are interested in and makes sure the data types are correct, so that the aggregated data can be merged across several imported files for fast downstream processing.

Usage

orbi_start_aggregator(name)

orbi_add_to_aggregator(
  aggregator,
  dataset,
  column,
  source = column,
  default = NA,
  cast = "as.character",
  regexp = FALSE,
  func = NULL,
  args = NULL
)

orbi_register_aggregator(aggregator, name = attr(aggregator, "name"))

orbi_get_aggregator(name)

Arguments

name

a descriptive name for the aggregator. This name is automatically used as the default name when registering the aggregator via orbi_register_aggregator().

aggregator

the aggregator table generated by orbi_start_aggregator() or returned by a previous call to orbi_add_to_aggregator(), so that the entire aggregator can be constructed by piping

dataset

the name of the dataset to aggregate from (file_info, scans, peaks, spectra)

column

the name of the column in which data should be stored

source

a single column name, or a vector of column names (if alternative columns could be the source), identifying where in the dataset to find the data for column. If a vector of multiple column names is provided (e.g. source = c("a1", "a2")), the first column name found while processing a dataset is used. Its values are passed to the function defined in func (if any) and then to the function defined in cast. To provide multiple parameters from the data to func, define source as a list instead of a vector, e.g. source = list("a", "b", "c"). If multiple alternative columns can be the source for any of the arguments, define it as source = list(c("a1", "a2"), "b", c("c1", "c2", "c3")).
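The lookup rules for source can be illustrated with a small base R sketch. It assumes that "found" means "present among the dataset's column names" and that alternatives are checked in the order they are listed. find_first_source() and resolve_source() are hypothetical helpers, not isoorbi functions.

```r
# Hypothetical helper (not part of isoorbi): returns the first name in
# `alternatives` that is present in `available`, or NA if none is found.
find_first_source <- function(alternatives, available) {
  hits <- alternatives[alternatives %in% available]
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Hypothetical helper: a character vector is one argument with alternatives,
# a list is one entry per argument (each entry may itself hold alternatives).
resolve_source <- function(source, available) {
  if (!is.list(source)) source <- list(source)
  vapply(source, find_first_source, character(1), available = available)
}

cols <- c("a2", "b", "c3")  # hypothetical column names of one dataset
resolve_source(c("a1", "a2"), cols)                               # returns "a2"
resolve_source(list(c("a1", "a2"), "b", c("c1", "c2", "c3")), cols)  # returns c("a2", "b", "c3")
```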

default

the default value if no source columns can be found or another error is encountered during aggregation. Note that the default value is also processed with the function in cast to make sure it has the correct data type.

cast

what to cast the values of the resulting column to, most commonly "as.character", "as.integer", "as.numeric", or "as.factor". This is required to ensure all aggregated values have the correct data type.
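The default value is cast as well. As a result, a file that lacks the source column still yields a value of the same type as files that have it. The following base R sketch shows this behavior. It assumes the cast function is looked up by name with match.fun(). extract_and_cast() is a hypothetical helper, not part of isoorbi.

```r
# Hypothetical helper (not part of isoorbi): takes `column` from `data`, falls
# back to `default` if the column is missing or casting throws an error, and
# always applies the cast function named by `cast`.
extract_and_cast <- function(data, column, default = NA, cast = "as.character") {
  cast_fn <- match.fun(cast)  # e.g. "as.integer" -> as.integer()
  if (!(column %in% names(data))) return(cast_fn(default))
  tryCatch(cast_fn(data[[column]]), error = function(e) cast_fn(default))
}

# two hypothetical imported files, only one of which has a "scan" column
file_a <- data.frame(scan = c("1", "2"), stringsAsFactors = FALSE)
file_b <- data.frame(other = 1:2)

a <- extract_and_cast(file_a, "scan", cast = "as.integer")
b <- extract_and_cast(file_b, "scan", cast = "as.integer")
c(a, b)  # 1 2 NA, all integer, so the parts combine without coercion
```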

regexp

whether source column names should be interpreted as regular expressions for the purpose of finding the relevant column(s). Note that if regexp = TRUE, the search for the source column always becomes case-insensitive. This option can therefore also be used to directly match a source column whose upper/lower casing is unreliable. If a column is matched both by a regexp and by a direct aggregator rule, the direct aggregator rule takes precedence.
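The matching behavior described for regexp can be illustrated with grepl(ignore.case = TRUE). This is a sketch, not isoorbi's code. match_columns() is a hypothetical helper. The precedence step, which removes directly matched columns from the regexp matches, is an assumption about one way to implement the rule.

```r
# Hypothetical helper (not part of isoorbi): returns the columns matched by a
# rule; with regexp = TRUE the pattern is matched case-insensitively,
# otherwise the column name must match exactly.
match_columns <- function(pattern, available, regexp = FALSE) {
  if (regexp) {
    available[grepl(pattern, available, ignore.case = TRUE)]
  } else {
    available[available == pattern]
  }
}

cols <- c("Scan", "RT", "tic")  # hypothetical column names
match_columns("scan", cols)                   # character(0): exact match only
match_columns("^scan$", cols, regexp = TRUE)  # "Scan": casing is ignored

# Assumed way to express precedence: columns claimed by a direct rule are
# removed from the regexp matches.
direct_hits <- match_columns("tic", cols)
regexp_hits <- setdiff(match_columns("^(tic|rt)$", cols, regexp = TRUE), direct_hits)
regexp_hits  # "RT"
```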

func

name of a processing function to apply before casting the value with the cast function. This is optional. It can be used for more elaborate preprocessing of the data or for correctly combining data from multiple source columns (e.g. pasting together values from multiple columns).

args

an optional list of arguments to pass to func in addition to the values coming from the source column(s)
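The following minimal sketch shows how func, args, and cast could fit together. It assumes the source values are passed to func positionally, in the order given in source, with args appended after them. apply_func() is a hypothetical helper, and paste() stands in for any processing function.

```r
# Hypothetical helper (not part of isoorbi): passes the values of the resolved
# source columns (in order) plus any extra `args` to the function named by
# `func`, then casts the result with the function named by `cast`.
apply_func <- function(values, func = NULL, args = NULL, cast = "as.character") {
  result <- if (is.null(func)) values[[1]] else do.call(func, c(values, args))
  match.fun(cast)(result)
}

row <- list(a = "run", b = 7)  # hypothetical values of source columns "a" and "b"
apply_func(
  values = unname(row[c("a", "b")]),
  func = "paste",
  args = list(sep = "_")
)
# returns "run_7"
```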

Value

an orbi aggregator tibble

Functions

  • orbi_start_aggregator(): starts the aggregator

  • orbi_add_to_aggregator(): adds a column to aggregate data for. Overwrites the existing aggregator entry for the same dataset and column if there is one.

  • orbi_register_aggregator(): registers an aggregator in the isoorbi options so it can be retrieved with orbi_get_aggregator()

  • orbi_get_aggregator(): retrieves a registered aggregator (get all aggregators with orbi_get_option("aggregators"))
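The overwrite and registration behavior listed above can be illustrated with a simplified base R model. In this model, the aggregator is a plain data frame with one row per dataset/column rule, and an environment stands in for the isoorbi options. All functions in this sketch are hypothetical stand-ins, not the package's implementation.

```r
# Hypothetical stand-in for orbi_start_aggregator(): an empty rule table that
# carries its name as an attribute.
start_aggregator <- function(name) {
  agg <- data.frame(dataset = character(0), column = character(0),
                    cast = character(0), stringsAsFactors = FALSE)
  attr(agg, "name") <- name
  agg
}

# Hypothetical stand-in for orbi_add_to_aggregator(): drops any existing rule
# for the same dataset and column, then appends the new rule.
add_to_aggregator <- function(agg, dataset, column, cast = "as.character") {
  keep <- !(agg$dataset == dataset & agg$column == column)
  new_agg <- rbind(agg[keep, , drop = FALSE],
                   data.frame(dataset = dataset, column = column, cast = cast,
                              stringsAsFactors = FALSE))
  attr(new_agg, "name") <- attr(agg, "name")
  new_agg
}

# Hypothetical registry in place of the package options.
registry <- new.env()
register_aggregator <- function(agg, name = attr(agg, "name")) {
  assign(name, agg, envir = registry)
  invisible(agg)
}
get_aggregator <- function(name) get(name, envir = registry, inherits = FALSE)

agg <- start_aggregator("demo") |>
  add_to_aggregator("scans", "scan", cast = "as.integer") |>
  add_to_aggregator("scans", "rt", cast = "as.numeric") |>
  add_to_aggregator("scans", "scan", cast = "as.numeric")  # replaces the first "scan" rule
register_aggregator(agg)
nrow(get_aggregator("demo"))  # 2
```

Keying rules on the dataset/column pair is what makes repeated calls safe in a pipe: a later call for the same pair replaces the earlier rule instead of adding a duplicate.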