Title: | 'DataSHIELD' 'Tidyverse' Clientside Package |
---|---|
Description: | Implementation of selected 'Tidyverse' functions within 'DataSHIELD', an open-source federated analysis solution in R. 'DataSHIELD' currently offers very limited tools for data manipulation, so this package aims to improve the researcher experience by implementing essential data manipulation functions, including subsetting, filtering, grouping, and renaming variables. This is the clientside package, which should be installed locally and is used in conjunction with the serverside package 'dsTidyverse', which is installed on the remote server holding the data. For more information, see <https://www.tidyverse.org/>, <https://datashield.org/> and <https://github.com/molgenis/ds-tidyverse>. |
Authors: | Tim Cadman [aut, cre], Mariska Slofstra [aut], Stuart Wheater [aut], Demetris Avraam [aut] |
Maintainer: | Tim Cadman <[email protected]> |
License: | LGPL (>= 2.1) |
Version: | 1.0.0 |
Built: | 2024-11-11 16:18:32 UTC |
Source: | https://github.com/molgenis/ds-tidyverse-client |
DataSHIELD implementation of dplyr::arrange.
ds.arrange(df.name = NULL, tidy_expr = NULL, .by_group = NULL, newobj = NULL, datasources = NULL)
df.name |
Character specifying a serverside data frame or tibble. |
tidy_expr |
A list containing variables, or functions of variables. Use |
.by_group |
If TRUE, will sort first by grouping variable. Applies to grouped data frames only. |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. An object (typically a data frame or tibble) with the name specified by newobj is created on the server.
## Not run:
ds.arrange(
  df.name = "mtcars",
  tidy_expr = list(drat),
  newobj = "sorted_df",
  datasources = conns
)
## End(Not run)
DataSHIELD implementation of tibble::as_tibble. Currently only implemented for data frames and tibbles.
ds.as_tibble(x = NULL, .rows = NULL, .name_repair = "check_unique", rownames = NULL, newobj = NULL, datasources = NULL)
x |
A data frame or matrix. |
.rows |
The number of rows, useful to create a 0-column tibble or just as an additional check. |
.name_repair |
Treatment of problematic column names:
|
rownames |
How to treat existing row names of a data frame or matrix:
|
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. A tibble with the name specified by newobj is created on the server.
## Not run:
ds.as_tibble(
  x = "mtcars",
  newobj = "mtcars_tib",
  datasources = conns
)
## End(Not run)
DataSHIELD implementation of dplyr::bind_cols.
ds.bind_cols(to_combine = NULL, .name_repair = c("unique", "universal", "check_unique", "minimal"), newobj = NULL, datasources = NULL)
to_combine |
Data frames to combine. Each argument can either be a data frame, a list that could be a data frame, or a list of data frames. Columns are matched by name, and any missing columns will be filled with NA. |
.name_repair |
One of "unique", "universal", or "check_unique". See
|
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. A data frame with the name specified by newobj and the same type as the first element of to_combine is created on the server.
## Not run:
## First log in to a DataSHIELD session with mtcars dataset loaded.
ds.bind_cols(
  to_combine = list(mtcars, mtcars),
  .name_repair = "universal",
  newobj = "test",
  datasources = conns
)
## Refer to the package vignette for more examples.
## End(Not run)
DataSHIELD implementation of dplyr::bind_rows.
ds.bind_rows(to_combine = NULL, .id = NULL, newobj = NULL, datasources = NULL)
to_combine |
Data frames to combine. Each argument can either be a data frame, a list that could be a data frame, or a list of data frames. Columns are matched by name, and any missing columns will be filled with NA. |
.id |
The name of an optional identifier column. Provide a string to create an output column that identifies each input. The column will use names if available, otherwise it will use positions. |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. A data frame with the name specified by newobj and the same type as the first element of to_combine is created on the server.
## Not run:
## First log in to a DataSHIELD session with mtcars dataset loaded.
ds.bind_rows(
  to_combine = list(mtcars, mtcars),
  newobj = "test",
  datasources = conns
)
## Refer to the package vignette for more examples.
## End(Not run)
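Because ds.bind_rows mirrors dplyr::bind_rows, the name matching and the .id column can be previewed locally. The following is a minimal sketch that calls dplyr directly rather than going through DataSHIELD; the data frames first and second are hypothetical and exist only for illustration.

```r
library(dplyr)

# Two hypothetical data frames that share `id` but differ in their other columns.
first <- data.frame(id = 1:2, x = c("a", "b"))
second <- data.frame(id = 3L, y = TRUE)

# Naming the list elements lets `.id` label each row with its source.
combined <- bind_rows(list(first = first, second = second), .id = "source")

# Columns are matched by name; cells with no source value are filled with NA.
print(combined)
```

The serverside result of ds.bind_rows should follow the same matching rules, although it stays on the server under the name given in newobj.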
DataSHIELD implementation of dplyr::case_when.
ds.case_when(tidy_expr = NULL, .default = NULL, .ptype = NULL, .size = NULL, newobj = NULL, datasources = NULL)
tidy_expr |
A list containing a sequence of two-sided formulas:
All inputs will be recycled to their common size. We encourage all LHS inputs to be the same size. Recycling is mainly useful for RHS inputs, where you might supply a size 1 input that will be recycled to the size of the LHS inputs. NULL inputs are ignored. |
.default |
The value used when all of the LHS inputs return either FALSE or NA. |
.ptype |
An optional prototype declaring the desired output type. If supplied, this overrides the common type of true, false, and missing. |
.size |
An optional size declaring the desired output size. If supplied, this overrides the size of condition. |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. A vector with the same size as the common size computed from the inputs in tidy_expr and the same type as the common type of the RHS inputs in tidy_expr is created on the server.
## Not run:
## First log in to a DataSHIELD session with mtcars dataset loaded.
ds.case_when(
  tidy_expr = list(
    mtcars$mpg < 10 ~ "low",
    mtcars$mpg >= 10 & mtcars$mpg < 20 ~ "medium",
    mtcars$mpg >= 20 ~ "high"
  ),
  newobj = "test",
  datasources = conns
)
## Refer to the package vignette for more examples.
## End(Not run)
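The role of .default is easiest to see with an input that matches none of the formulas. This is a local sketch using dplyr::case_when directly (assuming dplyr 1.1.0 or later, where .default is available), not a DataSHIELD call; the vector mpg is made up for illustration.

```r
library(dplyr)

# Hypothetical values, including an NA that satisfies no condition.
mpg <- c(8, 15, 25, NA)

band <- case_when(
  mpg < 10 ~ "low",
  mpg >= 10 & mpg < 20 ~ "medium",
  mpg >= 20 ~ "high",
  # Used when every left-hand side is FALSE or NA.
  .default = "unknown"
)

print(band)
```

With ds.case_when, the same formulas would be wrapped in list() and passed as tidy_expr, as in the example above.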
DataSHIELD implementation of dplyr::distinct.
ds.distinct(df.name = NULL, tidy_expr = NULL, .keep_all = FALSE, newobj = NULL, datasources = NULL)
df.name |
Character specifying a serverside data frame or tibble. |
tidy_expr |
Optionally, list of variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, will use all variables in the data frame. |
.keep_all |
If TRUE, keep all variables in .data. If a combination of |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. An object (typically a data frame or tibble) with the name specified by newobj is created on the server.
## Not run:
ds.distinct(
  df.name = "mtcars",
  tidy_expr = list(mpg, cyl),
  newobj = "distinct_df"
)
## End(Not run)
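The effect of .keep_all can be checked locally with dplyr::distinct on the built-in mtcars data. This sketch runs dplyr directly and only illustrates the semantics that ds.distinct is described as implementing; the object names are arbitrary.

```r
library(dplyr)

# Default: only the variables used to determine uniqueness are returned.
by_cyl <- distinct(mtcars, cyl)

# .keep_all = TRUE: the first row for each distinct value keeps every column.
by_cyl_all <- distinct(mtcars, cyl, .keep_all = TRUE)

print(dim(by_cyl))
print(dim(by_cyl_all))
```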
DataSHIELD implementation of dplyr::filter.
ds.filter(df.name = NULL, tidy_expr = NULL, .by = NULL, .preserve = FALSE, newobj = NULL, datasources = NULL)
df.name |
Character specifying a serverside data frame or tibble. |
tidy_expr |
List of expressions that return a logical value, and are defined in terms of the
variables in |
.by |
Optionally, a selection of columns to group by for just this operation, functioning as an alternative to |
.preserve |
Relevant when the .data input is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping is kept as is. |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. An object (typically a data frame or tibble) with the name specified by newobj is created on the server.
## Not run:
ds.filter(
  df.name = "mtcars",
  tidy_expr = list(cyl == 4 & mpg > 20),
  newobj = "filtered",
  datasources = conns
)
## End(Not run)
DataSHIELD implementation of dplyr::group_by.
ds.group_by(df.name = NULL, tidy_expr, .add = FALSE, .drop = TRUE, newobj = NULL, datasources = NULL)
df.name |
Character specifying a serverside data frame or tibble. |
tidy_expr |
List of variables or computations to group by. |
.add |
When FALSE, the default, |
.drop |
Drop groups formed by factor levels that don't appear in the data? The default is TRUE except when .data has been previously grouped with .drop = FALSE. |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. A grouped data frame with class grouped_df and the name specified by newobj is created on the server, unless the combination of tidy_expr and .add yields an empty set of grouping columns, in which case a tibble will be created on the server.
## Not run:
ds.group_by(
  df.name = "mtcars",
  tidy_expr = list(mpg, cyl),
  newobj = "grouped_df"
)
## End(Not run)
DataSHIELD implementation of dplyr::group_keys.
ds.group_keys(df.name = NULL, datasources = NULL)
df.name |
Character specifying a serverside tibble. |
datasources |
DataSHIELD connections object. |
A data frame describing the groups.
## Not run:
my_groups <- ds.group_keys("grouped_df")
## End(Not run)
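To show how grouping, group keys, and ungrouping relate, here is a local sketch using the dplyr functions that ds.group_by, ds.group_keys, and ds.ungroup are described as implementing. It runs on the built-in mtcars data without a DataSHIELD connection, so it illustrates the semantics only.

```r
library(dplyr)

# Group by a single variable; the result has class grouped_df.
grouped <- group_by(mtcars, cyl)

# One row per group, holding the values of the grouping variables.
keys <- group_keys(grouped)

# Removing the grouping returns an ordinary tibble.
flat <- ungroup(grouped)

print(keys)
```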
DataSHIELD implementation of dplyr::if_else.
ds.if_else(condition = NULL, true = NULL, false = NULL, missing = NULL, ptype = NULL, size = NULL, newobj = NULL, datasources = NULL)
condition |
A list specifying a logical vector in tidyverse syntax, i.e. with data and column names unquoted. |
true |
Vector to use for TRUE value of condition. |
false |
Vector to use for FALSE value of condition. |
missing |
If not NULL, will be used as the value for NA values of condition. Follows the same size and type rules as true and false. |
ptype |
An optional prototype declaring the desired output type. If supplied, this overrides the common type of true, false, and missing. |
size |
An optional size declaring the desired output size. If supplied, this overrides the size of condition. |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. A vector with the same size as condition and the same type as the common type of true, false, and missing is created on the server.
## Not run:
## First log in to a DataSHIELD session with mtcars dataset loaded.
## Refer to the package vignette for more examples.
## End(Not run)
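As a rough illustration of how true, false, and missing interact, the following sketch calls dplyr::if_else locally instead of going through DataSHIELD. The vector mpg is hypothetical and includes an NA so that the missing argument has an effect.

```r
library(dplyr)

# Hypothetical values with one NA to exercise the `missing` argument.
mpg <- c(21.0, 15.5, NA, 30.4)

label <- if_else(
  condition = mpg > 20,
  true = "high",
  false = "low",
  missing = "unknown"
)

print(label)
```

In ds.if_else the condition would instead be supplied as a list in tidyverse syntax, as described for the condition argument.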
DataSHIELD implementation of dplyr::mutate.
ds.mutate(df.name = NULL, tidy_expr = NULL, newobj = NULL, .keep = "all", .before = NULL, .after = NULL, datasources = NULL)
df.name |
Character specifying a serverside data frame or tibble. |
tidy_expr |
List of tidyselect syntax to be passed to dplyr::mutate. |
newobj |
Character specifying name for new server-side data frame. |
.keep |
Control which columns from
Grouping columns and columns created by |
.before |
<tidy-select> Optionally, control where new columns should appear (the default is
to add to the right hand side). See |
.after |
<tidy-select> Optionally, control where new columns should appear (the default is
to add to the right hand side). See |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. An object (typically a data frame or tibble) with the name specified by newobj is created on the server.
## Not run:
## First log in to a DataSHIELD session with mtcars dataset loaded.
ds.mutate(
  df.name = "mtcars",
  tidy_expr = list(mpg_trans = cyl * 1000, new_var = (hp - drat) / qsec),
  newobj = "df_with_new_cols"
)
## Refer to the package vignette for more examples.
## End(Not run)
DataSHIELD implementation of dplyr::rename.
ds.rename(df.name = NULL, tidy_expr = NULL, newobj = NULL, datasources = NULL)
df.name |
Character specifying a serverside data frame or tibble. |
tidy_expr |
List with format new_name = old_name to rename selected variables. |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. An object (typically a data frame or tibble) with the name specified by newobj is created on the server.
## Not run:
## First log in to a DataSHIELD session with mtcars dataset loaded.
ds.rename(
  df.name = "mtcars",
  tidy_expr = list(new_var_1 = mpg, new_var_2 = cyl),
  newobj = "df_renamed",
  datasources = conns
)
## Refer to the package vignette for more examples.
## End(Not run)
DataSHIELD implementation of dplyr::select.
ds.select(df.name = NULL, tidy_expr = NULL, newobj = NULL, datasources = NULL)
df.name |
Character specifying a serverside data frame or tibble. |
tidy_expr |
List of one or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables. |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. An object (typically a data frame or tibble) with the name specified by newobj is created on the server.
## Not run:
ds.select(
  df.name = "mtcars",
  tidy_expr = list(mpg, starts_with("t")),
  newobj = "df_subset",
  datasources = conns
)
## End(Not run)
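The range and helper syntax described for tidy_expr can be tried locally with dplyr::select on the built-in mtcars data. This is an illustrative sketch that does not use a DataSHIELD connection; subset_df is an arbitrary name.

```r
library(dplyr)

# `mpg:disp` selects a contiguous range of columns by position;
# starts_with("q") adds any column whose name begins with "q".
subset_df <- select(mtcars, mpg:disp, starts_with("q"))

print(names(subset_df))
```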
DataSHIELD implementation of dplyr::slice.
ds.slice(df.name = NULL, tidy_expr = NULL, .by = NULL, .preserve = FALSE, newobj = NULL, datasources = NULL)
df.name |
Character specifying a serverside data frame or tibble. |
tidy_expr |
List, provide either positive values to keep, or negative values to drop. The values provided must be either all positive or all negative. Indices beyond the number of rows in the input are silently ignored. |
.by |
Optionally, a selection of columns to group by for just this operation, functioning as
an alternative to |
.preserve |
Relevant when the .data input is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping is kept as is. |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. An object (typically a data frame or tibble) with the name specified by newobj is created on the server.
## Not run:
ds.slice(
  df.name = "mtcars",
  tidy_expr = list(1:10),
  .by = "cyl",
  newobj = "sliced_df"
)
## End(Not run)
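To illustrate how .by and out-of-range indices behave, the sketch below uses dplyr::slice locally (assuming dplyr 1.1.0 or later, where .by is available). It is not a DataSHIELD call and the object names are arbitrary.

```r
library(dplyr)

# Keep the first two rows within each cyl group (three groups in mtcars).
first_two <- slice(mtcars, 1:2, .by = cyl)

# Ask for ten rows per group; groups with fewer rows return what they have,
# because indices beyond the group size are silently ignored.
first_ten <- slice(mtcars, 1:10, .by = cyl)

print(nrow(first_two))
print(nrow(first_ten))
```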
DataSHIELD implementation of dplyr::ungroup.
ds.ungroup(x = NULL, newobj = NULL, datasources = NULL)
x |
a tibble or data frame. |
newobj |
Character specifying name for new server-side data frame. |
datasources |
DataSHIELD connections object. |
No return value, called for its side effects. An ungrouped data frame or tibble is created on the server.
## Not run:
ds.ungroup("grouped_df")
## End(Not run)