What happens if you drop a numerical variable in the summary table to replace
Introduction
The tbl_summary() function calculates descriptive statistics for continuous, categorical, and dichotomous variables in R, and presents the results in a beautiful, customizable summary table ready for publication (for example, Table 1 or demographic tables).
This vignette will walk a reader through the tbl_summary() function, and the various functions available to modify and make additions to an existing table summary object.
Setup
Before going through the tutorial, install and load {gtsummary}.
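A minimal setup sketch, assuming the package is installed from CRAN; the install line is commented out so that it only runs when you need it.

```r
# Install once if the package is not yet available (assumes CRAN access)
# install.packages("gtsummary")

# Load the package for the current session
library(gtsummary)
```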
Example data set
We'll be using the trial data set throughout this example.
- This set contains data from 200 patients who received one of two types of chemotherapy (Drug A or Drug B). The outcomes are tumor response and death.
- Each variable in the data frame has been assigned an attribute label (i.e. attr(trial$trt, "label") == "Chemotherapy Treatment") with the labelled package. These labels are displayed in the {gtsummary} output table by default. Using {gtsummary} on a data frame without labels will simply print variable names in place of variable labels; there is also an option to add labels later.
| Variable | Class | Label |
|---|---|---|
| trt | character | Chemotherapy Treatment |
| age | numeric | Age |
| marker | numeric | Marker Level (ng/mL) |
| stage | factor | T Stage |
| grade | factor | Grade |
| response | integer | Tumor Response |
| death | integer | Patient Died |
| ttdeath | numeric | Months to Death/Censor |
| Includes mix of continuous, dichotomous, and categorical variables | | |

    head(trial)
    #> # A tibble: 6 × 8
    #>   trt      age marker stage grade response death ttdeath
    #>   <chr>  <dbl>  <dbl> <fct> <fct>    <int> <int>   <dbl>
    #> 1 Drug A    23  0.16  T1    II           0     0    24
    #> 2 Drug B     9  1.11  T2    I            1     0    24
    #> 3 Drug A    31  0.277 T1    II           0     0    24
    #> 4 Drug A    NA  2.07  T3    III          1     1    17.6
    #> 5 Drug A    51  2.77  T4    III          1     1    16.4
    #> 6 Drug B    39  0.613 T4    I            0     1    15.6

For brevity, in this tutorial we'll use a subset of the variables from the trial data set.
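The subset can be built along these lines. This is a sketch: the name trial2 is taken from a later call in this tutorial, and the choice of the columns trt, age, and grade is inferred from the tables that follow rather than stated in the text.

```r
library(gtsummary)
library(dplyr)

# Keep only treatment, age, and grade (column choice inferred from the tables in this tutorial)
trial2 <- trial %>% select(trt, age, grade)
```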
Basic Usage
The default output from tbl_summary() is meant to be publication ready.
Let's start by creating a table of summary statistics from the trial data set. The tbl_summary() function can take, at minimum, a data frame as the only input, and returns descriptive statistics for each column in the data frame.
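A minimal sketch of that basic call, assuming a three-column subset named trial2 (trt, age, grade); the object name tbl_basic is only illustrative.

```r
library(gtsummary)
library(dplyr)

# Subset of the trial data (assumed columns: trt, age, grade)
trial2 <- trial %>% select(trt, age, grade)

# A data frame is the only required input
tbl_basic <- trial2 %>% tbl_summary()
tbl_basic
```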
| Characteristic | N = 200¹ |
|---|---|
| Chemotherapy Treatment | |
| Drug A | 98 (49%) |
| Drug B | 102 (51%) |
| Age | 47 (38, 57) |
| Unknown | 11 |
| Grade | |
| I | 68 (34%) |
| II | 68 (34%) |
| III | 64 (32%) |
| ¹ n (%); Median (IQR) | |
Note the sensible defaults with this basic usage; each of the defaults may be customized.
- Variable types are automatically detected so that appropriate descriptive statistics are calculated.
- Label attributes from the data set are automatically printed.
- Missing values are listed as "Unknown" in the table.
- Variable levels are indented and footnotes are added.
For this study data the summary statistics should be split by treatment group, which can be done by using the by= argument. To compare two or more groups, include add_p() with the function call, which detects variable type and uses an appropriate statistical test.
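A short sketch of a grouped summary with p-values, again assuming the three-column subset trial2; the object name tbl_by_trt is illustrative.

```r
library(gtsummary)
library(dplyr)

trial2 <- trial %>% select(trt, age, grade)

# Split the summary by treatment group, then add a p-value column
tbl_by_trt <- trial2 %>%
  tbl_summary(by = trt) %>%
  add_p()
tbl_by_trt
```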
| Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ | p-value² |
|---|---|---|---|
| Age | 46 (37, 59) | 48 (39, 56) | 0.7 |
| Unknown | 7 | 4 | |
| Grade | | | 0.9 |
| I | 35 (36%) | 33 (32%) | |
| II | 32 (33%) | 36 (35%) | |
| III | 31 (32%) | 33 (32%) | |
| ¹ Median (IQR); n (%) ² Wilcoxon rank sum test; Pearson's Chi-squared test | | | |
Customize Output
There are four primary ways to customize the output of the summary table.
- Use tbl_summary() function arguments
- Add additional data/information to a summary table with add_*() functions
- Modify summary table appearance with the {gtsummary} functions
- Modify table appearance with {gt} package functions
Modifying tbl_summary() function arguments
The tbl_summary() function includes many input options for modifying the appearance.
| Argument | Description |
|---|---|
| label | specify the variable labels printed in table |
| type | specify the variable type (e.g. continuous, categorical, etc.) |
| statistic | change the summary statistics presented |
| digits | number of digits the summary statistics will be rounded to |
| missing | whether to display a row with the number of missing observations |
| missing_text | text label for the missing number row |
| sort | change the sorting of categorical levels by frequency |
| percent | print column, row, or cell percentages |
| include | list of variables to include in summary table |
Example modifying tbl_summary() arguments.
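The following sketch combines several of these arguments. The specific values (mean and SD, two digits, the "Tumor Grade" label, the "(Missing)" text) are chosen to resemble the table shown in this section and are otherwise illustrative.

```r
library(gtsummary)
library(dplyr)

trial2 <- trial %>% select(trt, age, grade)

tbl_custom <- trial2 %>%
  tbl_summary(
    by = trt,
    # mean (SD) for continuous variables, n / N (%) for categorical ones
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    # round continuous summaries to two decimal places
    digits = all_continuous() ~ 2,
    # override the label attribute for grade
    label = grade ~ "Tumor Grade",
    # text shown on the missing-count row
    missing_text = "(Missing)"
  )
tbl_custom
```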
| Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ |
|---|---|---|
| Age | 47.01 (14.71) | 47.45 (14.01) |
| (Missing) | 7 | 4 |
| Tumor Grade | | |
| I | 35 / 98 (36%) | 33 / 102 (32%) |
| II | 32 / 98 (33%) | 36 / 102 (35%) |
| III | 31 / 98 (32%) | 33 / 102 (32%) |
| ¹ Mean (SD); n / N (%) | | |
There are multiple ways to specify the statistic= argument: a single formula, a list of formulas, or a named list. The following table shows equivalent ways to specify the mean statistic for the continuous variables age and marker. Any {gtsummary} function argument that accepts formulas will accept each of these variations.
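| Select with Helpers | Select by Variable Name | Select with Named List |
|---|---|---|
| all_continuous() ~ "{mean}" | c("age", "marker") ~ "{mean}" | list(age = "{mean}", marker = "{mean}") |
| list(all_continuous() ~ "{mean}") | c(age, marker) ~ "{mean}" | — |
| — | list(c(age, marker) ~ "{mean}") | — |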
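As a sketch of that equivalence, the three calls below request the mean for age and marker using a selector helper, bare variable names, and a named list. The object names are illustrative, and the final line simply checks that the formatted statistics agree.

```r
library(gtsummary)
library(dplyr)

# Two continuous variables from the trial data
trial_cont <- trial %>% select(age, marker)

# 1. Select with a helper
tbl_helper <- trial_cont %>%
  tbl_summary(statistic = all_continuous() ~ "{mean}")

# 2. Select by variable name
tbl_names <- trial_cont %>%
  tbl_summary(statistic = c(age, marker) ~ "{mean}")

# 3. Select with a named list
tbl_named <- trial_cont %>%
  tbl_summary(statistic = list(age = "{mean}", marker = "{mean}"))

# Compare the formatted statistic column across the three tables
identical(tbl_helper$table_body$stat_0, tbl_names$table_body$stat_0) &&
  identical(tbl_helper$table_body$stat_0, tbl_named$table_body$stat_0)
```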
{gtsummary} functions to add information
The {gtsummary} package has functions for adding information or statistics to tbl_summary() tables.
| Function | Description |
|---|---|
| add_p() | add p-values to the output comparing values across groups |
| add_overall() | add a column with overall summary statistics |
| add_n() | add a column with N (or N missing) for each variable |
| add_difference() | add column for difference between two groups, confidence interval, and p-value |
| add_stat_label() | add label for the summary statistics shown in each row |
| add_stat() | generic function to add a column with user-defined values |
| add_q() | add a column of q values to control for multiple comparisons |
{gtsummary} functions to format table
The {gtsummary} package comes with functions specifically made to alter and format summary tables.
Example adding tbl_summary()-family functions
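A sketch of how several of these functions can be chained. The header text, spanning header, and two-digit p-value formatting are chosen to resemble the table in this section; the footnote customization is left out of this sketch.

```r
library(gtsummary)
library(dplyr)

trial2 <- trial %>% select(trt, age, grade)

tbl_family <- trial2 %>%
  tbl_summary(by = trt) %>%
  # p-values formatted with two digits
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 2)) %>%
  # overall column and a column of non-missing N
  add_overall() %>%
  add_n() %>%
  # rename the first column and add a header spanning both treatment columns
  modify_header(label ~ "**Variable**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Received**") %>%
  bold_labels()
tbl_family
```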
| Variable | N | Overall, N = 200¹ | Treatment Received: Drug A, N = 98¹ | Treatment Received: Drug B, N = 102¹ | p-value² |
|---|---|---|---|---|---|
| Age | 189 | 47 (38, 57) | 46 (37, 59) | 48 (39, 56) | 0.72 |
| Unknown | | 11 | 7 | 4 | |
| Grade | 200 | | | | 0.87 |
| I | | 68 (34%) | 35 (36%) | 33 (32%) | |
| II | | 68 (34%) | 32 (33%) | 36 (35%) | |
| III | | 64 (32%) | 31 (32%) | 33 (32%) | |
| ¹ Median (IQR) or Frequency (%) ² Wilcoxon rank sum test; Pearson's Chi-squared test | | | | | |
{gt} functions to format table
The {gt} package is packed with many great functions for modifying table output, too many to list here. Review the package's website for a full listing.
To use the {gt} package functions with {gtsummary} tables, the summary table must first be converted into a gt object. To this end, use the as_gt() function after modifications have been completed with {gtsummary} functions.
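A sketch of that workflow: finish the {gtsummary} modifications, convert with as_gt(), and then apply a {gt} function. The source note text and the use of missing = "no" are chosen to resemble the table in this section.

```r
library(gtsummary)
library(dplyr)

trial2 <- trial %>% select(trt, age, grade)

tbl_gt <- trial2 %>%
  # {gtsummary} steps first
  tbl_summary(by = trt, missing = "no") %>%
  add_n() %>%
  # convert to a gt object
  as_gt() %>%
  # from here on, only {gt} functions apply
  gt::tab_source_note(gt::md("*This data is simulated*"))
tbl_gt
```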
| Characteristic | N | Drug A, N = 98¹ | Drug B, N = 102¹ |
|---|---|---|---|
| Age | 189 | 46 (37, 59) | 48 (39, 56) |
| Grade | 200 | | |
| I | | 35 (36%) | 33 (32%) |
| II | | 32 (33%) | 36 (35%) |
| III | | 31 (32%) | 33 (32%) |
| This data is simulated | | | |
| ¹ Median (IQR); n (%) | | | |
Select Helpers
There is flexibility in how you select variables for {gtsummary} arguments, which allows for many customization opportunities! For example, if you want to show age and the marker levels to one decimal place in tbl_summary(), you can pass digits = c(age, marker) ~ 1. The selecting input is flexible, and you may also pass quoted column names.
Going beyond typing out specific variables in your data set, you can use:
- All {tidyselect} helpers available throughout the tidyverse, such as starts_with(), contains(), and everything() (i.e. anything you can use with the dplyr::select() function), can be used with {gtsummary}.
- Additional {gtsummary} selectors that are included in the package to supplement tidyselect functions.
  - Summary type: There are two primary ways to select variables by their summary type, all_continuous() and all_categorical(). This is useful, for example, when you wish to report the mean and standard deviation for all continuous variables: statistic = all_continuous() ~ "{mean} ({sd})". Dichotomous variables are, by default, included with all_categorical().
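The selection styles described above can be sketched as follows. The data subset and object names are illustrative; each call uses a different selector on the left-hand side of the formula.

```r
library(gtsummary)
library(dplyr)

small_trial <- trial %>% select(trt, age, marker, grade)

# Bare variable names
tbl_bare <- small_trial %>%
  tbl_summary(by = trt, digits = c(age, marker) ~ 1)

# Quoted column names
tbl_quoted <- small_trial %>%
  tbl_summary(by = trt, digits = c("age", "marker") ~ 1)

# A tidyselect helper (matches age only) and a summary-type selector
tbl_helpers <- small_trial %>%
  tbl_summary(
    by = trt,
    statistic = all_continuous() ~ "{mean} ({sd})",
    digits = starts_with("a") ~ 1
  )
```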
Multi-line Continuous Summaries
Continuous variables may also be summarized on multiple lines, a common format in some journals. To do so, update the summary type to "continuous2" (for summaries on two or more lines).
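A sketch of a multi-line summary. The three statistic rows (non-missing N, median with quartiles, and range) are chosen to resemble the table in this section; the object name is illustrative.

```r
library(gtsummary)
library(dplyr)

tbl_multiline <- trial %>%
  select(age, trt) %>%
  tbl_summary(
    by = trt,
    # switch continuous variables to the multi-line summary type
    type = all_continuous() ~ "continuous2",
    # one glue string per output row
    statistic = all_continuous() ~ c(
      "{N_nonmiss}",
      "{median} ({p25}, {p75})",
      "{min}, {max}"
    ),
    missing = "no"
  ) %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 2))
tbl_multiline
```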
| Characteristic | Drug A, N = 98 | Drug B, N = 102 | p-value¹ |
|---|---|---|---|
| Age | | | 0.72 |
| N | 91 | 98 | |
| Median (IQR) | 46 (37, 59) | 48 (39, 56) | |
| Range | 6, 78 | 9, 83 | |
| ¹ Wilcoxon rank sum test | | | |
Advanced Customization
The information in this section applies to all {gtsummary} objects.
The {gtsummary} table has two important internal objects:
| Internal Object | Description |
|---|---|
| .$table_body | data frame that is printed as the gtsummary output table |
| .$table_styling | contains instructions for styling |
When you print output from the tbl_summary() function into the R console or into an R markdown document, the .$table_body data frame is formatted using the instructions listed in .$table_styling. The default printer converts the {gtsummary} object to a {gt} object with as_gt() via a sequence of {gt} commands executed on .$table_body. Here's an example of the first few calls saved with tbl_summary():

    tbl_summary(trial2) %>%
      as_gt(return_calls = TRUE) %>%
      head(n = 4)
    #> $gt
    #> gt::gt(data = x$table_body, groupname_col = NULL, caption = NULL)
    #>
    #> $fmt_missing
    #> $fmt_missing[[1]]
    #> gt::fmt_missing(columns = gt::everything(), missing_text = "")
    #>
    #>
    #> $cols_align
    #> $cols_align[[1]]
    #> gt::cols_align(columns = c("variable", "var_type", "var_label",
    #>     "row_type", "stat_0"), align = "center")
    #>
    #> $cols_align[[2]]
    #> gt::cols_align(columns = "label", align = "left")
    #>
    #>
    #> $tab_style_indent
    #> $tab_style_indent[[1]]
    #> gt::tab_style(style = gt::cell_text(indent = gt::px(10), align = "left"),
    #>     locations = gt::cells_body(columns = "label", rows = c(2L,
    #>         3L, 5L, 7L, 8L, 9L)))

The {gt} functions are called in the order they appear, beginning with gt::gt().
If the user does not want a specific {gt} function to run (i.e. would like to modify default printing), any {gt} call can be excluded in the as_gt() function. In the example below, the default alignment is restored.
After the as_gt() function is run, additional formatting may be added to the table using {gt} functions. In the example below, a source note is added to the table.
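Both steps can be sketched in one pipeline. This assumes the list of saved calls in your installed version contains an entry named cols_align, as in the output printed above; the source note text is illustrative.

```r
library(gtsummary)
library(dplyr)

trial2 <- trial %>% select(trt, age, grade)

tbl_no_align <- tbl_summary(trial2, by = trt) %>%
  # drop the cols_align calls (assumes an entry with this name exists)
  as_gt(include = -cols_align) %>%
  # add a source note with a {gt} function
  gt::tab_source_note(gt::md("*This data is simulated*"))
tbl_no_align
```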
| Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ |
|---|---|---|
| Age | 46 (37, 59) | 48 (39, 56) |
| Unknown | 7 | 4 |
| Grade | | |
| I | 35 (36%) | 33 (32%) |
| II | 32 (33%) | 36 (35%) |
| III | 31 (32%) | 33 (32%) |
| This data is simulated | | |
| ¹ Median (IQR); n (%) | | |
Set Default Options with Themes
The {gtsummary} tbl_summary() function and the related functions have sensible defaults for rounding and presenting results. If you would like to change the defaults, however, there are a few options. The default options can be changed using the {gtsummary} themes function set_gtsummary_theme(). The package includes prespecified themes, and you can also create your own. Themes can control baseline behavior, for example, how p-values and percentages are rounded, which statistics are presented in tbl_summary(), default statistical tests in add_p(), etc.
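As a small sketch of working with a prespecified theme, the code below applies the compact theme shipped with the package, builds a table, and then restores the defaults. The choice of theme_gtsummary_compact() is only an illustration; custom themes passed to set_gtsummary_theme() are not shown here.

```r
library(gtsummary)
library(dplyr)

# Apply one of the prespecified themes for the rest of the session
theme_gtsummary_compact()

tbl_themed <- trial %>%
  select(trt, age, grade) %>%
  tbl_summary(by = trt)

# Restore the package defaults so later tables are unaffected
reset_gtsummary_theme()
```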
For details on creating a theme and setting personal defaults, review the themes vignette.
Survey Data
The {gtsummary} package also supports survey data (objects created with the {survey} package) via the tbl_svysummary() function. The syntax for tbl_svysummary() and tbl_summary() is nearly identical, and the examples above apply to survey summaries as well.
To begin, install the {survey} package and load the apiclus1 data set.

    # loading the api data set
    data(api, package = "survey")

Before we begin, we convert the data frame to a survey object, registering the ID and weighting columns, and setting the finite population correction column.

    svy_apiclus1 <-
      survey::svydesign(
        id = ~dnum,
        weights = ~pw,
        data = apiclus1,
        fpc = ~fpc
      )

After creating the survey object, we can now summarize it similarly to a standard data frame using tbl_svysummary(). Like tbl_summary(), tbl_svysummary() accepts the by= argument and works with the add_p() and add_overall() functions.
It is not possible to pass custom functions to the statistic= argument of tbl_svysummary(). You must use one of the pre-defined summary statistic functions (e.g. {mean}, {median}), which leverage functions from the {survey} package to calculate weighted statistics.

    svy_apiclus1 %>%
      tbl_svysummary(
        # stratify summary statistics by the "both" column
        by = both,
        # summarize a subset of the columns
        include = c(api00, api99, both),
        # adding labels to table
        label = list(api00 ~ "API in 2000", api99 ~ "API in 1999")
      ) %>%
      add_p() %>% # comparing values by "both" column
      add_overall() %>%
      # adding spanning header
      modify_spanning_header(c("stat_1", "stat_2") ~ "**Met Both Targets**")

| Characteristic | Overall, N = 6,194¹ | Met Both Targets: No, N = 1,692¹ | Met Both Targets: Yes, N = 4,502¹ | p-value² |
|---|---|---|---|---|
| API in 2000 | 652 (552, 718) | 631 (556, 710) | 654 (551, 722) | 0.4 |
| API in 1999 | 615 (512, 691) | 632 (548, 698) | 611 (497, 686) | 0.2 |
| ¹ Median (IQR) ² Wilcoxon rank-sum test for complex survey samples | | | | |
tbl_svysummary() can also handle weighted survey data where each row represents several individuals:
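A sketch of such a case, using the built-in Titanic contingency table, where each row carries a count of passengers. Treating that count column (n) as the survey weight is the assumption here, and the {survey} package must be installed.

```r
library(gtsummary)
library(dplyr)

tbl_titanic <- Titanic %>%
  # one row per Class/Sex/Age/Survived combination, with the count in column n
  as_tibble() %>%
  # use the count as the weight; ids = ~1 declares no clustering
  survey::svydesign(data = ., ids = ~1, weights = ~n) %>%
  tbl_svysummary(include = c(Age, Survived))
tbl_titanic
```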
| Characteristic | N = 2,201¹ |
|---|---|
| Age | |
| Adult | 2,092 (95%) |
| Child | 109 (5.0%) |
| Survived | 711 (32%) |
| ¹ n (%) | |
Cross Tables
Use tbl_cross() to compare two categorical variables in your data. tbl_cross() is a wrapper for tbl_summary() that:
- Automatically adds a spanning header to your table with the name or label of your comparison variable.
- Uses percent = "cell" by default.
- Adds row and column margin totals (customizable through the margin argument).
- Displays missing data in both row and column variables (customizable through the missing argument).
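A sketch of a cross table of T stage by treatment with a p-value added. The choice of stage and trt is inferred from the table in this section, and percent = "cell" is written out even though it is described above as the default.

```r
library(gtsummary)
library(dplyr)

tbl_stage_trt <- trial %>%
  # rows are T stage, columns are treatment arm
  tbl_cross(row = stage, col = trt, percent = "cell") %>%
  add_p()
tbl_stage_trt
```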
| Characteristic | Chemotherapy Treatment: Drug A | Chemotherapy Treatment: Drug B | Total | p-value¹ |
|---|---|---|---|---|
| T Stage | | | | 0.9 |
| T1 | 28 (14%) | 25 (12%) | 53 (26%) | |
| T2 | 25 (12%) | 29 (14%) | 54 (27%) | |
| T3 | 22 (11%) | 21 (10%) | 43 (22%) | |
| T4 | 23 (12%) | 27 (14%) | 50 (25%) | |
| Total | 98 (49%) | 102 (51%) | 200 (100%) | |
| ¹ Pearson's Chi-squared test | | | | |
Source: https://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html