`tableone` and its Basic Statistics

Baseline Characteristics in Biomedical Research

Marothi Peter LETSOALO

Learning outcomes

Refresh on what baseline characteristics are and why they are important.

Know which descriptive statistics and statistical tests are used to evaluate baseline characteristics.

Know what the tableone package is and how it works.

Know how to use the tableone package with a real data set.

Learning outcomes

Variable	Placebo	Treatment	Overall	P
n	23	21	44
Smoker = smoker (%)	11 (47.8)	6 (28.6)	17 (38.6)	0.317
Age (years); median (IQR)	31.0 (30.0, 33.0)	31.0 (29.0, 34.0)	31.0 (29.0, 33.2)	0.897
Education (%)				0.437
  grade 10-12, matriculated	7 (30.4)	9 (42.9)	16 (36.4)
  grade 10-12, not matriculated	11 (47.8)	7 (33.3)	18 (40.9)
  less than grade 9	2 (8.7)	4 (19.0)	6 (13.6)
  post-secondary	3 (13.0)	1 (4.8)	4 (9.1)
Sex = Female (%)	23 (100.0)	21 (100.0)	44 (100.0)	-
Nugent Score; median (IQR)	7.0 (6.5, 8.0)	6.0 (5.0, 6.0)	6.5 (5.0, 8.0)	0.001
CRP Blood; median (IQR)	1.2 (0.9, 2.4)	1.2 (0.7, 1.8)	1.2 (0.8, 2.0)	0.724
pH; median (IQR)	5.2 (4.8, 5.3)	4.6 (4.3, 4.9)	4.9 (4.6, 5.3)	0.002
IFN-Y; median (IQR)	0.3 (0.3, 0.8)	0.3 (0.3, 0.3)	0.3 (0.3, 0.5)	0.159
IL-10; median (IQR)	0.8 (0.8, 2.0)	0.8 (0.8, 1.8)	0.8 (0.8, 1.9)	0.820

Let’s do a refresh!!!

Introduction

What are baseline characteristics and why are they important?

Baseline characteristics describe the participants at the start of a study, for example age, sex, and disease severity.

They help readers assess the validity and applicability of the study results (Schulz, Altman, and Moher 2010; Altman 1985).

They allow researchers to explore the treatment effect across different subgroups (Matthews 2006).

Methods

Which descriptive statistics and statistical tests?

Descriptive statistics summarize the distribution and characteristics of a variable, such as means (standard deviations), medians (interquartile ranges), and counts (percentages).
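As a minimal sketch of these summaries in base R, the snippet below computes each of them by hand. The `age` and `smoker` vectors are hypothetical toy data invented for illustration; they are not the study data shown in the example table.

```r
# Hypothetical toy data, invented for illustration only
age    <- c(29, 31, 30, 34, 33, 31, 28, 35)
smoker <- c("yes", "no", "no", "yes", "no", "no", "yes", "no")

# Continuous variable: mean (SD) and median (IQR)
age_mean   <- mean(age)
age_sd     <- sd(age)
age_median <- median(age)
age_iqr    <- quantile(age, probs = c(0.25, 0.75))

# Categorical variable: count (percentage)
smoker_n   <- table(smoker)
smoker_pct <- round(100 * prop.table(smoker_n), 1)

cat(sprintf("Age, mean (SD): %.1f (%.1f)\n", age_mean, age_sd))
cat(sprintf("Age, median (IQR): %.1f (%.1f, %.1f)\n",
            age_median, age_iqr[1], age_iqr[2]))
cat(sprintf("Smoker = yes, n (%%): %d (%.1f)\n",
            smoker_n[["yes"]], smoker_pct[["yes"]]))
```

The mean (SD) pair is usually reported for roughly symmetric variables and the median (IQR) pair for skewed ones, which is why both are shown here.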

Statistical tests evaluate whether there is a significant difference or association between groups or variables, such as t-tests, ANOVA, rank-sum tests, and chi-squared tests.
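The tests listed above can be sketched with base R functions. In this illustration the `ph` values are hypothetical, while the smoker counts (11 of 23 versus 6 of 21) are taken from the example table earlier in the deck. With only two groups, a one-way ANOVA that assumes equal variances gives the same p-value as the pooled t-test.

```r
# Hypothetical continuous measurements for two arms (invented for illustration)
group <- factor(rep(c("Placebo", "Treatment"), each = 6))
ph    <- c(5.2, 4.8, 5.3, 5.1, 5.4, 4.9, 4.6, 4.3, 5.0, 4.5, 4.7, 4.4)

# Normal-theory tests: pooled t-test and one-way ANOVA
t_res   <- t.test(ph ~ group, var.equal = TRUE)
aov_res <- oneway.test(ph ~ group, var.equal = TRUE)

# Rank-based test for a skewed continuous variable
rank_res <- wilcox.test(ph ~ group)

# Chi-squared test on a 2 x 2 table built from the smoker counts
# in the example table (Placebo: 11 of 23, Treatment: 6 of 21)
smoker_tab <- matrix(
  c(11, 12, 6, 15),
  nrow = 2,
  dimnames = list(smoker = c("yes", "no"),
                  arm    = c("Placebo", "Treatment"))
)
chi_res <- chisq.test(smoker_tab, correct = TRUE)

cat(sprintf("t-test p = %.4f, one-way ANOVA p = %.4f\n",
            t_res$p.value, aov_res$p.value))
cat(sprintf("rank-sum p = %.4f\n", rank_res$p.value))
cat(sprintf("chi-squared p = %.3f\n", chi_res$p.value))
```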

The standardized mean difference (SMD) is an effect-size measure that expresses a between-group difference on a common scale, so that variables measured in different units, or results from different studies, can be compared (Schulz, Altman, and Moher 2010).
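As a rough sketch, one commonly used definition of the SMD for a continuous variable in two groups divides the difference in means by the square root of the average of the two group variances. The helper `smd_continuous` below is a hypothetical name written for this illustration and is not a function from any package; the data are invented.

```r
# Hypothetical helper: SMD for a continuous variable in exactly two groups,
# defined here as |mean1 - mean2| / sqrt((var1 + var2) / 2)
smd_continuous <- function(x, g) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2)
  m <- tapply(x, g, mean)  # group means
  v <- tapply(x, g, var)   # group variances
  unname(abs(m[1] - m[2]) / sqrt((v[1] + v[2]) / 2))
}

# Toy data: means 2 and 4, both variances equal to 1
x <- c(1, 2, 3, 3, 4, 5)
g <- c("A", "A", "A", "B", "B", "B")

smd_value <- smd_continuous(x, g)
cat(sprintf("SMD = %.2f\n", smd_value))
```

Unlike a p-value, this quantity does not shrink or grow with the sample size alone, which is one reason it is used to describe balance between groups.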

R package tableone

What is the `tableone` package and how does it work?

tableone is a package that simplifies the creation of “Table 1: Baseline demographics and clinical characteristics” (Yoshida and Bartel 2022).

It handles both continuous and categorical variables, and provides descriptive statistics, statistical tests, and SMDs.

It can handle weighted data using the survey package, which allows researchers to account for complex sampling designs and adjust for confounding factors.

It has a simple and flexible syntax and can produce well-formatted tables with the print or kableone function; the output can be styled further with flextable.

Load `tableone` package

tableone Github Site

library(tableone) # attach the tableone package

CreateTableOne function
svyCreateTableOne function (out of scope here)

?CreateTableOne

data: A data frame in which these variables exist. All variables (both vars and strata) must be in this data frame.
vars: Variables to be summarized given as a character vector. Factors are handled as categorical variables, whereas numeric variables are handled as continuous variables. If empty, all variables in the data frame specified in the data argument are used.
strata: Stratifying (grouping) variable name(s) given as a character vector. If omitted, the overall results are returned.
factorVars: Numerically coded variables that should be handled as categorical variables given as a character vector. Do not include factors, unless you need to relevel them by removing empty levels. If omitted, only factors are considered categorical variables. The variables specified here must also be specified in the vars argument.
includeNA = FALSE: If TRUE, NA is handled as a regular factor level rather than missing. NA is shown as the last factor level in the table. Only effective for categorical variables.

test = TRUE: If TRUE, as in the default, and there are two or more groups, groupwise comparisons are performed.

testApprox = chisq.test: A function used to perform the large-sample approximation-based tests. The default is chisq.test. This is not recommended when some cells have small counts, for example fewer than 5.
argsApprox = list(correct = TRUE): A named list of arguments passed to the function specified in testApprox. The default is list(correct = TRUE), which turns on the continuity correction for chisq.test.
testExact = fisher.test: A function used to perform the exact tests. The default is fisher.test. If the cells have large counts, it will fail because of memory limitations. In this situation, the large-sample approximation-based test should suffice.

testNormal = oneway.test: A function used to perform the normal assumption-based tests. The default is oneway.test. This is equivalent to the t-test when there are only two groups.
argsNormal = list(var.equal = TRUE): A named list of arguments passed to the function specified in testNormal.
testNonNormal = kruskal.test: A function used to perform non-normal assumption based tests.
argsNonNormal = list(NULL): A named list of arguments passed to the function specified in testNonNormal.
smd = TRUE: If set to TRUE, standardized mean differences are calculated.

addOverall = FALSE: If set to TRUE, an overall column is added to the table.
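A minimal sketch of a stratified call, using the arguments described above, might look as follows. It assumes the tableone package is installed. The `trial` data frame and its column names are hypothetical and invented for illustration; `smoker` is coded numerically on purpose so that `factorVars` is needed.

```r
library(tableone)  # assumes the package is installed

# Hypothetical toy data, invented for illustration only
trial <- data.frame(
  arm    = rep(c("Placebo", "Treatment"), each = 6),
  age    = c(31, 30, 33, 29, 34, 32, 31, 29, 34, 30, 35, 28),
  smoker = c(1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1),  # numerically coded
  ph     = c(5.2, 4.8, 5.3, 5.1, 5.4, 4.9, 4.6, 4.3, 5.0, 4.5, 4.7, 4.4)
)

tab <- CreateTableOne(
  vars       = c("age", "smoker", "ph"),  # variables to summarize
  strata     = "arm",                     # grouping variable
  data       = trial,
  factorVars = "smoker",                  # treat the 0/1 column as categorical
  addOverall = TRUE                       # add an overall column
)

tab  # auto-prints with the default print settings
```

With so few rows the large-sample chi-squared approximation is questionable, which is the situation the testExact argument is meant for.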

`print` function for `CreateTableOne` object

Object
Simple look
Decimals
Variables
Test

x: The object returned by CreateTableOne that you want to print.

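printToggle = TRUE: If TRUE, the table is printed. If FALSE, nothing is printed and the formatted table is returned invisibly as a matrix.
quote = FALSE: If TRUE, everything, including row and column names, is quoted, which makes it easier to copy the table into a spreadsheet.
varLabels = FALSE: If TRUE, variable labels (if available) are used instead of variable names.
explain = TRUE: If TRUE, an explanation of the statistic shown is appended to the variable name, for example (%) when percentages are reported.
noSpaces = FALSE: If TRUE, the spaces added for alignment are removed. Use this if you prefer to align the numbers yourself in other software.
padColnames = FALSE: If TRUE, column names are padded with spaces to center them. It has no effect when noSpaces = TRUE.
dropEqual = FALSE: If TRUE, the " = level" suffix that indicates which level of a two-level categorical variable is shown (as in "Smoker = smoker") is dropped.
showAllLevels = FALSE: If TRUE, all levels of categorical variables are shown. By default, only one level of a two-level variable is shown to avoid redundant information.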

catDigits = 1: The number of digits after the decimal point for the percentages of categorical variables.
contDigits = 2: The number of digits after the decimal point for continuous variables.
pDigits = 3: The number of digits after the decimal point for p-values (also used for SMDs).
formatOptions = list(scientific = FALSE): A list of options passed to format(). The default switches off scientific notation.

missing = FALSE: If TRUE, missing-data information is shown for each variable.
minMax = FALSE: If TRUE, the range [min, max] is reported instead of the IQR for the variables listed in nonnormal.
format = c("fp", "f", "p", "pf")[1]: The format for categorical variables. Options are "fp" (frequency (percentage), the default), "f" (frequency only), "p" (percentage only), and "pf" (percentage (frequency)).
nonnormal = NULL: A character vector of variables to be treated as non-normal. For these variables, the median and IQR are reported instead of the mean and SD, and the p-value comes from the nonparametric test.
cramVars = NULL: A character vector of two-level categorical variables for which both levels are shown in a single row.

test = TRUE: If TRUE, p-values are shown. If FALSE, only the numerical summaries are shown.
exact = NULL: A character vector of variables for which the p-value from the exact test is shown instead of the one from the large-sample approximation (chi-squared) test.
smd = FALSE: If TRUE, standardized mean differences are shown.
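The print options above can be combined as in the following sketch, which assumes the tableone package is installed. The `trial` data frame is hypothetical toy data invented for illustration. Setting `printToggle = FALSE` returns the formatted table as a character matrix, which can then be exported or passed to another table package.

```r
library(tableone)  # assumes the package is installed

# Hypothetical toy data, invented for illustration only
trial <- data.frame(
  arm    = rep(c("Placebo", "Treatment"), each = 6),
  age    = c(31, 30, 33, 29, 34, 32, 31, 29, 34, 30, 35, 28),
  smoker = factor(c("yes", "no", "yes", "no", "yes", "no",
                    "no", "no", "yes", "no", "no", "yes")),
  ph     = c(5.2, 4.8, 5.3, 5.1, 5.4, 4.9, 4.6, 4.3, 5.0, 4.5, 4.7, 4.4)
)

tab <- CreateTableOne(vars = c("age", "smoker", "ph"),
                      strata = "arm", data = trial)

mat <- print(
  tab,
  nonnormal   = "ph",      # report median and IQR, nonparametric p-value
  exact       = "smoker",  # report the exact-test p-value
  smd         = TRUE,      # show standardized mean differences
  contDigits  = 1,         # one decimal place for continuous summaries
  printToggle = FALSE      # return the matrix without printing
)

mat  # character matrix holding the formatted table
```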

References

Altman, Douglas G. 1985. “Comparability of Randomised Groups.” The Statistician 34 (1): 125. https://doi.org/10.2307/2987510.

Matthews, John N. S. 2006. Introduction to Randomized Controlled Clinical Trials. Chapman and Hall/CRC. https://doi.org/10.1201/9781420011302.

Schulz, K. F., D. G. Altman, and D. Moher. 2010. “CONSORT 2010 Statement: Updated Guidelines for Reporting Parallel Group Randomised Trials.” BMJ 340: c332. https://doi.org/10.1136/bmj.c332.

Yoshida, Kazuki, and Alexander Bartel. 2022. “Tableone: Create ’Table 1’ to Describe Baseline Characteristics with or Without Propensity Score Weights.” https://CRAN.R-project.org/package=tableone.
