tableone and its Basic Statistics

Baseline Characteristics in Biomedical Research

Marothi Peter LETSOALO

Learning outcomes

  • Recall what baseline characteristics are and why they are important.
  • Know which descriptive statistics and statistical tests are used to evaluate baseline characteristics.
  • Know what the tableone package is and how it works.
  • Know how to use the tableone package with a real data set.


Variable                          Placebo            Treatment          Overall            P
------------------------------------------------------------------------------------------------
n                                 23                 21                 44
Smoker = smoker (%)               11 (47.8)          6 (28.6)           17 (38.6)          0.317
Age (years); median (IQR)         31.0 (30.0, 33.0)  31.0 (29.0, 34.0)  31.0 (29.0, 33.2)  0.897
Education (%)                                                                              0.437
  grade 10-12, matriculated       7 (30.4)           9 (42.9)           16 (36.4)
  grade 10-12, not matriculated   11 (47.8)          7 (33.3)           18 (40.9)
  less than grade 9               2 (8.7)            4 (19.0)           6 (13.6)
  post-secondary                  3 (13.0)           1 (4.8)            4 (9.1)
Sex = Female (%)                  23 (100.0)         21 (100.0)         44 (100.0)         -
Nugent Score; median (IQR)        7.0 (6.5, 8.0)     6.0 (5.0, 6.0)     6.5 (5.0, 8.0)     0.001
CRP Blood; median (IQR)           1.2 (0.9, 2.4)     1.2 (0.7, 1.8)     1.2 (0.8, 2.0)     0.724
pH; median (IQR)                  5.2 (4.8, 5.3)     4.6 (4.3, 4.9)     4.9 (4.6, 5.3)     0.002
IFN-Y; median (IQR)               0.3 (0.3, 0.8)     0.3 (0.3, 0.3)     0.3 (0.3, 0.5)     0.159
IL-10; median (IQR)               0.8 (0.8, 2.0)     0.8 (0.8, 1.8)     0.8 (0.8, 1.9)     0.820

Let’s do a refresher!

Introduction

What are baseline characteristics and why are they important?

  • Baseline characteristics describe the participants at the start of a study, for example their age, sex, and disease severity.

  • They allow researchers to explore the treatment effect across different subgroups (Matthews 2006).

Methods

Which descriptive statistics and statistical tests?

  • Descriptive statistics summarize the distribution of a variable, such as means (standard deviations), medians (interquartile ranges), and counts (percentages).

  • Statistical tests evaluate whether there is a significant difference or association between groups or variables, such as t-tests, ANOVA, rank-sum tests, and chi-squared tests.

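  • The standardized mean difference (SMD) is an effect-size measure: the difference between group means divided by a pooled standard deviation. Because it is unit-free and does not depend on sample size, it can be compared across variables or combined across studies (Schulz, Altman, and Moher 2010).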
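The three kinds of summaries above can be sketched in base R. This is a minimal sketch on a small simulated two-arm data set; the data, the variable names and the helper functions smd_continuous and smd_binary are hypothetical and written only for illustration. For the SMD it assumes the pooled-SD definition commonly used in baseline tables, |mean1 - mean2| / sqrt((sd1^2 + sd2^2) / 2), and its analogue for a binary variable with proportions p1 and p2, |p1 - p2| / sqrt((p1(1 - p1) + p2(1 - p2)) / 2).

```r
# Sketch only: simulated two-arm data; the data set and helper functions are hypothetical.
set.seed(2024)
n_per_arm <- 20
df <- data.frame(
  arm    = factor(rep(c("Placebo", "Treatment"), each = n_per_arm)),
  age    = round(c(rnorm(n_per_arm, 31, 3), rnorm(n_per_arm, 32, 3))),
  smoker = factor(sample(c("non-smoker", "smoker"), 2 * n_per_arm, replace = TRUE))
)

## Descriptive statistics per arm
# mean (SD): for roughly normal continuous variables
mean_sd <- tapply(df$age, df$arm,
                  function(x) sprintf("%.1f (%.1f)", mean(x), sd(x)))
# median (IQR), reported as (Q1, Q3): for skewed continuous variables
median_iqr <- tapply(df$age, df$arm, function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75))
  sprintf("%.1f (%.1f, %.1f)", q[2], q[1], q[3])
})
# count (percentage within each arm): for categorical variables
counts   <- table(df$smoker, df$arm)
percents <- round(100 * prop.table(counts, margin = 2), 1)

## Statistical tests comparing the arms
p_ttest  <- t.test(age ~ arm, data = df, var.equal = TRUE)$p.value
p_anova  <- oneway.test(age ~ arm, data = df, var.equal = TRUE)$p.value
p_rank   <- wilcox.test(age ~ arm, data = df, exact = FALSE)$p.value  # rank-sum test
p_kw     <- kruskal.test(age ~ arm, data = df)$p.value
p_chisq  <- chisq.test(counts, correct = TRUE)$p.value
p_fisher <- fisher.test(counts)$p.value

## Standardized mean differences (absolute values, pooled-SD denominator)
smd_continuous <- function(x, g) {
  m <- tapply(x, g, mean)
  v <- tapply(x, g, var)
  abs(m[[1]] - m[[2]]) / sqrt((v[[1]] + v[[2]]) / 2)
}
smd_binary <- function(x, g, level) {
  p <- tapply(x == level, g, mean)  # proportion with the given level in each group
  abs(p[[1]] - p[[2]]) / sqrt((p[[1]] * (1 - p[[1]]) + p[[2]] * (1 - p[[2]])) / 2)
}

print(mean_sd)
print(median_iqr)
print(percents)
print(round(c(t = p_ttest, anova = p_anova, rank_sum = p_rank,
              kruskal = p_kw, chisq = p_chisq, fisher = p_fisher), 3))
print(smd_continuous(df$age, df$arm))
print(smd_binary(df$smoker, df$arm, "smoker"))
```

With two groups, oneway.test with var.equal = TRUE reproduces the pooled t-test p-value exactly (the F statistic is the square of t), which is why a single "normal" test function can serve both the two-group and the multi-group case.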

R package tableone

What is tableone package and how does it work?

  • tableone is a package that simplifies the creation of “Table 1: Baseline demographics and clinical characteristics” (Yoshida and Bartel 2022).
  • It handles both continuous and categorical variables and provides descriptive statistics, statistical tests, and SMDs.
  • It handles weighted data through the survey package, so researchers can account for complex sampling designs or describe samples weighted to adjust for confounding (for example, with propensity score weights).
  • It has a simple, flexible syntax and produces formatted tables with the print or kableone function; combined with the flextable package, the output can be turned into publication-ready tables.
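A minimal sketch of the print workflow, assuming tableone is installed (install.packages("tableone")) and using simulated data whose variable names are hypothetical. The nonnormal argument of print switches the listed continuous variables to median [IQR] with a nonparametric p-value, smd = TRUE adds an SMD column, and printToggle = FALSE returns the formatted table as a character matrix that can be exported, here with write.csv. kableone(tab) would render the same table for R Markdown or Quarto documents, and the matrix could be converted into a flextable; neither step is shown.

```r
# Sketch only: assumes tableone is installed; data and variable names are simulated/hypothetical.
library(tableone)

set.seed(42)
n <- 40
dat <- data.frame(
  group  = factor(rep(c("Placebo", "Treatment"), each = n / 2)),
  age    = round(rnorm(n, mean = 31, sd = 3)),
  crp    = round(rlnorm(n, meanlog = 0.2, sdlog = 0.5), 1),  # right-skewed marker
  smoker = factor(sample(c("non-smoker", "smoker"), n, replace = TRUE))
)

tab <- CreateTableOne(vars = c("age", "crp", "smoker"), strata = "group",
                      data = dat, addOverall = TRUE)

# Console table: crp as median [IQR] with a nonparametric p-value, plus an SMD column
print(tab, nonnormal = "crp", smd = TRUE)

# printToggle = FALSE suppresses printing and returns the formatted character matrix;
# noSpaces = TRUE drops the padding spaces used for console alignment
tab_mat <- print(tab, nonnormal = "crp", smd = TRUE,
                 printToggle = FALSE, noSpaces = TRUE)

# Export the matrix (row names hold the variable labels)
out_file <- file.path(tempdir(), "table1.csv")
write.csv(tab_mat, file = out_file)
```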

Load tableone package


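library(tableone)  # load and attach the tableone package

The package's two main functions are:
  • CreateTableOne function: creates the table from an ordinary data frame
  • svyCreateTableOne function: the same for survey-weighted data (not in the scope)

?CreateTableOne    # open the help page that documents its arguments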

CreateTableOne Function

  • data: A data frame in which these variables exist. All variables (both vars and strata) must be in this data frame.
  • vars: Variables to be summarized given as a character vector. Factors are handled as categorical variables, whereas numeric variables are handled as continuous variables. If empty, all variables in the data frame specified in the data argument are used.
  • strata: Stratifying (grouping) variable name(s) given as a character vector. If omitted, the overall results are returned.
  • factorVars: Numerically coded variables that should be handled as categorical variables given as a character vector. Do not include factors, unless you need to relevel them by removing empty levels. If omitted, only factors are considered categorical variables. The variables specified here must also be specified in the vars argument.
  • includeNA = FALSE: If TRUE, NA is handled as a regular factor level rather than missing. NA is shown as the last factor level in the table. Only effective for categorical variables.
  • test = TRUE: If TRUE (the default) and there are two or more groups, groupwise comparisons are performed.
  • testApprox = chisq.test: A function used to perform the large-sample approximation tests for categorical variables. The default is chisq.test, which is not recommended when some cells have small expected counts (fewer than 5).
  • argsApprox = list(correct = TRUE): A named list of arguments passed to the function specified in testApprox. The default, list(correct = TRUE), turns on the continuity correction for chisq.test (applied to 2 x 2 tables).
  • testExact = fisher.test: A function used to perform exact tests for categorical variables. The default is fisher.test. With large cell counts it can fail because of memory limits; in that situation, the large-sample approximation test should suffice. Exact p-values are shown for the variables listed in the exact argument of print.
  • testNormal = oneway.test: A function used to perform tests that assume normality. The default is oneway.test, which with var.equal = TRUE is equivalent to the two-sample t-test when there are only two groups.
  • argsNormal = list(var.equal = TRUE): A named list of arguments passed to the function specified in testNormal.
  • testNonNormal = kruskal.test: A function used to perform tests that do not assume normality. Its p-values are shown for the variables listed in the nonnormal argument of print.
  • argsNonNormal = list(NULL): A named list of arguments passed to the function specified in testNonNormal.
  • smd = TRUE: If set to TRUE, standardized mean differences are calculated.
  • addOverall = FALSE: If set to TRUE, an overall column is added to the table.
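The call below exercises several of these arguments. It is a sketch on simulated data (the columns arm, age, ph and education are hypothetical), assuming tableone is installed. Education is stored as numeric codes 1 to 4 with some missing values, so it is declared in factorVars, and includeNA = TRUE keeps its missing values as a level. The p-values from all test functions are computed when the table is created; print then chooses which to display: exact = selects the testExact p-value for a categorical variable, and nonnormal = selects median [IQR] and the testNonNormal p-value for a continuous one.

```r
# Sketch only: assumes tableone is installed; the data and column names
# (arm, age, ph, education) are simulated and hypothetical.
library(tableone)

set.seed(7)
n <- 100
dat <- data.frame(
  arm       = factor(sample(c("Placebo", "Treatment"), n, replace = TRUE)),
  age       = round(rnorm(n, mean = 31, sd = 3)),
  ph        = round(rnorm(n, mean = 4.9, sd = 0.4), 1),
  education = sample(c(1:4, NA), n, replace = TRUE)  # numeric codes with missing values
)

tab <- CreateTableOne(
  vars          = c("age", "ph", "education"),
  strata        = "arm",
  data          = dat,
  factorVars    = "education",             # numeric codes summarized as categories
  includeNA     = TRUE,                    # NA kept as the last level of education
  test          = TRUE,
  testApprox    = chisq.test,
  argsApprox    = list(correct = TRUE),    # continuity correction (2 x 2 tables only)
  testExact     = fisher.test,
  testNormal    = oneway.test,
  argsNormal    = list(var.equal = TRUE),  # equal-variance test: pooled t-test for 2 arms
  testNonNormal = kruskal.test,
  smd           = TRUE,
  addOverall    = TRUE
)

# All p-values are computed above; print() decides which ones to show.
# exact = "education" reports the fisher.test p-value for that variable,
# nonnormal = "ph" reports median [IQR] and the kruskal.test p-value.
print(tab, nonnormal = "ph", exact = "education", smd = TRUE)

# Per-variable details, including the amount of missing data
summary(tab)
```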

References

Altman, Douglas G. 1985. “Comparability of Randomised Groups.” The Statistician 34 (1): 125. https://doi.org/10.2307/2987510.
Matthews, John N. S. 2006. Introduction to Randomized Controlled Clinical Trials. Chapman and Hall/CRC. https://doi.org/10.1201/9781420011302.
Schulz, K. F., D. G. Altman, and D. Moher. 2010. “CONSORT 2010 Statement: Updated Guidelines for Reporting Parallel Group Randomised Trials.” BMJ 340: c332. https://doi.org/10.1136/bmj.c332.
Yoshida, Kazuki, and Alexander Bartel. 2022. “Tableone: Create 'Table 1' to Describe Baseline Characteristics with or Without Propensity Score Weights.” https://CRAN.R-project.org/package=tableone.