Data censoring occurs when researchers have only partial information about the value of a variable. For example, one study investigated depression among participants taking psilocybin (magic mushrooms). If participants took extra psilocybin outside the study context, the dosage is known only to be at least a certain value; the true dosage may have been higher (an instance of right censoring). Left censoring occurs when values in the lower tail of a distribution are obscured, so a value is known only to fall at or below some limit; right censoring occurs when values in the upper tail are obscured. Given the censored data, the R package lava can estimate the correlation that would have been observed had the variables not been censored. We conducted a Monte Carlo study to evaluate the extent to which lava estimates are biased for data sets of 500 cases with various correlations (-.95, -.50, -.05, .25, .50, and .95) and various degrees of left censoring (10% on both variables, 50% on both, 20% on one and 80% on the other, and 95% on both). With low to moderate censoring, lava estimates were unbiased. However, with 95% censoring on both variables, lava estimates were biased. When the correlation was -.05 or -.50, bias was large and negative (-.24 and -.35, respectively). For the other correlations, bias was typically smaller (e.g., -.02 to .06). If researchers are interested in negative correlations between variables that may be left censored, we recommend that they minimize censoring to avoid biased estimates.
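The simulation design described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' code and not lava itself (lava is an R package, and this sketch does not call it). It draws 500 cases from a bivariate normal distribution with a chosen correlation, left-censors each variable at an empirical quantile, and computes Monte Carlo bias as the mean estimate minus the true correlation. Two assumptions are worth flagging: the abstract does not say how censoring was induced, so the sketch assumes censored values are recorded as the cutoff (like a lower detection limit); and the estimator shown is the naive Pearson correlation on the censored values, which is the quantity a censoring-aware method such as lava is meant to improve on. The helper names (`simulate_bivariate_normal`, `left_censor`, `pearson`, `naive_bias`) are hypothetical.

```python
import math
import random


def simulate_bivariate_normal(n, rho, rng):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(z1)
        # y = rho*z1 + sqrt(1 - rho^2)*z2 has unit variance and corr(x, y) = rho
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys


def left_censor(values, proportion):
    """Left-censor about `proportion` of the values at an empirical quantile.

    Assumption (not stated in the abstract): censored values are recorded
    as the cutoff, like a lower detection limit. Requires 0 <= proportion < 1.
    Returns (censored_values, cutoff, flags), where flags marks censored cases.
    """
    if not 0.0 <= proportion < 1.0:
        raise ValueError("proportion must be in [0, 1)")
    ordered = sorted(values)
    k = min(round(proportion * len(values)), len(values) - 1)
    cutoff = ordered[k]
    flags = [v < cutoff for v in values]
    censored = [cutoff if f else v for v, f in zip(values, flags)]
    return censored, cutoff, flags


def pearson(xs, ys):
    """Ordinary Pearson correlation (ignores censoring entirely)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def naive_bias(rho, prop_x, prop_y, n=500, reps=100, seed=0):
    """Monte Carlo bias: mean naive Pearson r on censored data minus true rho."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        xs, ys = simulate_bivariate_normal(n, rho, rng)
        cx, _, _ = left_censor(xs, prop_x)
        cy, _, _ = left_censor(ys, prop_y)
        estimates.append(pearson(cx, cy))
    return sum(estimates) / reps - rho


if __name__ == "__main__":
    # Censoring conditions taken from the abstract; 100 replications is an
    # arbitrary choice for a quick run.
    for px, py in [(0.10, 0.10), (0.50, 0.50), (0.20, 0.80), (0.95, 0.95)]:
        b = naive_bias(-0.50, px, py)
        print(f"rho=-0.50 censoring=({px:.0%}, {py:.0%}) naive bias={b:+.3f}")
```

Replacing `pearson` with a censoring-aware estimator is the step where lava would come in; the sketch leaves it out rather than assume an interface for lava's R functions.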
R package lava; Data censoring; Psilocybin
Medicine and Health Sciences
Cordova-Medina, Monica; Tith, LaShawn; and Ayele, Fitsum A., "Lava is All You Need: R Package Reduces Bias for Correlations among Censored Variable" (2021). Undergraduate Research Symposium Podium Presentations. 6.