Statistical Analysis

Project for Creating an Automated Statistical Testing Platform

Introduction

The goal of this platform is to automate the statistical procedures used to study differences between multiple groups. Before describing the tests performed, we review the basic concepts needed to interpret them.

Prerequisites

  • Null Hypothesis (H0): The default hypothesis being tested. It is generally formulated as an equality, that is, no difference between the groups.
  • Alternative Hypothesis (H1 or Ha): The hypothesis for which you seek evidence, typically formulated as a difference or inequality between the groups.
  • Significance Level (α): The threshold against which the p-value is compared; the null hypothesis is rejected when the p-value falls below it. It is often set at 0.05, meaning you accept a 5% risk of making a Type I error (rejecting a null hypothesis that is actually true).
  • P-value: The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. A low p-value (below α) leads you to reject the null hypothesis.
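The definitions above can be made concrete with a small sketch. It computes an exact permutation p-value for a difference in means between two groups and compares it with α. The permutation approach is used here only because it follows the definition of a p-value directly; it is not one of the tests the platform selects, and the data and the function name `permutation_p_value` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Under H0 the group labels are interchangeable, so every split of the
    # pooled data into groups of the original sizes is equally likely.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count splits at least as extreme as the observed one
        # (small tolerance guards against floating-point rounding).
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


ALPHA = 0.05  # significance level
# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [6.9, 7.2, 6.5, 7.8, 7.1]

p_value = permutation_p_value(group_a, group_b)
reject_h0 = p_value < ALPHA
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Here only 2 of the 252 possible splits are as extreme as the observed one, so the p-value falls below α and H0 is rejected.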

The Tests

Initially, the selected tests are:

  • ANOVA
  • Mixed Models
  • Kaplan-Meier Survival Curves
  • Cox Regression
  • Log Rank Test

ANOVA

ANOVA (analysis of variance) is a parametric test used to determine whether the mean of a given measurement differs significantly between two or more groups. The groups are defined by categorical variables. The null hypothesis (H0) for this test is that all group means are equal (no significant difference), while the alternative hypothesis (H1) is that at least one group mean differs from the others. If the p-value is below our significance level, we reject the null hypothesis. We can then estimate the adjusted mean of each group (Emmeans) and run Tukey-adjusted pairwise comparisons to determine which groups differ.

Example

In a study measuring PIE, the dependent variable would be PIE, and the explanatory variables could be the product and time (0H, 6H, etc.).

Emmeans

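Emmeans (estimated marginal means) are the group means adjusted for the other terms in the model. After the null hypothesis has been rejected in an ANOVA, pairwise contrasts between these means, with a Tukey adjustment for multiple comparisons, help determine which groups differ from each other.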
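The pairwise comparison step can be sketched with `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). This compares raw group means rather than model-adjusted means, which coincide only in a simple one-factor design such as this one. The three samples are hypothetical.

```python
from scipy.stats import tukey_hsd

# Hypothetical measurements for three products (one factor, balanced).
product_a = [10.1, 9.8, 10.4, 10.0, 9.7]
product_b = [10.3, 10.0, 10.6, 9.9, 10.2]
product_c = [12.9, 13.2, 12.7, 13.1, 13.0]

# All pairwise comparisons with Tukey's adjustment for multiple testing.
result = tukey_hsd(product_a, product_b, product_c)
print(result)

# result.pvalue[i, j] is the adjusted p-value for group i versus group j.
p_a_vs_b = result.pvalue[0, 1]
p_a_vs_c = result.pvalue[0, 2]
```

With these numbers, product C stands apart from A while A and B are not distinguishable at the 5% level.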

Mixed Models

Mixed models extend the ANOVA approach: they let us detect whether there are differences between multiple groups within a population and, combined with adjusted means and Tukey-adjusted contrasts, measure those differences if they exist. Unlike ANOVA, they rely on two types of effects:

  • Fixed Effects (the focus of the study)
  • Random Effects (not the focus of the study but can influence the results)

We assume that the fixed effects act in the same way on all individuals, while the variation between individuals is captured by the random effects.

Example

In a study measuring PIE, the dependent variable would be PIE. The fixed effects could be the product and time (0H, 6H, etc.), and the panelist would be a random effect. In a test studying the cellular response to different products, the dependent variable could be collagen production, the product concentration could be a fixed effect, and the cell line could be a random effect.
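The first example can be sketched as a random-intercept model using `statsmodels`. This is an illustration under stated assumptions, not the platform's actual implementation: the data are simulated, and the column names and effect sizes are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated (hypothetical) data: 12 panelists each test 2 products
# at 3 time points, with a panelist-specific shift in PIE.
rng = np.random.default_rng(0)
rows = []
for panelist in range(12):
    panelist_shift = rng.normal(0.0, 1.0)  # random effect
    for product, p_shift in {"A": 0.0, "B": 2.0}.items():
        for time, t_shift in {"0H": 0.0, "6H": 0.5, "24H": 1.0}.items():
            rows.append({
                "panelist": panelist,
                "product": product,
                "time": time,
                "pie": 10.0 + p_shift + t_shift + panelist_shift
                       + rng.normal(0.0, 0.5),
            })
data = pd.DataFrame(rows)

# Fixed effects: product and time. Random effect: intercept per panelist.
model = smf.mixedlm("pie ~ C(product) + C(time)", data=data,
                    groups=data["panelist"])
fit = model.fit()
print(fit.summary())

# Estimated fixed effect of product B relative to product A.
product_b_effect = fit.params["C(product)[T.B]"]
```

The estimate of `product_b_effect` should land near the simulated shift of 2.0, while the panelist-to-panelist variation is absorbed by the random intercept.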

Kaplan-Meier Survival Curves

The Kaplan-Meier method is used to estimate survival probabilities over time. It allows for visualizing survival data and helps determine if different groups (e.g., treatment vs. control) exhibit different survival patterns over time.

This method is widely applied in clinical studies where the outcome of interest is time to event (such as time to death, disease recurrence, or other outcomes). The Kaplan-Meier curve provides an intuitive representation of survival data and can be compared across different groups.

Example

In a study measuring the longevity of lipstick wear, the event could be defined as the point when the lipstick wears off. Different curves can represent different lipstick products, allowing for comparison.
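To show how such a curve is built, here is a minimal pure-Python sketch of the Kaplan-Meier (product-limit) estimator. At each time where at least one event occurs, the survival probability is multiplied by the fraction of items still at risk that did not experience the event. Observations with an event flag of 0 are censored, for example when the study ended before the lipstick wore off. The function name and the data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: observed time for each subject
    events:    1 if the event was observed at that time, 0 if censored
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events_at_t = sum(1 for d, e in zip(durations, events)
                          if d == t and e)
        leaving_at_t = sum(1 for d in durations if d == t)
        if events_at_t:
            # Product-limit step: S(t) = S(previous) * (1 - d_t / n_t)
            survival *= 1.0 - events_at_t / at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t leave the risk set.
        at_risk -= leaving_at_t
    return curve


# Hypothetical wear times in hours for one lipstick product.
durations = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t = {t}h  S(t) = {s:.3f}")
```

Running the function once per product gives one curve per product, which can then be plotted on the same axes for comparison.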

Cox Proportional Hazards Model

The Cox regression model, also known as the proportional hazards model, is used to assess the effect of multiple variables on survival time. Unlike Kaplan-Meier, which describes survival within each group without adjusting for other variables, Cox regression allows for the inclusion of covariates and the examination of their influence on survival while adjusting for other factors.

This model assumes proportional hazards, meaning that the effect of each covariate on the hazard (the instantaneous risk of the event occurring) is constant over time, so the hazard ratio between two groups does not change as time passes.

Example

In a study on the long-lasting effect of lipstick, factors such as product type, application technique, and environmental conditions could be included in the model to assess their effect on how long the lipstick stays on.
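The idea can be illustrated with a deliberately reduced sketch: a Cox model with a single binary covariate (product B versus product A), fitted by Newton-Raphson on the partial likelihood with the Breslow convention for ties. A real analysis with several covariates would normally rely on a dedicated survival library; the function name and the data below are hypothetical.

```python
import math


def fit_cox_single_covariate(durations, events, x, n_iter=25):
    """Estimate beta in h(t | x) = h0(t) * exp(beta * x) for one covariate."""
    beta = 0.0
    for _ in range(n_iter):
        gradient = 0.0
        hessian = 0.0
        for t_i, e_i, x_i in zip(durations, events, x):
            if not e_i:
                continue  # censored subjects contribute only via risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [(x_j, math.exp(beta * x_j))
                    for t_j, x_j in zip(durations, x) if t_j >= t_i]
            s0 = sum(w for _, w in risk)
            s1 = sum(x_j * w for x_j, w in risk)
            s2 = sum(x_j * x_j * w for x_j, w in risk)
            gradient += x_i - s1 / s0
            hessian -= s2 / s0 - (s1 / s0) ** 2
        if hessian == 0.0:
            break
        step = gradient / hessian
        beta -= step  # Newton-Raphson update
        if abs(step) < 1e-10:
            break
    return beta


# Hypothetical wear times in hours; x = 0 for product A, 1 for product B.
durations = [4, 6, 8, 9, 10, 2, 3, 5, 7, 6.5]
events = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

beta = fit_cox_single_covariate(durations, events, x)
hazard_ratio = math.exp(beta)
print(f"beta = {beta:.3f}, hazard ratio (B vs A) = {hazard_ratio:.2f}")
```

A hazard ratio above 1 means that, at any given moment, product B is more likely to wear off than product A.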

Log Rank Test

The Log Rank test is used to compare the survival distributions of two or more groups. It is a non-parametric test and can be used alongside Kaplan-Meier curves to determine whether there is a statistically significant difference between the survival curves of different groups.

The null hypothesis for the Log Rank test is that the survival curves are equal across the groups being compared.

Example

In a comparison of different lipstick brands, the Log Rank test could be used to determine if there is a significant difference in the longevity of wear across the different brands.
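For two groups, the test statistic can be sketched in pure Python. At each event time, the number of events observed in one group is compared with the number expected if both groups shared the same survival curve; the standardized total follows a chi-square distribution with 1 degree of freedom under H0. The function name and the data are hypothetical, and the sketch assumes at least one event time with subjects at risk in both groups.

```python
import math


def log_rank_test(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    all_durations = durations_a + durations_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_durations, all_events) if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in durations_a if d >= t)  # at risk in A
        n_b = sum(1 for d in durations_b if d >= t)  # at risk in B
        d_a = sum(1 for d, e in zip(durations_a, events_a) if d == t and e)
        d_b = sum(1 for d, e in zip(durations_b, events_b) if d == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical wear times in hours, all events observed (no censoring).
brand_a = [8, 9, 10, 11, 12]
brand_b = [1, 2, 3, 4, 5]
all_observed = [1, 1, 1, 1, 1]

chi2, p_value = log_rank_test(brand_a, all_observed, brand_b, all_observed)
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.4f}")
```

A p-value below the significance level leads to rejecting the hypothesis that the two brands have the same wear-time distribution.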