Statistical Hypothesis Testing

Short Summary: A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.

Z-test

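The Z-test is a statistical test used to determine whether a population mean differs from a hypothesized value, or whether two population means differ, when the population variance is known and the sample size is large (a common rule of thumb is \(n > 30\)). The large-sample requirement comes from the Z-test's assumption that the test statistic is approximately normally distributed. By the central limit theorem, the sampling distribution of the sample mean approaches a normal distribution as the sample size grows, even when the individual observations are not normal. For a single sample with mean \(\bar{x}\), size \(n\), known population standard deviation \(\sigma\), and hypothesized mean \(\mu_0\), the statistic is:

\begin{equation} \text{Z} = \frac{\bar{x} - \mu_{0}}{\sqrt{\frac{\sigma^2}{n}}} \end{equation}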
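As a rough illustration, the one-sample Z statistic and its two-sided p-value can be computed with only the Python standard library (3.8+ for `statistics.fmean` and `statistics.NormalDist`). The function name `z_test` and the sample data are hypothetical; this is a minimal sketch assuming \(\sigma\) is truly known.

```python
import math
from statistics import NormalDist, fmean


def z_test(sample, mu0, sigma):
    """One-sample Z-test, assuming the population standard deviation sigma is known.

    Returns the Z statistic and its two-sided p-value.
    """
    n = len(sample)
    x_bar = fmean(sample)
    # Standard error of the mean, using the known sigma (not a sample estimate)
    se = math.sqrt(sigma**2 / n)
    z = (x_bar - mu0) / se
    # Two-sided p-value: probability under H0 of |Z| at least this large
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical data: 36 observations with sample mean 10, assumed known sigma = 2
sample = [9, 11] * 18
z, p = z_test(sample, mu0=9.5, sigma=2)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 1.50, p = 0.134
```

With a p-value of about 0.134, this hypothetical sample would not lead to rejecting \(H_0: \mu = 9.5\) at the usual 0.05 level.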

One-Sample t-test

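The one-sample t-test is a statistical hypothesis test used to determine whether an unknown population mean is different from a specific value. Unlike the Z-test, it does not require the population variance to be known: the sample standard deviation \(s\) takes the place of \(\sigma\). To test the null hypothesis that the population mean equals a specified value, \(H_0: \mu = \mu_0\), one uses the following statistic, where \(\bar{x}\) is the sample mean and \(n\) the sample size. Under \(H_0\), and assuming approximately normal data, it follows a t-distribution with \(n - 1\) degrees of freedom: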

\begin{equation} t = \frac{\bar{x} - \mu_{0}}{\sqrt{\frac{s^2}{n}}} \end{equation}
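The statistic translates directly into code. This is a minimal standard-library sketch; the function name `one_sample_t` and the data are hypothetical. It returns only the statistic and degrees of freedom, because the standard library has no t-distribution CDF; for a p-value, SciPy's `scipy.stats.ttest_1samp(sample, popmean)` returns both the statistic and the p-value.

```python
import math
from statistics import fmean, stdev


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    x_bar = fmean(sample)
    # stdev() uses the n - 1 denominator, i.e. the sample standard deviation s
    s = stdev(sample)
    t = (x_bar - mu0) / math.sqrt(s**2 / n)
    return t, n - 1


# Hypothetical data: test H0: mu = 2
t, df = one_sample_t([1, 2, 3, 4, 5], mu0=2)
print(f"t = {t:.4f}, df = {df}")  # t = 1.4142, df = 4
```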

Two-Sample t-test

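Given two groups, the following form of the two-sample t-test applies when the two sample sizes (that is, the number \(n\) of participants in each group) are equal and the two distributions can be assumed to have the same variance:

\begin{equation} t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_{X_1}^2}{n} + \frac{s_{X_2}^2}{n}}} \end{equation}

Here \(s_{X_1}^2\) and \(s_{X_2}^2\) are the two sample variances, and under these assumptions the statistic follows a t-distribution with \(2n - 2\) degrees of freedom. Unequal sample sizes call for the pooled-variance form of the test, and unequal variances for Welch's t-test.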
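A minimal sketch of the equal-size, equal-variance form, using only the standard library. The function name `two_sample_t_equal_n` and the two groups are hypothetical; the function refuses unequal sample sizes rather than silently applying the wrong formula.

```python
import math
from statistics import fmean, variance


def two_sample_t_equal_n(x1, x2):
    """Two-sample t statistic for equal sample sizes and assumed equal variances.

    Returns the t statistic and its degrees of freedom (2n - 2).
    """
    if len(x1) != len(x2):
        raise ValueError("this form of the test assumes equal sample sizes")
    n = len(x1)
    # variance() is the sample variance s^2 (n - 1 denominator)
    se = math.sqrt(variance(x1) / n + variance(x2) / n)
    t = (fmean(x1) - fmean(x2)) / se
    return t, 2 * n - 2


# Hypothetical groups with means 3 and 5 and identical sample variances (2.5)
t, df = two_sample_t_equal_n([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t:.2f}, df = {df}")  # t = -2.00, df = 8
```

For equal sample sizes, SciPy's `scipy.stats.ttest_ind(x1, x2)` (default `equal_var=True`) should produce the same statistic, since the pooled-variance formula reduces to this one when the sizes match. Passing `equal_var=False` switches it to Welch's t-test.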

Paired t-test

Paired t-tests are a form of blocking, and have greater power (the probability of avoiding a type II error, also known as a false negative) than unpaired tests when the paired units are similar with respect to "noise factors" that are independent of membership in the two groups being compared. In a different context, paired t-tests can be used to reduce the effects of confounding factors in an observational study. A paired t-test is essentially a one-sample t-test on differences: first compute the difference within each pair, \(X_D = x_1 - x_2\), and then apply the one-sample t-test to those differences. In the statistic below, \(\bar{X}_D\) and \(s_D\) are the mean and standard deviation of the differences, \(n\) is the number of pairs, and \(\mu_0\) is the hypothesized mean difference (usually 0):

\begin{equation} t = \frac{\bar{X}_D - \mu_0}{\sqrt{\frac{s_D^2}{n}}} \end{equation}
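The reduction to a one-sample test is easy to see in code: form the per-pair differences, then compute the one-sample statistic on them. The function name `paired_t` and the before/after data are hypothetical; this is a standard-library sketch, not a full implementation with p-values.

```python
import math
from statistics import fmean, stdev


def paired_t(x1, x2, mu0=0.0):
    """Paired t-test: a one-sample t-test on the within-pair differences x1 - x2.

    Returns the t statistic and its degrees of freedom (number of pairs - 1).
    """
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x1, x2)]  # X_D for each pair
    n = len(diffs)
    d_bar = fmean(diffs)  # mean difference
    s_d = stdev(diffs)    # standard deviation of the differences
    t = (d_bar - mu0) / math.sqrt(s_d**2 / n)
    return t, n - 1


# Hypothetical before/after measurements on the same five subjects
before = [5, 7, 6, 9, 8]
after = [4, 5, 6, 6, 4]
t, df = paired_t(before, after)
print(f"t = {t:.4f}, df = {df}")  # t = 2.8284, df = 4
```

SciPy's `scipy.stats.ttest_rel(x1, x2)` computes the same statistic and also returns a p-value.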

\(\chi^2\) Test

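Pearson's chi-squared test is used to determine whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table. The statistic sums over all categories (or cells) \(i\), where \(O_i\) is the observed count and \(E_i\) the expected count under the null hypothesis: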

\begin{equation} \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \end{equation}
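The sum can be computed directly. The sketch below, with hypothetical function names and hypothetical counts, covers two common cases: a goodness-of-fit test where the expected counts are given, and a test of independence on an \(r \times c\) contingency table, where each expected count is the row total times the column total divided by the grand total and the statistic has \((r-1)(c-1)\) degrees of freedom.

```python
def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_contingency(table):
    """Chi-squared test of independence for an r x c table of counts.

    Expected count for cell (i, j) = row_total_i * col_total_j / grand_total.
    Returns the statistic and its degrees of freedom (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    observed, expected = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            observed.append(count)
            expected.append(row_totals[i] * col_totals[j] / grand_total)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi_squared(observed, expected), dof


# Hypothetical goodness-of-fit: a die rolled 60 times, 10 of each face expected
print(f"{chi_squared([8, 12, 9, 11, 6, 14], [10] * 6):.1f}")  # 4.2

# Hypothetical 2 x 2 contingency table
stat, dof = chi_squared_contingency([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, dof = {dof}")  # chi2 = 6.667, dof = 1
```

If you compare against SciPy, note that `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when there is one degree of freedom, so for a 2 x 2 table its statistic will be smaller than this uncorrected value unless `correction=False` is passed.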