Mathnotes

Hypothesis Testing

Given a baseline population mean $\mu_0,$ a sample mean $\bar{x},$ a sample count $n,$ and either a sample standard deviation $s$ or a population standard deviation $\sigma,$ we might want to know if the sample data provides sufficient evidence to conclude that the true population mean $\mu$ differs significantly from $\mu_0.$

We could say that our null hypothesis is that the population mean is what we expect:

$$ H_0: \mu = \mu_0, $$

and that the alternative hypothesis is that it is different from what we expect:

$$ H_1: \mu \neq \mu_0. $$

We'd then compute a test statistic. Here, we will assume we know the population's standard deviation $\sigma$

$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}. $$

Then, since we want to know the p-value, that is, the probability of getting a test statistic ($z$) this extreme or more extreme if the actual mean is $\mu_0,$, we have a two-tailed test because the sample mean could either be lower than or higher than our actual mean. So, we do

$$ \text{p-value} = P(|Z| \geq |z|) = 2 \times P(Z \geq |z|) = 2 \left(1 - \int_{-\infty}^{|z|} \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}} \, dt \right) $$

or in Mathematica notation:

$$ 2 * (1 - CDF[NormalDistribution[0, 1], Abs[z]]). $$

Low $p-values$ (typically, p < 0.05 or so) lead us to reject the null-hypothesis and adopt the alternative hypothesis. A very low $p$-value means that the probability of getting the sample we got if $\mu = \mu_0$ is very low.

Instead of hypothesising that that $\mu \neq mu_0,$ we could hypothesize that $\mu > \mu_0.$, or that $\mu < \mu_0.$ These would be single-tailed tests and we'd use, for example:

$$ H_0: \mu = \mu_0, $$

$$ H_1: \mu > \mu_0, $$

$$ 1 - CDF[NormalDistribution[0, 1], z] $$

instead.

Additionally, if we don't know $\sigma$ but only know $s,$ the standard deviation of the sample, we would use a Student T Distribution. We'd calculate a $t$ value instead of a $z$ value:

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}. $$

$$ 1 - CDF[StudentTDistribution[n-1], t]. $$

Note that if we wanted to test the alternative hypothesis $H_1: \mu < \mu_0$ we'd use

$$ CDF[StudentTDistribution[n-1], t], $$

and similarly for when $\sigma$ is known, except we'd use the standard normal distribution and a $z$ value.

Interactive Demo