Statistics & Hypothesis Testing SL+HL

IB Mathematics: Applications & Interpretation · Topic 4: Statistics & Probability
www.jmaths.xyz
1
Chi-Squared Test for Independence
Tests whether two categorical variables are independent or associated, using observed vs expected frequencies in a contingency table.
Test statistic (given in formula booklet)
\( \chi ^2_{calc} = \sum (f_o - f_e)^2 / f_e \)
Expected frequency: \( f_{e} \) = (row total \( \times \) column total) / grand total
Degrees of freedom
\( df = (rows - 1)(columns - 1) \)
Steps: (1) State H0: variables are independent. H1: variables are not independent. (2) Calculate \( \chi ^2_{calc} \) (GDC). (3) Find the \( p \)-value. (4) If \( p \) < significance level \( \to \) reject H0.
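[Figure: \( \chi ^2 \) distribution showing the critical value \( \chi ^2_{crit} \), the "reject H0" region and the "fail to reject" region.]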
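The expected-frequency and test-statistic formulas above can be checked by hand with a short script. This is only a sketch: the 2 × 3 table of observed counts is made up for illustration, and the function names are hypothetical. In the exam the GDC does this work.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Return (chi-squared statistic, degrees of freedom, expected table)."""
    expected = expected_frequencies(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, expected


# Hypothetical observed counts (not from any real survey).
observed = [[20, 30, 10],
            [30, 20, 10]]
stat, df, expected = chi_squared_statistic(observed)

# The IB expects this check: every expected frequency should be at least 5.
all_at_least_5 = all(e >= 5 for row in expected for e in row)
print(stat, df, all_at_least_5)
```

The expected table is the same one the GDC stores in matrix [B] on the TI-84.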
GDC: Chi-Squared Test
TI-84 Plus CE
Enter observed data into a matrix: [2ND][MATRIX] → Edit → [A]
[STAT] → TESTS → \( \chi ^2 \)-Test
Observed: [A]    Expected: [B]    press Calculate
Read \( \chi ^2 \) and the \( p \)-value. Check [B] for expected frequencies.
TI-Nspire CX II
Store the observed counts as a matrix (Calculator page: enter the table as a matrix → store to a variable, e.g. obs)
[Menu] → Statistics → Stat Tests → \( \chi ^2 \) 2-way Test
Observed Matrix: obs → read \( \chi ^2 \) and the \( p \)-value
Casio fx-CG50
[MENU] → Statistics → TEST → CHI → 2WAY
Enter observed matrix → Execute
Read \( \chi ^2 \) and \( p \). Press [F6] to see expected frequencies.
Common error: All expected frequencies must be \( \geq 5 \). If any \( f_{e} < 5 \), combine categories or note the limitation. The IB expects you to check this.
Worked example: \( \chi ^2 \) test for independence
A survey records preferred sport by gender (2 × 3 table). \( \chi ^2 \)-Test on the GDC gives \( \chi ^2_{calc} = 7.84 \), \( df = (2-1)(3-1) = 2 \), \( p = 0.020 \). Test at the 5% level.
H0: sport preference is independent of gender. H1: they are not independent.
Compare: \( p = 0.020 < 0.05 \).
Reject H0: there is evidence at the 5% level that sport preference depends on gender.
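The \( p \)-value in this example can be sanity-checked without a GDC. For \( df = 2 \) only, the upper-tail probability of the \( \chi ^2 \) distribution has the closed form \( e^{-x/2} \); other values of \( df \) need the GDC or statistical software. A minimal sketch using the numbers quoted above:

```python
import math

chi2_calc = 7.84  # value quoted in the worked example
alpha = 0.05      # 5% significance level

# Valid for df = 2 only: P(chi-squared >= x) = exp(-x / 2).
p_value = math.exp(-chi2_calc / 2)

reject_h0 = p_value < alpha
print(round(p_value, 3), reject_h0)
```

The result agrees with the \( p = 0.020 \) reported by the GDC.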
2
Chi-Squared Goodness of Fit SL
Tests whether observed data fits an expected distribution (uniform, given proportions, Poisson, etc.).
Test statistic (same as independence)
\( \chi ^2_{calc} = \sum (f_o - f_e)^2 / f_e \)
Degrees of freedom
\( df = k - 1 - p \)
\( k = \) number of categories, \( p = \) number of estimated parameters (0 if given)
H0: The data follows the proposed distribution.   H1: The data does not follow the proposed distribution.
Degrees of freedom: For a fair die with 6 faces, \( df = 6 - 1 = 5 \). If you estimated the mean from the data for a Poisson fit, \( df = k - 1 - 1 = k - 2 \).
HL note: The GOF test itself is SL. Only the extra rigour of grouping numerical data into classes and choosing \( df \) when parameters are estimated from the data (4.12) is HL.
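As a rough illustration of the goodness-of-fit calculation, the sketch below tests a fair-die model. The observed counts for 60 rolls are invented for the example, and `goodness_of_fit` is a hypothetical helper name, not a GDC command.

```python
def goodness_of_fit(observed, proportions, estimated_parameters=0):
    """Return (chi-squared statistic, degrees of freedom, expected counts)."""
    total = sum(observed)
    expected = [total * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = k - 1 - (number of parameters estimated from the data)
    df = len(observed) - 1 - estimated_parameters
    return stat, df, expected


# Hypothetical counts for faces 1 to 6 over 60 rolls.
observed = [8, 12, 9, 11, 6, 14]
stat, df, expected = goodness_of_fit(observed, [1 / 6] * 6)
print(stat, df, expected)
```

For a Poisson fit with the mean estimated from the data, pass `estimated_parameters=1` so that \( df = k - 2 \).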

3
\( t \)-Test for Two Means SL
Tests whether two population means differ, using two samples drawn from normal populations. The GDC returns \( t \) and the \( p \)-value directly.
Hypotheses
\( H_0: \mu_1 = \mu_2 \)
\( H_1: \mu_1 \neq \mu_2 \) (two-tailed), or \( \mu_1 > \mu_2 \) / \( \mu_1 < \mu_2 \) (one-tailed)
Decision: compare the \( p \)-value to the significance level \( \alpha \). If \( p < \alpha \to \) reject H0 (the means differ).
GDC: 2-Sample \( t \)-Test
TI-84 Plus CE
[STAT] → TESTS → 2-SampTTest
Choose Data or Stats; set \( \mu_1 \): \( \neq \mu_2 \) / \( < \mu_2 \) / \( > \mu_2 \)
Pooled: No (IB default unless told variances are equal) → Calculate → read \( t \) and \( p \)
TI-Nspire CX II
[Menu] → Statistics → Stat Tests → 2-Sample \( t \) Test
Choose Data or Stats; set the alternative tail; Pooled: No
Read \( t \) and the \( p \)-value
Casio fx-CG50
[MENU] → Statistics → TEST → t → 2-Sample
Set the alternative (\( \neq \), \( < \), \( > \)); Pooled: Off
Execute → read \( t \) and \( p \)
Worked example: \( t \)-test
Two classes sit the same test. 2-SampTTest (Pooled: No, two-tailed) gives \( t = 2.31 \), \( p = 0.028 \). Test at the 5% level whether the mean scores differ.
H0: \( \mu_1 = \mu_2 \).   H1: \( \mu_1 \neq \mu_2 \).
Compare: \( p = 0.028 < 0.05 \).
Reject H0: evidence at the 5% level that the two class means differ.
Common error: State the correct tail from \( H_1 \). Use Pooled: No unless the question explicitly says the population variances are equal — the IB default is non-pooled.
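To see what "Pooled: No" computes, here is a sketch of the non-pooled (Welch) \( t \) statistic and its approximate degrees of freedom. The two sets of scores are hypothetical, and the \( p \)-value is left to the GDC because the standard library has no \( t \)-distribution.

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Non-pooled two-sample t statistic and Welch-Satterthwaite df."""
    n1, n2 = len(sample1), len(sample2)
    # Squared standard errors, using the unbiased sample variance.
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical test scores for two classes.
class_a = [72, 85, 78, 90, 66, 81]
class_b = [65, 70, 74, 68, 79, 62]
t, df = welch_t(class_a, class_b)
print(t, df)
```

The non-integer \( df \) shown on the GDC when pooling is off comes from this approximation.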
4
One- & Two-Tailed Tests SL
The alternative hypothesis \( H_1 \) decides the tail. This controls how the \( p \)-value is read off the GDC.
Two-tailed
\( H_1: \mu \neq \mu_0 \)
Tests for any difference (either direction).
One-tailed
\( H_1: \mu > \mu_0 \) or \( \mu < \mu_0 \)
Tests for a difference in one specified direction only.
Set the tail on the GDC. Choose \( \neq \), \( < \) or \( > \) to match \( H_1 \) so the calculator returns the correct \( p \)-value. The \( \chi ^2 \) test has no tail to select: read \( p \) directly and compare it to \( \alpha \).

5
Spearman’s Rank Correlation
Spearman’s \( r_s \) measures the strength of a monotonic (not necessarily linear) relationship between two ranked variables. \( -1 \leq r_s \leq 1 \).
Spearman’s rank coefficient (given in formula booklet)
\( r_s = 1 - (6 \sum d^2) / (n(n^2 - 1)) \)
\( d = \) difference between ranks, \( n = \) number of data pairs
Tied ranks: If two values are tied, assign the average of the ranks they would occupy. e.g. tied for 3rd and 4th \( \to \) both get rank 3.5.
Common error: Spearman’s \( r_s \) measures monotonic association, not just linear association, and it is calculated from ranks, not raw data. Do not confuse it with Pearson’s \( r \) (which measures linear correlation).
GDC shortcut: Rank each variable into two lists, then run PMCC / LinReg on the ranks: the value of \( r \) on the ranked data equals \( r_s \). Faster than the \( \sum d^2 \) formula for larger \( n \).
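The shortcut can be mirrored in code: rank each variable (averaging tied ranks), then take Pearson's \( r \) of the ranks. This is a sketch with hypothetical helper names and made-up data. It ranks from smallest to largest, which gives the same \( r_s \) as ranking from largest to smallest as long as both variables are ranked the same way.

```python
import math


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    ordered = sorted(values)
    # A value first appearing at 0-based position i with c copies
    # occupies ranks i+1 .. i+c, whose average is i + 1 + (c - 1) / 2.
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s as Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data with no ties.
r_s = spearman([1, 2, 3, 4, 5], [2, 1, 3, 6, 4])
print(r_s)
print(average_ranks([1, 2, 5, 5, 9]))  # the two 5s tie for 3rd and 4th
```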
Worked example: Spearman’s \( r_s \)
Five products are ranked by price and by quality. The rank differences are \( d = 1, -1, 0, 2, -2 \), so \( \sum d^2 = 1 + 1 + 0 + 4 + 4 = 10 \), with \( n = 5 \).
\( r_s = 1 - \dfrac{6 \times 10}{5(5^2 - 1)} = 1 - \dfrac{60}{120} \).
\( r_s = 0.5 \): a moderate positive monotonic association between price and quality.
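The arithmetic in this example follows directly from the formula-booklet expression. A minimal check, using the rank differences given above (the function name is only illustrative):

```python
def spearman_from_differences(d):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), for ranks with no ties."""
    n = len(d)
    return 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))


r_s = spearman_from_differences([1, -1, 0, 2, -2])
print(r_s)
```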
6
Poisson Distribution HL
Models the number of events in a fixed interval when events occur independently at a constant average rate \( \lambda \).
Probability
\( P(X = x) = e^{-\lambda } \lambda ^x / x! \)
Mean & Variance
\( E(X) = \lambda , \quad \text{Var}(X) = \lambda \)
Mean = Variance is a key Poisson property
Conditions: Events occur singly, independently, at a constant mean rate, and with no upper limit.
Sum of independent Poissons: if \( X \sim Po(\lambda_1) \) and \( Y \sim Po(\lambda_2) \) are independent, then \( X + Y \sim Po(\lambda_1 + \lambda_2) \).
GDC: Poisson Probabilities
TI-84 Plus CE
[2ND][VARS] (DISTR)
poissonpdf(\( \lambda , x \)) → \( P(X = x) \)
poissoncdf(\( \lambda , x \)) → \( P(X \leq x) \)
TI-Nspire CX II
[Menu] → Statistics → Distributions
Poisson Pdf(\( \lambda , x \)) → \( P(X = x) \)
Poisson Cdf(\( \lambda \), lower, upper) → \( P(\text{lower} \leq X \leq \text{upper}) \)
Casio fx-CG50
[MENU] → Statistics → DIST → POISN
Ppd for \( P(X = x) \),   Pcd for \( P(X \leq x) \)
\( P(X \geq 3) \): Use the complement: \( P(X \geq 3) = 1 - P(X \leq 2) = 1 - \) poissoncdf(\( \lambda , 2 \)).
Worked example: Poisson
Calls arrive at \( \lambda = 4 \) per hour. Find \( P(X \geq 2) \) in one hour.
\( P(X \geq 2) = 1 - P(X \leq 1) = 1 - \) poissoncdf\( (4, 1) \).
\( = 1 - 0.0916 = 0.908 \) (3 s.f.).
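The same calculation can be reproduced from the probability formula. The helper names below are hypothetical stand-ins that mimic the GDC's poissonpdf and poissoncdf; this is a sketch, not the calculator's implementation.

```python
import math


def poisson_pmf(lam, x):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_cdf(lam, x):
    """P(X <= x), summing the probabilities for 0, 1, ..., x."""
    return sum(poisson_pmf(lam, k) for k in range(x + 1))


# Calls arrive at 4 per hour: P(X >= 2) = 1 - P(X <= 1).
p_at_least_2 = 1 - poisson_cdf(4, 1)
print(round(p_at_least_2, 3))
```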
7
Confidence Interval for the Mean HL
A confidence interval for the mean of a normal population gives a range of plausible values for \( \mu \).
Interval for \( \mu \) (known \( \sigma \), or large \( n \))
\( \bar{x} - z \times \dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z \times \dfrac{\sigma}{\sqrt{n}} \)
90%: \( z = 1.645 \)    95%: \( z = 1.960 \)    99%: \( z = 2.576 \)
\( \sigma \) unknown? Use the \( t \)-interval instead (TI-84: [STAT] → TESTS → TInterval; input \( \bar{x} \), \( s_x \), \( n \) and C-Level). Use the \( z \)-interval only when \( \sigma \) is given.
GDC: Z-Interval
TI-84 Plus CE
[STAT] → TESTS → ZInterval
Input: Stats    \( \sigma = \) known    \( \bar{x} = \) sample mean    \( n = \) sample size
C-Level = 0.95 → Calculate
TI-Nspire CX II
[Menu] → Statistics → Confidence Intervals → z Interval
Enter \( \sigma , \bar{x}, n, \) C-Level → read interval
Casio fx-CG50
[MENU] → Statistics → INTR → Z → 1-Sample
Enter C-Level, \( \sigma , \bar{x}, n \) → Execute
Common error: "95% confident" does NOT mean there is a 95% probability that \( \mu \) is in the interval. It means that if we repeated the sampling many times, about 95% of the intervals constructed this way would contain \( \mu \).
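The \( z \)-interval formula can be evaluated directly. In this sketch the sample mean, \( \sigma \) and \( n \) are hypothetical values chosen for easy arithmetic, and `z_interval` is an illustrative name; the \( z \) value comes from the standard normal inverse CDF rather than the table of values above.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical inputs: sample mean 50, known sigma 4, sample size 64.
lower, upper = z_interval(x_bar=50.0, sigma=4.0, n=64)
print(round(lower, 3), round(upper, 3))
```

Raising the confidence level to 0.99 widens the interval, since \( z \) grows from about 1.960 to about 2.576.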

8
Critical-Value Tests HL
Define a critical region from the sampling distribution under H0; reject H0 if the test statistic falls in it. Equivalent to rejecting when \( p < \alpha \).
Mean of a normal population
\( H_0: \mu = \mu_0 \)
GDC Z-Test (\( \sigma \) known) or T-Test (\( \sigma \) unknown) → read \( p \), reject if \( p < \alpha \).
Proportion (binomial)
Under \( H_0 \), \( X \sim B(n, p_0) \)
One-tailed \( p \)-value \( = P(X \geq x) \) or \( P(X \leq x) \) via binomcdf; reject if \( < \alpha \).
Mean (Poisson)
Under \( H_0 \), \( X \sim Po(\lambda_0) \)
\( p \)-value from poissoncdf in the appropriate tail; reject if \( < \alpha \).
Correlation \( \rho = 0 \) (bivariate normal)
\( H_0: \rho = 0 \)
GDC LinRegTTest on the bivariate data → read \( p \); reject “no linear correlation” if \( p < \alpha \).
Critical region vs \( p \)-value: the two approaches agree. The critical value is the boundary of the rejection region; a statistic beyond it is exactly the case \( p < \alpha \).
Common error: Match the tail of the binomial/Poisson \( p \)-value to \( H_1 \). For an upper-tail test use \( P(X \geq x) = 1 - P(X \leq x-1) \), not \( 1 - P(X \leq x) \).
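The off-by-one in the upper tail is easy to demonstrate. The sketch below uses a made-up test (H0: \( p = 0.5 \), H1: \( p > 0.5 \), 15 successes in 20 trials); `binom_cdf` is a hypothetical stand-in for the GDC's binomcdf.

```python
from math import comb


def binom_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def upper_tail_p_value(n, p0, x):
    """P(X >= x) = 1 - P(X <= x - 1) under H0: p = p0."""
    return 1 - binom_cdf(n, p0, x - 1)


# Hypothetical data: 15 successes in 20 trials, testing at alpha = 0.05.
p_value = upper_tail_p_value(20, 0.5, 15)
wrong_value = 1 - binom_cdf(20, 0.5, 15)  # this is P(X >= 16), not P(X >= 15)
print(p_value, wrong_value, p_value < 0.05)
```

The incorrect version drops \( P(X = 15) \) and understates the \( p \)-value, which can change the decision when \( p \) is close to \( \alpha \).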

9
Type I & Type II Errors HL
Type I Error
Rejecting H0 when H0 is true
“False positive” — probability = significance level \( \alpha \)
Type II Error
Failing to reject H0 when H0 is false
“False negative” — probability = \( \beta \)
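[Figure: sampling distributions under H0 and under H1 with the critical value marked; the Type I error region has area \( \alpha \) and the Type II error region has area \( \beta \).]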
Power \( = 1 - \beta = \) probability of correctly rejecting a false H0. Higher power is better.
Memory aid: Type I = false positIve (I for “Innocent person convicted”). Type II = false negatIIve (“Guilty person goes free”).
Common error: You can NEVER “accept H0”. The correct phrasing is “fail to reject H0” or “insufficient evidence to reject H0”.
Common error: Forgetting the trade-off. For a fixed sample size, decreasing \( \alpha \) (e.g. from 5% to 1%) reduces the probability of a Type I error but increases the probability of a Type II error.
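The trade-off between \( \alpha \) and \( \beta \) can be made concrete for an upper-tail \( z \)-test. Everything numeric here is an assumption for illustration: H0 mean 100, true mean 103, \( \sigma = 10 \), \( n = 25 \). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """beta for an upper-tail z-test of H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / math.sqrt(n)
    # Reject H0 when the sample mean exceeds this critical value.
    critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Type II error: the sample mean falls below the critical value
    # even though the true mean is mu_true.
    return NormalDist(mu_true, se).cdf(critical_mean)


beta_5 = type_ii_error(mu0=100, mu_true=103, sigma=10, n=25, alpha=0.05)
beta_1 = type_ii_error(mu0=100, mu_true=103, sigma=10, n=25, alpha=0.01)
print(beta_5, 1 - beta_5)  # beta and power at the 5% level
print(beta_1, 1 - beta_1)  # beta and power at the 1% level
```

With these assumed values, moving from the 5% level to the 1% level raises \( \beta \) and so lowers the power.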
10
Exam Traps & Key Reminders
State hypotheses in context. Do not write generic H0/H1. e.g. “H0: Grade and gender are independent” not just “H0: the variables are independent”.
Expected frequencies \( \geq 5 \). State this check explicitly in \( \chi ^2 \) tests. If violated, combine adjacent categories.
Compare the \( p \)-value to \( \alpha \), not \( \chi ^2 \) to \( \alpha \). Write “\( p = 0.023 < 0.05 \) → reject H0”. The IB awards marks for this comparison step.
Conclusion in context: After rejecting or failing to reject, state the conclusion in the language of the problem. “There is sufficient evidence at the 5% level that grade and gender are not independent.”
Default significance level: If the question does not specify, use \( \alpha = 0.05 \) (5%). This is standard in the IB.
Formula booklet: The \( \chi ^2 \) statistic formula, Spearman’s \( r_{s} \) formula, Poisson probability formula, and confidence interval formula are all given. Know when to use each one.