Statistics & Hypothesis Testing SL+HL

IB Mathematics: Applications & Interpretation · Topic 4: Statistics & Probability

JMaths

www.jmaths.xyz

1

Chi-Squared Test for Independence

Tests whether two categorical variables are independent or associated, using observed vs expected frequencies in a contingency table.

Test statistic (given in formula booklet)

\( \chi ^2_{calc} = \sum (f_o - f_e)^2 / f_e \)

Expected frequency: \( f_{e} \) = (row total \( \times \) column total) / grand total

Degrees of freedom

\( df = (rows - 1)(columns - 1) \)

Steps: (1) State H₀: variables are independent. H₁: variables are not independent. (2) Calculate \( \chi ^2_{calc} \) (GDC). (3) Find the \( p \)-value. (4) If \( p \) < significance level \( \to \) reject H₀.
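
The calculation the GDC performs in step (2) can be sketched in a few lines. This is only an illustration: the 2 × 3 table of counts is hypothetical, and the variable names are chosen here, not taken from any syllabus or calculator.

```python
# Hypothetical 2 x 3 table of observed counts (rows: two groups, columns: three categories).
observed = [
    [20, 30, 10],
    [30, 20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# f_e = (row total x column total) / grand total, one value per cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# chi^2_calc = sum over every cell of (f_o - f_e)^2 / f_e
chi2_calc = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# df = (rows - 1)(columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# The check on expected frequencies that should be stated in an answer
all_expected_at_least_5 = all(fe >= 5 for row in expected for fe in row)

print(expected)             # [[20.0, 20.0, 20.0], [30.0, 30.0, 30.0]]
print(round(chi2_calc, 2))  # 16.67
print(df)                   # 2
```

The expected matrix printed here is what the calculator stores after running the test.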

GDC: Chi-Squared Test

TI-84 Plus CE

Enter observed data into a matrix: [2ND][MATRIX] → Edit → [A]
[STAT] → TESTS → \( \chi ^2 \)-Test
Observed: [A]; Expected: [B]; press Calculate
Read \( \chi ^2 \) and the \( p \)-value. Check [B] for expected frequencies.

TI-Nspire CX II

Store the observed counts as a matrix (Calculator page: enter the table as a matrix → store to a variable, e.g. obs)
[Menu] → Statistics → Stat Tests → \( \chi ^2 \) 2-way Test
Observed Matrix: obs → read \( \chi ^2 \) and the \( p \)-value

Casio fx-CG50

[MENU] → Statistics → TEST → CHI → 2WAY
Enter observed matrix → Execute
Read \( \chi ^2 \) and \( p \). Press [F6] to see expected frequencies.

✗

Common error: All expected frequencies must be \( \geq 5. \) If any \( f_{e} \) < 5, combine categories or note the limitation. The IB expects you to check this.

Worked example: \( \chi ^2 \) test for independence

A survey records preferred sport by gender (2 × 3 table). \( \chi ^2 \)-Test on the GDC gives \( \chi ^2_{calc} = 7.84 \), \( df = (2-1)(3-1) = 2 \), \( p = 0.020 \). Test at the 5% level.

H₀: sport preference is independent of gender. H₁: they are not independent.

Compare: \( p = 0.020 < 0.05 \).

Reject H₀: there is evidence at the 5% level that sport preference depends on gender.
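
The quoted \( p \)-value can be checked by hand in this particular case, because for \( df = 2 \) the upper-tail area of the \( \chi ^2 \) distribution has the closed form \( e^{-x/2} \). The sketch below relies on that fact, so it does not apply to other values of \( df \).

```python
import math

chi2_calc = 7.84
alpha = 0.05

# Upper-tail area of the chi-squared distribution; this formula is valid for df = 2 only.
p_value = math.exp(-chi2_calc / 2)
reject_h0 = p_value < alpha

print(round(p_value, 3))  # 0.02
print(reject_h0)          # True
```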

2

Chi-Squared Goodness of Fit SL

Tests whether observed data fits an expected distribution (uniform, given proportions, Poisson, etc.).

Test statistic (same as independence)

\( \chi ^2_{calc} = \sum (f_o - f_e)^2 / f_e \)

Degrees of freedom

\( df = k - 1 - p \)

\( k = \) number of categories, \( p = \) number of estimated parameters (0 if given)

H₀: The data follows the proposed distribution. H₁: The data does not follow the proposed distribution.

▶

Degrees of freedom: For a fair die with 6 faces, \( df = 6 - 1 = 5 \). If you estimated the mean from the data for a Poisson fit, \( df = k - 1 - 1 = k - 2 \).
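
For the fair-die case, the statistic can be computed directly. The counts below are hypothetical (60 imagined rolls). The 5% critical value of 11.07 for \( df = 5 \) is a standard table value, used here only because the standard library has no \( \chi ^2 \) distribution function.

```python
# Hypothetical counts for faces 1..6 from 60 rolls.
observed = [8, 12, 9, 11, 6, 14]

total = sum(observed)
k = len(observed)

# Under H0 (fair die) every face has the same expected frequency.
expected = [total / k] * k

chi2_calc = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = k - 1  # no parameters estimated from the data

critical_value_5pct = 11.07  # table value for df = 5 at the 5% level
reject_h0 = chi2_calc > critical_value_5pct

print(round(chi2_calc, 2), df, reject_h0)  # 4.2 5 False
```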

HL note: The GOF test itself is SL. Only the extra rigour of grouping numerical data into classes and choosing \( df \) when parameters are estimated from the data (4.12) is HL.

3

\( t \)-Test for Two Means SL

Tests whether two population means differ, using two samples drawn from normal populations. The GDC returns \( t \) and the \( p \)-value directly.

Hypotheses

\( H_0: \mu_1 = \mu_2 \)

\( H_1: \mu_1 \neq \mu_2 \) (two-tailed), or \( \mu_1 > \mu_2 \) / \( \mu_1 < \mu_2 \) (one-tailed)

Decision: compare the \( p \)-value to the significance level \( \alpha \). If \( p < \alpha \to \) reject H₀ (the means differ).

GDC: 2-Sample \( t \)-Test

TI-84 Plus CE

[STAT] → TESTS → 2-SampTTest
Choose Data or Stats; set \( \mu_1 \): \( \neq \mu_2 \) / \( < \mu_2 \) / \( > \mu_2 \)
Pooled: Yes (the IB guide tells you to assume equal variances and use the pooled test) → Calculate → read \( t \) and \( p \)

TI-Nspire CX II

[Menu] → Statistics → Stat Tests → 2-Sample \( t \) Test
Choose Data or Stats; set the alternative tail; Pooled: Yes
Read \( t \) and the \( p \)-value

Casio fx-CG50

[MENU] → Statistics → TEST → t → 2-Sample
Set the alternative (\( \neq \), \( < \), \( > \)); Pooled: On
Execute → read \( t \) and \( p \)

Worked example: \( t \)-test

Two classes sit the same test. 2-SampTTest (Pooled: Yes, two-tailed) gives \( t = 2.31 \), \( p = 0.028 \). Test at the 5% level whether the mean scores differ.

H₀: \( \mu_1 = \mu_2 \). H₁: \( \mu_1 \neq \mu_2 \).

Compare: \( p = 0.028 < 0.05 \).

Reject H₀: evidence at the 5% level that the two class means differ.

✗

Common error: State the correct tail from \( H_1 \). Use Pooled: Yes. In IB examinations you are told to assume the two population variances are equal, so the pooled two-sample \( t \)-test is the one to run.
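
The Pooled setting only changes how the standard error (and the degrees of freedom) are calculated. The sketch below computes the \( t \) statistic both ways so the difference is visible. The scores are hypothetical, and the \( p \)-value is left to the GDC because the Python standard library has no \( t \) distribution.

```python
import math
import statistics

# Hypothetical test scores for two classes of different sizes.
class_1 = [72, 85, 78, 90, 66, 81, 77, 88]
class_2 = [65, 70, 74, 68, 80, 62]

n1, n2 = len(class_1), len(class_2)
mean_1, mean_2 = statistics.mean(class_1), statistics.mean(class_2)
var_1, var_2 = statistics.variance(class_1), statistics.variance(class_2)  # sample variances

# Pooled: one common variance estimate, df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / (n1 + n2 - 2)
t_pooled = (mean_1 - mean_2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Not pooled: each sample keeps its own variance in the standard error
t_unpooled = (mean_1 - mean_2) / math.sqrt(var_1 / n1 + var_2 / n2)

print(round(t_pooled, 2), df_pooled)  # 2.43 12
print(round(t_unpooled, 2))           # 2.51
```

When the two samples have the same size, the two statistics coincide and only the degrees of freedom differ.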

4

One- & Two-Tailed Tests SL

The alternative hypothesis \( H_1 \) decides the tail. This controls how the \( p \)-value is read off the GDC.

Two-tailed

\( H_1: \mu \neq \mu_0 \)

Tests for any difference (either direction).

One-tailed

\( H_1: \mu > \mu_0 \) or \( \mu < \mu_0 \)

Tests for a difference in one specified direction only.

▶

Set the tail on the GDC. Choose \( \neq \), \( < \) or \( > \) to match \( H_1 \) so the calculator returns the correct \( p \)-value. The \( \chi ^2 \) test has no tail to choose: the GDC always returns the upper-tail \( p \)-value, so read \( p \) directly and compare it to \( \alpha \).

5

Spearman’s Rank Correlation

Spearman’s \( r_s \) measures the strength of a monotonic (not necessarily linear) relationship between two ranked variables. \( -1 \leq r_s \leq 1 \).

Spearman’s rank coefficient (given in formula booklet)

\( r_s = 1 - (6 \sum d^2) / (n(n^2 - 1)) \)

\( d = \) difference between ranks, \( n = \) number of data pairs

▶

Tied ranks: If two values are tied, assign the average of the ranks they would occupy. e.g. tied for 3rd and 4th \( \to \) both get rank 3.5.
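
The averaging rule for ties can be sketched as a small function. The name `average_ranks` and the sample data are hypothetical. Rank 1 is given to the smallest value here; ranking from the largest works the same way.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


scores = [12, 15, 18, 18, 25]  # hypothetical data with one tie
print(average_ranks(scores))   # [1.0, 2.0, 3.5, 3.5, 5.0]
```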

✗

Common error: Spearman’s tests for monotonic association, not linear. It uses ranks, not raw data. Do not confuse with Pearson’s \( r \) (which measures linear correlation).

▶

GDC shortcut: Rank each variable into two lists, then run PMCC / LinReg on the ranks: the value of \( r \) on the ranked data equals \( r_s \). Faster than the \( \sum d^2 \) formula for larger \( n \).

Worked example: Spearman’s \( r_s \)

Five products are ranked by price and by quality. The rank differences are \( d = 1, -1, 0, 2, -2 \), so \( \sum d^2 = 1 + 1 + 0 + 4 + 4 = 10 \), with \( n = 5 \).

\( r_s = 1 - \dfrac{6 \times 10}{5(5^2 - 1)} = 1 - \dfrac{60}{120} \).

\( r_s = 0.5 \): a moderate positive monotonic association between price and quality.
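
The same result can be reproduced numerically, both from the \( \sum d^2 \) formula and by running Pearson's \( r \) on the ranks. The rank lists below are hypothetical, chosen only so that the differences are \( -2, 1, -1, 2, 0 \), the same set of values as in the example. The two methods agree exactly when there are no tied ranks.

```python
import math

price_rank = [1, 2, 3, 4, 5]
quality_rank = [3, 1, 4, 2, 5]  # hypothetical ranks giving d = -2, 1, -1, 2, 0

n = len(price_rank)
d = [p - q for p, q in zip(price_rank, quality_rank)]
sum_d2 = sum(di ** 2 for di in d)

# Formula-booklet version
r_s_formula = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# GDC-shortcut version: Pearson's r applied to the ranks
r_on_ranks = pearson(price_rank, quality_rank)

print(sum_d2)                   # 10
print(r_s_formula, r_on_ranks)  # 0.5 0.5
```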

6

Poisson Distribution HL

Models the number of events in a fixed interval when events occur independently at a constant average rate \( \lambda . \)

Probability

\( P(X = x) = e^{-\lambda } \lambda ^x / x! \)

Mean & Variance

\( E(X) = \lambda , \quad \text{Var}(X) = \lambda \)

Mean = Variance is a key Poisson property

Conditions: Events occur singly, independently, at a constant mean rate, and with no upper limit.

Sum of independent Poissons: if \( X \sim Po(\lambda_1) \) and \( Y \sim Po(\lambda_2) \) are independent, then \( X + Y \sim Po(\lambda_1 + \lambda_2) \).
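
The additive property can be checked numerically for one value: summing \( P(X = j)P(Y = k - j) \) over \( j \) should give the \( Po(\lambda_1 + \lambda_2) \) probability of \( k \). The rates and the value of \( k \) below are hypothetical.

```python
import math


def poisson_pmf(lam, x):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam1, lam2 = 1.5, 2.5  # hypothetical rates
k = 3

# P(X + Y = k) built from the two separate distributions (independence assumed)
via_convolution = sum(poisson_pmf(lam1, j) * poisson_pmf(lam2, k - j) for j in range(k + 1))

# P(T = k) for T ~ Po(lam1 + lam2)
direct = poisson_pmf(lam1 + lam2, k)

print(round(via_convolution, 4), round(direct, 4))  # 0.1954 0.1954
```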

GDC: Poisson Probabilities

TI-84 Plus CE

[2ND][VARS] (DISTR)
\( poissonpdf(\lambda , x) \) → P(\( X = x) \)
\( poissoncdf(\lambda , x) \) → P(\( X \leq x) \)

TI-Nspire CX II

[Menu] → Statistics → Distributions
\( Poisson Pdf(\lambda , x) \) → P(\( X = x) \)
\( Poisson Cdf(\lambda , lower, upper) \) → P(lower \( \leq X \leq \) upper)

Casio fx-CG50

[MENU] → Statistics → DIST → POISN
Ppd for P(\( X = x), \) Pcd for P(\( X \leq x) \)

▶

\( P(X \geq 3) \): Use the complement: \( P(X \geq 3) = 1 - P(X \leq 2) = 1 - \) poissoncdf(\( \lambda , 2 \)).

Worked example: Poisson

Calls arrive at \( \lambda = 4 \) per hour. Find \( P(X \geq 2) \) in one hour.

\( P(X \geq 2) = 1 - P(X \leq 1) = 1 - \) poissoncdf\( (4, 1) \).

\( = 1 - 0.0916 = 0.908 \) (3 s.f.).
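
A short sketch of what poissonpdf and poissoncdf compute, applied to this example. The function names are illustrative Python, not calculator syntax.

```python
import math


def poisson_pmf(lam, x):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_cdf(lam, x):
    """P(X <= x) for X ~ Po(lam)."""
    return sum(poisson_pmf(lam, k) for k in range(x + 1))


p_at_most_1 = poisson_cdf(4, 1)
p_at_least_2 = 1 - p_at_most_1  # complement: P(X >= 2) = 1 - P(X <= 1)

print(round(p_at_most_1, 4))   # 0.0916
print(round(p_at_least_2, 3))  # 0.908
```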

7

Confidence Interval for the Mean HL

A confidence interval for the mean of a normal population gives a range of plausible values for \( \mu \).

Interval for \( \mu \) (known \( \sigma \), or large \( n \))

\( \bar{x} - z \times \dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z \times \dfrac{\sigma}{\sqrt{n}} \)

90%: \( z = 1.645 \); 95%: \( z = 1.960 \); 99%: \( z = 2.576 \)
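
The interval can be sketched with Python's `statistics.NormalDist`, which also recovers the \( z \) values quoted above. The sample figures (\( \bar{x} = 50 \), \( \sigma = 8 \), \( n = 64 \)) are hypothetical.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, level):
    """Two-sided confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # e.g. 0.975 quantile for a 95% interval
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


lower, upper = z_interval(x_bar=50.0, sigma=8.0, n=64, level=0.95)
print(round(lower, 2), round(upper, 2))  # 48.04 51.96

# The z multipliers for the three common confidence levels
for level in (0.90, 0.95, 0.99):
    print(level, round(NormalDist().inv_cdf((1 + level) / 2), 3))
```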

▶

\( \sigma \) unknown? Use the \( t \)-interval (on the TI-84: [STAT] → TESTS → TInterval; input \( \bar{x}, s_x, n \) and C-Level). Use the \( z \)-interval only when \( \sigma \) is given.

GDC: Z-Interval

TI-84 Plus CE

[STAT] → TESTS → ZInterval
Input: Stats; \( \sigma = \) known value; \( \bar{x} = \) sample mean; \( n = \) sample size
C-Level = 0.95 → Calculate

TI-Nspire CX II

[Menu] → Statistics → Confidence Intervals → z Interval
Enter \( \sigma , \bar{x}, n, \) C-Level → read interval

Casio fx-CG50

[MENU] → Statistics → INTR → Z → 1-Sample
Enter C-Level, \( \sigma , \bar{x}, n \) → Execute

✗

Common error: "95% confident" does NOT mean there is a 95% probability that \( \mu \) is in the interval. It means if we repeated the sampling, 95% of intervals would contain \( \mu . \)
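
The repeated-sampling interpretation can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the population \( N(50, 10^2) \), the sample size, the number of trials and the random seed. The observed proportion will vary slightly around 0.95 from run to run.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma, n, level = 50.0, 10.0, 25, 0.95  # hypothetical population and sample size
z = NormalDist().inv_cdf((1 + level) / 2)
margin = z * sigma / math.sqrt(n)

trials = 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = mean(sample)
    # Does this sample's interval contain the true mean?
    if x_bar - margin < mu < x_bar + margin:
        hits += 1

coverage = hits / trials
print(coverage)  # close to 0.95
```

Each individual interval either contains \( \mu \) or it does not; the 95% describes the long-run proportion of intervals that do.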

8

Critical-Value Tests HL

Define a critical region from the sampling distribution under H₀; reject H₀ if the test statistic falls in it. Equivalent to rejecting when \( p < \alpha \).

Mean of a normal population

\( H_0: \mu = \mu_0 \)

GDC Z-Test (\( \sigma \) known) or T-Test (\( \sigma \) unknown) → read \( p \), reject if \( p < \alpha \).

Proportion (binomial)

Under \( H_0 \), \( X \sim B(n, p_0) \)

One-tailed \( p \)-value \( = P(X \geq x) \) or \( P(X \leq x) \) via binomcdf; reject if \( < \alpha \).

Mean (Poisson)

Under \( H_0 \), \( X \sim Po(\lambda_0) \)

\( p \)-value from poissoncdf in the appropriate tail; reject if \( < \alpha \).

Correlation \( \rho = 0 \) (bivariate normal)

\( H_0: \rho = 0 \)

GDC LinRegTTest on the bivariate data → read \( p \); reject “no linear correlation” if \( p < \alpha \).

▶

Critical region vs \( p \)-value: the two approaches agree. The critical value is the boundary of the rejection region; a statistic beyond it is exactly the case \( p < \alpha \).
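
For a discrete distribution the critical value can be found by stepping through the tail. The sketch below does this for an upper-tail Poisson test with hypothetical values \( \lambda_0 = 4 \) and \( \alpha = 0.05 \), then confirms that "statistic in the critical region" and "\( p < \alpha \)" give the same decision.

```python
import math


def poisson_cdf(lam, x):
    """P(X <= x) for X ~ Po(lam); returns 0 when x < 0."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(x + 1))


def upper_tail_p(lam0, x):
    """p-value P(X >= x) under H0."""
    return 1 - poisson_cdf(lam0, x - 1)


def upper_critical_value(lam0, alpha):
    """Smallest c with P(X >= c) < alpha under H0."""
    c = 0
    while upper_tail_p(lam0, c) >= alpha:
        c += 1
    return c


lam0, alpha = 4, 0.05
c = upper_critical_value(lam0, alpha)
print(c)  # 9, so the critical region is X >= 9

# Both approaches agree for every possible observation checked here
for x in range(0, 15):
    assert (x >= c) == (upper_tail_p(lam0, x) < alpha)
```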

✗

Common error: Match the tail of the binomial/Poisson \( p \)-value to \( H_1 \). For an upper-tail test use \( P(X \geq x) = 1 - P(X \leq x-1) \), not \( 1 - P(X \leq x) \).
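
The off-by-one mistake is easy to see with numbers. The values below are hypothetical: \( n = 20 \), \( p_0 = 0.5 \) and an observed count of 15, tested in the upper tail.

```python
import math


def binom_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


n, p0, x = 20, 0.5, 15

p_correct = 1 - binom_cdf(n, p0, x - 1)  # P(X >= 15)
p_wrong = 1 - binom_cdf(n, p0, x)        # this is P(X >= 16), not the p-value

print(round(p_correct, 4))  # 0.0207
print(round(p_wrong, 4))    # 0.0059
```

At the 1% level the wrong version would reject H₀ while the correct \( p \)-value would not.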

9

Type I & Type II Errors HL

Type I Error

Rejecting H₀ when H₀ is true

“False positive” — probability = significance level \( \alpha \)

Type II Error

Failing to reject H₀ when H₀ is false

“False negative” — probability = \( \beta \)

Power \( = 1 - \beta = \) probability of correctly rejecting a false H₀. Higher power is better.

▶

Memory aid: Type I = false positIve (I for “Innocent person convicted”). Type II = false negatIIve (“Guilty person goes free”).

✗

Common error: You can NEVER “accept H₀”. The correct phrasing is “fail to reject H₀” or “insufficient evidence to reject H₀”.

✗

Common error: Decreasing \( \alpha \) (e.g. from 5% to 1%) lowers the probability of a Type I error but, for a fixed sample size, raises the probability of a Type II error. There is always a trade-off.
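
The trade-off can be made concrete for an upper-tail \( z \)-test of \( H_0: \mu = \mu_0 \) with known \( \sigma \). All the numbers are hypothetical (\( \mu_0 = 100 \), true mean 105, \( \sigma = 15 \), \( n = 36 \)), and \( \beta \) can only be computed because a specific true mean is assumed.

```python
import math
from statistics import NormalDist


def type_ii_probability(mu0, mu1, sigma, n, alpha):
    """beta for the upper-tail z-test of H0: mu = mu0 when the true mean is mu1."""
    se = sigma / math.sqrt(n)
    # Reject H0 when the sample mean exceeds this boundary
    critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # P(sample mean falls below the boundary | true mean is mu1) = P(fail to reject)
    return NormalDist(mu1, se).cdf(critical_mean)


beta_at_5pct = type_ii_probability(100, 105, 15, 36, alpha=0.05)
beta_at_1pct = type_ii_probability(100, 105, 15, 36, alpha=0.01)

print(round(beta_at_5pct, 3), round(1 - beta_at_5pct, 3))  # beta and power at the 5% level
print(round(beta_at_1pct, 3), round(1 - beta_at_1pct, 3))  # beta and power at the 1% level
```

With the sample size fixed, the smaller \( \alpha \) gives the larger \( \beta \) and therefore the lower power.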

10

Exam Traps & Key Reminders

✗

State hypotheses in context. Do not write generic H₀/H₁. e.g. “H₀: Grade and gender are independent” not just “H₀: the variables are independent”.

✗

Expected frequencies \( \geq 5 \). State this check explicitly in \( \chi ^2 \) tests. If violated, combine adjacent categories.

✗

Compare the \( p \)-value to \( \alpha \), not \( \chi ^2 \) to \( \alpha \). Write “\( p = 0.023 < 0.05 \) → reject H₀”. The IB awards marks for this comparison step.

▶

Conclusion in context: After rejecting or failing to reject, state the conclusion in the language of the problem. “There is sufficient evidence at the 5% level that grade and gender are not independent.”

▶

Default significance level: If the question does not specify, use \( \alpha = 0.05 \) (5%). This is standard in the IB.

▶

Formula booklet: The \( \chi ^2 \) statistic formula, Spearman’s \( r_{s} \) formula, Poisson probability formula, and confidence interval formula are all given. Know when to use each one.