Statistics & Hypothesis Testing SL+HL

IB Mathematics: Applications & Interpretation · Topic 4: Statistics & Probability

JMaths

www.jmaths.xyz

1

Chi-Squared Test for Independence

Tests whether two categorical variables are independent or associated, using observed vs expected frequencies in a contingency table.

Test statistic (given in formula booklet)

\( \chi ^2_{calc} = \sum (f_o - f_e)^2 / f_e \)

Expected frequency: \( f_{e} \) = (row total \( \times \) column total) / grand total

Degrees of freedom

\( df = (\text{rows} - 1)(\text{columns} - 1) \)

Steps: (1) State H₀: the variables are independent; H₁: the variables are not independent. (2) Calculate \( \chi^2_{calc} \) (GDC). (3) Find the \( p \)-value. (4) If \( p < \) significance level, reject H₀; otherwise, fail to reject H₀.
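The four steps can be mirrored in a short pure-Python sketch, which is handy for checking a GDC answer. It is a sketch under stated assumptions: the observed table is a list of rows, the function names and the 2×2 counts are hypothetical, and the \( p \)-value comes from a hand-written series for the chi-squared upper tail rather than from any statistics library.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(chi-squared with df degrees of freedom >= x).

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df/2, z = x/2, then returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:  # add terms z^n / ((a+1)...(a+n)) until negligible
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


def chi2_independence(observed):
    """Return (chi2_calc, df, p_value, expected) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # f_e = (row total x column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2_calc = sum((fo - fe) ** 2 / fe
                    for obs_row, exp_row in zip(observed, expected)
                    for fo, fe in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2_calc, df, chi2_upper_tail(chi2_calc, df), expected


# Hypothetical 2x2 table of observed counts
observed = [[30, 20],
            [20, 30]]
chi2_calc, df, p_value, expected = chi2_independence(observed)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi2 = {chi2_calc:.3f}, df = {df}, p = {p_value:.4f} -> {decision}")
```

With these counts every \( f_e = 25 \), so \( \chi^2_{calc} = 4 \) with \( df = 1 \), and \( p \approx 0.0455 < 0.05 \), which leads to rejecting H₀.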

GDC: Chi-Squared Test

TI-84 Plus CE

Enter observed data into a matrix: [2ND][MATRIX] → Edit → [A]
[STAT] → TESTS → \( \chi^2 \)-Test
Observed: [A], Expected: [B] → Calculate
Read \( \chi^2 \) and the \( p \)-value. Check [B] for the expected frequencies.

TI-Nspire CX II

Open Spreadsheet, enter observed data in columns
[Menu] → Statistics → Stat Tests → \( \chi^2 \) 2-way Test
Select data range → read \( \chi^2 \) and the \( p \)-value

Casio fx-CG50

[MENU] → Statistics → TEST → CHI → 2-way
Enter observed matrix → Execute
Read \( \chi ^2 \) and \( p \). Press [F6] to see expected frequencies.

✗

Common error: All expected frequencies must be \( \geq 5 \). If any \( f_e < 5 \), combine categories or note the limitation. The IB expects you to check this.
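One way to run this check before trusting the test is sketched below. The helper names and the 2×3 counts are hypothetical; merging the last two columns stands in for "combine adjacent categories", and note that merging changes \( df \) because the table loses a column.

```python
def expected_table(observed):
    """Expected frequencies f_e = (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def small_cells(observed, threshold=5):
    """List (row index, column index, f_e) for every cell with f_e below threshold."""
    return [(i, j, fe)
            for i, row in enumerate(expected_table(observed))
            for j, fe in enumerate(row)
            if fe < threshold]


def merge_columns(observed, j, k):
    """Add column k into column j, then drop column k (e.g. two adjacent categories)."""
    merged = []
    for row in observed:
        new_row = list(row)
        new_row[j] += new_row[k]
        del new_row[k]
        merged.append(new_row)
    return merged


# Hypothetical table: the third category is rare
table = [[12, 8, 2],
         [15, 10, 3]]
print(small_cells(table))           # third-column expected counts are 2.2 and 2.8
merged = merge_columns(table, 1, 2)
print(merged, small_cells(merged))
```

After merging, the smallest expected frequency is about 10, so the condition holds for the new 2×2 table (now \( df = 1 \)).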

2

Chi-Squared Goodness of Fit HL

Tests whether observed data fits an expected distribution (uniform, given proportions, Poisson, etc.).

Same test statistic, different df

\( \chi ^2_{calc} = \sum (f_o - f_e)^2 / f_e \), \( df = k - 1 - p \)

\( k = \) number of categories, \( p = \) number of estimated parameters (0 if all parameters are given; not the \( p \)-value)

H₀: The data follows the proposed distribution. H₁: The data does not follow the proposed distribution.

▶

Degrees of freedom: For a fair die with 6 faces, \( df = 6 - 1 = 5 \). If you estimated the mean from the data for a Poisson fit, \( df = k - 1 - 1 = k - 2 \).
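The die example can be worked through numerically. This is a minimal sketch, assuming hypothetical counts from 60 rolls (so every \( f_e = 10 \)); the parameter `n_estimated` plays the role of \( p \) in \( df = k - 1 - p \), and the upper-tail probability comes from a hand-written series rather than a library routine.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(chi-squared_df >= x) via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:  # add series terms until they are negligible
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total)


def goodness_of_fit(observed, expected, n_estimated=0):
    """Return (chi2_calc, df, p_value) with df = k - 1 - n_estimated."""
    chi2_calc = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    df = len(observed) - 1 - n_estimated
    return chi2_calc, df, chi2_upper_tail(chi2_calc, df)


# Hypothetical results of 60 rolls of a die; H0: the die is fair
observed = [8, 12, 9, 11, 6, 14]
expected = [sum(observed) / 6] * 6        # 10 per face under H0
chi2_calc, df, p_value = goodness_of_fit(observed, expected)
print(f"chi2 = {chi2_calc:.2f}, df = {df}, p = {p_value:.3f}")
```

If the mean of a Poisson model had been estimated from the data, you would pass `n_estimated=1`, matching \( df = k - 2 \).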

3

Spearman’s Rank Correlation

Spearman’s \( r_s \) measures the strength of a monotonic (not necessarily linear) relationship between two ranked variables, with \( -1 \leq r_s \leq 1 \).

Spearman’s rank coefficient (given in formula booklet)

\( r_s = 1 - (6 \sum d^2) / (n(n^2 - 1)) \)

\( d = \) difference between ranks, \( n = \) number of data pairs
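As a worked illustration of the formula, the sketch below ranks two hypothetical lists with no repeated values and applies \( r_s = 1 - (6 \sum d^2) / (n(n^2 - 1)) \). Rank 1 is the smallest value here; the direction does not matter as long as both variables are ranked the same way. The function names and data are assumptions for illustration.

```python
def ranks_no_ties(values):
    """Rank from 1 (smallest) to n; assumes every value is distinct."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: use average ranks instead")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_formula(x, y):
    """r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)), with d = difference between ranks."""
    rx, ry = ranks_no_ties(x), ranks_no_ties(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical paired data
x = [2, 4, 5, 8, 10]
y = [3, 5, 4, 9, 12]
print(ranks_no_ties(x), ranks_no_ties(y), spearman_formula(x, y))
```

Here the ranks of \( y \) are 1, 3, 2, 4, 5, so \( \sum d^2 = 2 \), \( n = 5 \), and \( r_s = 1 - 12/120 = 0.9 \).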

▶

Tied ranks: If two values are tied, assign the average of the ranks they would occupy. For example, two values tied for 3rd and 4th both get rank 3.5.
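With ties, the standard definition of Spearman’s coefficient gives tied values their average rank and then computes Pearson’s \( r \) on those ranks; the \( \sum d^2 \) formula assumes no ties, so it only approximates that value. A minimal sketch, with hypothetical function names and data:

```python
import math


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first rank position v occupies
        count = ordered.count(v)       # number of values tied with v
        ranks.append(first + (count - 1) / 2)
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s as Pearson's r applied to (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: two values of y are tied for 2nd and 3rd
x = [1, 2, 3, 4]
y = [10, 20, 20, 30]
print(average_ranks(y))            # [1.0, 2.5, 2.5, 4.0]
print(round(spearman(x, y), 4))    # 0.9487

# The sum-of-d^2 formula gives a slightly different value when there are ties
rx, ry = average_ranks(x), average_ranks(y)
n = len(x)
approx = 1 - 6 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n * (n ** 2 - 1))
print(round(approx, 4))            # 0.95
```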

✗

Common error: Spearman’s \( r_s \) measures monotonic association, not linear association, and it is calculated from ranks, not raw data. Do not confuse it with Pearson’s \( r \), which measures linear correlation.

4

Poisson Distribution HL

Models the number of events in a fixed interval when events occur independently at a constant average rate \( \lambda \).

Probability

\( P(X = x) = e^{-\lambda } \lambda ^x / x! \)

Mean & Variance

\( E(X) = \lambda \), \( \mathrm{Var}(X) = \lambda \)

Mean = Variance is a key Poisson property
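The probability formula and the mean = variance property can be checked numerically. This is a sketch assuming a hypothetical rate \( \lambda = 3 \) and truncating the infinite sum at \( x = 60 \), beyond which the remaining probability is negligible for this \( \lambda \).

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 3.0                      # hypothetical mean rate
xs = range(61)                 # truncate the sum; P(X > 60) is negligible for lam = 3
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum(x ** 2 * poisson_pmf(x, lam) for x in xs) - mean ** 2
print(round(poisson_pmf(2, lam), 4), round(mean, 6), round(var, 6))
```

Both the mean and the variance come out as 3 (to rounding), as the Mean = Variance property predicts.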

Conditions: Events occur singly, independently, at a constant mean rate, and with no upper limit.

GDC: Poisson Probabilities

TI-84 Plus CE

[2ND][VARS] (DISTR)
poissonpdf(\( \lambda, x \)) → \( P(X = x) \)
poissoncdf(\( \lambda, x \)) → \( P(X \leq x) \)

TI-Nspire CX II

[Menu] → Statistics → Distributions
Poisson Pdf(\( \lambda, x \)) → \( P(X = x) \)
Poisson Cdf(\( \lambda \), lower, upper) → P(lower \( \leq X \leq \) upper)

Casio fx-CG50

[MENU] → Statistics → DIST → POISN
Ppd for \( P(X = x) \), Pcd for \( P(X \leq x) \)

▶

\( P(X \geq 3) \): use the complement, \( P(X \geq 3) = 1 - P(X \leq 2) \) = 1 - poissoncdf(\( \lambda, 2 \)).
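The GDC commands correspond to simple sums of the probability formula. The sketch below mirrors them with hypothetical function names and \( \lambda = 3 \) chosen only for illustration; `poisson_between` follows the lower/upper form, using \( P(a \leq X \leq b) = P(X \leq b) - P(X \leq a - 1) \).

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x), like poissonpdf."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x), like poissoncdf; returns 0 for x < 0."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


def poisson_between(lower: int, upper: int, lam: float) -> float:
    """P(lower <= X <= upper) = P(X <= upper) - P(X <= lower - 1)."""
    return poisson_cdf(upper, lam) - poisson_cdf(lower - 1, lam)


lam = 3.0
p_at_least_3 = 1 - poisson_cdf(2, lam)       # complement rule
print(round(p_at_least_3, 4), round(poisson_between(2, 4, lam), 4))
```

For \( \lambda = 3 \), \( P(X \geq 3) = 1 - 8.5e^{-3} \approx 0.5768 \).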

5

Confidence Intervals HL

A confidence interval gives a range of plausible values for a population parameter (usually the mean \( \mu \)).

Confidence interval for \( \mu \) (known \( \sigma \) or large \( n \))

\( \bar{x} - z \times \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z \times \frac{\sigma}{\sqrt{n}} \)

90%: \( z = 1.645 \); 95%: \( z = 1.960 \); 99%: \( z = 2.576 \)
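The interval formula and the table of \( z \) values can be reproduced with Python’s standard-library `statistics.NormalDist` (Python 3.8+). This is a sketch with hypothetical summary statistics (\( \bar{x} = 50 \), \( \sigma = 8 \), \( n = 64 \)) and hypothetical function names.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided critical value, e.g. 0.95 -> about 1.960."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_interval(x_bar: float, sigma: float, n: int, confidence: float = 0.95):
    """(lower, upper) = x_bar -/+ z * sigma / sqrt(n)."""
    margin = z_value(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


for c in (0.90, 0.95, 0.99):
    print(c, round(z_value(c), 3))
lower, upper = z_interval(50, 8, 64)         # hypothetical sample summary
print(round(lower, 2), round(upper, 2))
```

With \( \sigma / \sqrt{n} = 1 \), the 95% interval is about \( 48.04 < \mu < 51.96 \).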

GDC: Z-Interval

TI-84 Plus CE

[STAT] → TESTS → ZInterval
Input: Stats; \( \sigma \) = known standard deviation, \( \bar{x} \) = sample mean, \( n \) = sample size
C-Level = 0.95 → Calculate

TI-Nspire CX II

[Menu] → Statistics → Confidence Intervals → z Interval
Enter \( \sigma \), \( \bar{x} \), \( n \), C-Level → read interval

Casio fx-CG50

[MENU] → Statistics → INTR → Z → 1-Sample
Enter C-Level, \( \sigma \), \( \bar{x} \), \( n \) → Execute

✗

Common error: "95% confident" does NOT mean there is a 95% probability that \( \mu \) is in the interval. It means that if we repeated the sampling many times, about 95% of the intervals constructed this way would contain \( \mu \).
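The repeated-sampling interpretation can be seen in a small simulation. This is a sketch with assumed values (a normal population with \( \mu = 50 \), \( \sigma = 8 \), samples of size 30, 2000 repetitions, a fixed random seed): it builds a 95% interval from each sample and counts how often the interval contains \( \mu \).

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)                          # fixed seed so the run is reproducible
mu, sigma, n, reps = 50.0, 8.0, 30, 2000
z = NormalDist().inv_cdf(0.975)         # about 1.960 for 95%
margin = z * sigma / sqrt(n)

hits = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = fmean(sample)
    if x_bar - margin < mu < x_bar + margin:
        hits += 1                       # this interval captured mu

coverage = hits / reps
print(f"{coverage:.3f} of the intervals contain mu")
```

The printed proportion should be close to 0.95, yet each individual interval either contains \( \mu \) or it does not.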

6

Type I & Type II Errors HL

Type I Error

Rejecting H₀ when H₀ is true

“False positive” — probability = significance level \( \alpha \)

Type II Error

Failing to reject H₀ when H₀ is false

“False negative” — probability = \( \beta \)

Power = \( 1 - \beta \) = probability of correctly rejecting a false H₀. Higher power is better.

▶

Memory aid: Type I = false positIve (I for “Innocent person convicted”). Type II = false negatIIve (“Guilty person goes free”).

✗

Common error: You can NEVER “accept H₀”. The correct phrasing is “fail to reject H₀” or “insufficient evidence to reject H₀”.

✗

Common error: Decreasing \( \alpha \) (e.g. from 5% to 1%) reduces the probability of a Type I error but increases the probability of a Type II error (for the same sample size). There is always a trade-off.
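The trade-off can be made concrete with a discrete example. This is a sketch under hypothetical assumptions: \( X \sim \text{Po}(\lambda) \), H₀: \( \lambda = 4 \), H₁: \( \lambda > 4 \), reject H₀ when \( X \geq c \), and a true value \( \lambda = 7 \) used to compute \( \beta \). For a discrete distribution, the critical region is chosen so that P(Type I error) is at most \( \alpha \), so it is usually smaller than \( \alpha \) itself.

```python
import math


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x) for X ~ Po(lam); 0 when x < 0."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(x + 1))


def critical_value(lam0: float, alpha: float) -> int:
    """Smallest c with P(X >= c | lam0) <= alpha (upper-tail test)."""
    c = 0
    while 1 - poisson_cdf(c - 1, lam0) > alpha:
        c += 1
    return c


lam0, lam_true = 4.0, 7.0        # hypothetical H0 value and true value
results = {}
for alpha in (0.05, 0.01):
    c = critical_value(lam0, alpha)
    p_type1 = 1 - poisson_cdf(c - 1, lam0)    # reject H0 although lam = 4
    p_type2 = poisson_cdf(c - 1, lam_true)    # fail to reject although lam = 7
    results[alpha] = (c, p_type1, p_type2)
    print(f"alpha={alpha}: reject if X >= {c}, "
          f"P(Type I)={p_type1:.4f}, P(Type II)={p_type2:.4f}, power={1 - p_type2:.4f}")
```

Moving from 5% to 1% shrinks the critical region from \( X \geq 9 \) to \( X \geq 10 \): P(Type I error) falls, P(Type II error) rises, and the power against \( \lambda = 7 \) drops.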

7

Exam Traps & Key Reminders

✗

State hypotheses in context; do not write generic H₀/H₁. For example, write “H₀: Grade and gender are independent”, not just “H₀: the variables are independent”.

✗

Expected frequencies \( \geq 5 \): state this check explicitly in \( \chi^2 \) tests. If it is violated, combine adjacent categories.

✗

Compare the \( p \)-value to \( \alpha \), not \( \chi^2 \) to \( \alpha \). Write “\( p = 0.023 < 0.05 \to \) reject H₀”. The IB awards marks for this comparison step.

▶

Conclusion in context: After rejecting or failing to reject, state the conclusion in the language of the problem. “There is sufficient evidence at the 5% level that grade and gender are not independent.”

▶

Default significance level: If the question does not specify, use \( \alpha = 5\% \) (0.05). This is standard in the IB.

▶

Formula booklet: The \( \chi ^2 \) statistic formula, Spearman’s \( r_{s} \) formula, Poisson probability formula, and confidence interval formula are all given. Know when to use each one.