Statistics & Data SL

IB Mathematics: Applications & Interpretation · Topic 4: Statistics & Probability
www.jmaths.xyz
1
Measures of Central Tendency
Mean
\( \bar{x} = \Sigma x / n \)
For grouped data: \( \bar{x} = \Sigma fx / \Sigma f \) (use midpoints)
Median
Middle value when data is ordered.
If \( n \) is even: the average of the two middle values.
Mode
Most frequently occurring value. A data set can have no mode, one mode, or multiple modes.
When to use which: Mean is affected by outliers. Median is better for skewed data. Mode is useful for categorical data. The IB often asks you to justify which measure is most appropriate.
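A minimal Python sketch of the three measures, using a hypothetical data set (not taken from the notes) whose last value is an outlier. It illustrates why the median can be the better choice when one extreme value distorts the mean.

```python
from statistics import mean, median, multimode

# Hypothetical data set; the last value is an outlier
data = [3, 4, 4, 5, 6, 7, 41]

print(mean(data))       # 10: pulled upward by the outlier
print(median(data))     # 5: the middle (4th) value of the ordered data
print(multimode(data))  # [4]: the most frequent value(s)
```

`multimode` returns a list, so a data set with several modes gives several values.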
2
Measures of Spread
Range
\( \text{Range} = \max - \min \)
Interquartile range (IQR)
\( IQR = Q_3 - Q_1 \)
Measures the spread of the middle 50% of the data. Not affected by outliers.
Standard deviation
Use GDC to calculate. Measures how spread out data is from the mean.
\( \sigma \) (population) or \( s \) (sample). IB typically uses \( \sigma_x \) or \( s_x \) from GDC.
Outlier test: A value is an outlier if it is more than \( 1.5 \times \text{IQR} \) below Q1 or above Q3, i.e. if value \( < Q_1 - 1.5 \times \text{IQR} \) or value \( > Q_3 + 1.5 \times \text{IQR} \).
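The outlier test can be sketched in Python as follows. The data set is hypothetical, and the quartiles are found with the median-of-halves method (the middle value is left out of both halves when \( n \) is odd). That choice is an assumption: software packages use several quartile conventions, so results may differ slightly from a given GDC.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above it (middle value skipped if n is odd)
    return median(lower), median(ordered), median(upper)

def outliers(values):
    """Return the values lying beyond the 1.5 x IQR fences."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [3, 4, 4, 5, 6, 7, 41]  # hypothetical data
print(quartiles(data))  # (4, 5, 7)
print(outliers(data))   # [41]
```

Here IQR = 3, so the fences are at -0.5 and 11.5, and only 41 lies outside them.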
TI-84 Plus CE — 1-Var Stats
Enter data in [STAT] → Edit (L1, optionally L2 for frequencies)
[STAT] → CALC → 1: 1-Var Stats → L1 (,L2 if freq)
Gives: \( \bar{x} \), \( \Sigma x \), \( \sigma x \), Sx, n, minX, Q1, Med, Q3, maxX
TI-Nspire CX II — 1-Var Stats
Lists & Spreadsheet → enter data in column A (freq in column B if needed)
[Menu] → Statistics → Stat Calculations → One-Variable Statistics
Gives: \( \bar{x} \), \( \sigma x \), Sx, n, Q1, median, Q3, etc.
Casio fx-CG50 — 1-Var Stats
[MENU] → Statistics → enter data in List 1 (freq in List 2)
[CALC] (F2) → 1-VAR → set lists
Gives: \( \bar{x} \), \( \sigma x \), Sx, n, minX, Q1, Med, Q3, maxX
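As a cross-check on the GDC output, the two standard deviations can be reproduced with Python's standard library. This is only a sketch with hypothetical data: `pstdev` corresponds to the population value \( \sigma_x \) and `stdev` to the sample value \( s_x \).

```python
from statistics import mean, pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

print(mean(data))    # 5
print(pstdev(data))  # 2.0, population standard deviation (divides by n)
print(stdev(data))   # sample standard deviation (divides by n - 1), slightly larger
```

The sample value is always at least as large as the population value for the same data, which is a quick way to tell the two GDC outputs apart.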
3
Data Displays
Know how to read and draw: box plots, histograms, cumulative frequency curves, and frequency tables.
Box plot (5-number summary)
min, Q1, median, Q3, max
Box = IQR; whiskers to min/max (or to fence if showing outliers)
Cumulative frequency
Plot cumulative frequency against the upper boundary of each class.
Read off: median at \( n/2 \), Q1 at \( n/4 \), Q3 at \( 3n/4 \)
Figure: box plot showing min, Q1, median, Q3 and max, with the IQR marked as the width of the box.
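Reading the quartiles off a cumulative frequency graph can be imitated in Python. The sketch below assumes straight-line segments between the plotted points, whereas a hand-drawn curve is smooth and will give slightly different readings. The classes and frequencies are hypothetical, and `read_off` is an illustrative helper name.

```python
def read_off(target, boundaries, cum_freq):
    """Estimate the data value at a given cumulative frequency
    by linear interpolation between plotted points."""
    points = list(zip(boundaries, cum_freq))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the plotted range")

# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 4, 10, 18, 8
boundaries = [0, 10, 20, 30, 40]  # lowest boundary, then each upper boundary
cum_freq = [0, 4, 14, 32, 40]     # running totals, starting at 0
n = cum_freq[-1]

q1 = read_off(n / 4, boundaries, cum_freq)       # read across from 10
med = read_off(n / 2, boundaries, cum_freq)      # read across from 20
q3 = read_off(3 * n / 4, boundaries, cum_freq)   # read across from 30
print(round(q1, 1), round(med, 1), round(q3, 1)) # 16.0 23.3 28.9
```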
Histogram vs bar chart: Histograms show continuous data with no gaps between bars. The area of each bar represents frequency. Bar charts show discrete/categorical data with gaps.
4
Frequency Tables & Grouped Data
For grouped data, use the midpoint of each class to estimate the mean: midpoint = (lower + upper) / 2.
Worked Example
A grouped frequency table shows: 10–20 (freq 5), 20–30 (freq 12), 30–40 (freq 8). Estimate the mean.
Midpoints: 15, 25, 35
\( \Sigma fx = 5(15) + 12(25) + 8(35) = 75 + 300 + 280 = 655 \)
\( \Sigma f = 5 + 12 + 8 = 25 \)
\( \bar{x} = 655 / 25 = 26.2 \)
Answer: Estimated mean = 26.2
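The same calculation can be checked with a short Python sketch. The tuple layout and variable names are illustrative choices; the numbers are those of the worked example.

```python
# (lower boundary, upper boundary, frequency) for each class
classes = [(10, 20, 5), (20, 30, 12), (30, 40, 8)]

# Sum of frequency x midpoint, then divide by the total frequency
sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
sum_f = sum(f for _, _, f in classes)
estimated_mean = sum_fx / sum_f

print(sum_fx, sum_f, estimated_mean)  # 655.0 25 26.2
```

The result is an estimate because every value in a class is treated as if it sat at the midpoint.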
Common error: Forgetting to use midpoints for grouped data. The value used for the class "10–20" is NOT 10 or 20; it is the midpoint, 15. Also check whether the class boundaries are \( 10 \leq x < 20 \) or 10–19, because this changes the midpoint.
5
Correlation & Regression
The Pearson correlation coefficient (\( r \)) measures the strength and direction of a linear relationship between two variables.
Interpreting \( r \)
\( r = 1 \): perfect positive linear    \( r = -1 \): perfect negative linear    \( r = 0 \): no linear relationship
\( |r| > 0.75 \): strong    \( 0.5 < |r| < 0.75 \): moderate    \( |r| < 0.5 \): weak
Regression line (\( y \) on \( x \))
\( y = ax + b \)
Use to predict \( y \) from \( x \). The line passes through \( (\bar{x}, \bar{y}) \).
Figure: scatter plot of y against x showing positive correlation (r close to 1) with the regression line y = ax + b.
TI-84 Plus CE — LinReg
Enter data: [STAT] → Edit (x in L1, y in L2)
[STAT] → CALC → 4: LinReg(ax+b) L1, L2
Gives: \( a \) (gradient), \( b \) (intercept), \( r, r^2 \)
(If \( r \) not shown: [2nd][CATALOG] → DiagnosticOn)
TI-Nspire CX II — LinReg
Enter data in Lists & Spreadsheet (x in col A, y in col B)
[Menu] → Statistics → Stat Calculations → Linear Regression (mx+b)
Gives: \( m, b, r, r^2 \)
Casio fx-CG50 — LinReg
[MENU] → Statistics → enter x in List 1, y in List 2
[CALC] (F2) → REG → X (linear)
Gives: \( a, b, r, r^2 \)
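To see what the GDC is computing, here is a minimal Python sketch of the least-squares line and Pearson's \( r \), written from the standard sums of squares. The function name and the data are hypothetical, and in the exam these values should still come from the GDC.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (a, b, r) for the least-squares line y = ax + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    a = sxy / sxx              # gradient
    b = y_bar - a * x_bar      # intercept, so the line passes through (x_bar, y_bar)
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b, r = linear_regression(xs, ys)
print(round(a, 3), round(b, 3), round(r, 3))  # 0.6 2.2 0.775
```

Substituting \( \bar{x} = 3 \) into the fitted line gives \( 0.6(3) + 2.2 = 4 = \bar{y} \), confirming that the line passes through the mean point.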
Worked Example
GDC gives the regression line \( y = 2.3x + 4.1 \) with \( r = 0.94 \) for data where \( 5 \leq x \leq 30 \). Predict \( y \) when \( x = 20 \) and comment on reliability.
\( y = 2.3(20) + 4.1 = 50.1 \)
\( r = 0.94 \) shows a strong positive linear correlation.
\( x = 20 \) is within the data range (interpolation), so the prediction is reliable.
Answer: \( y = 50.1 \); reliable because the correlation is strong and the prediction is an interpolation.
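The prediction and the range check can be combined in a short Python sketch. The helper name `predict` is hypothetical, and the second call uses an \( x \)-value of 45 that is not part of the worked example; it is included only to show a case outside the data range.

```python
def predict(x, a, b, x_min, x_max):
    """Return the predicted y and whether x lies inside the data range."""
    y = a * x + b
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Values from the worked example: y = 2.3x + 4.1 with 5 <= x <= 30
y, kind = predict(20, 2.3, 4.1, 5, 30)
print(round(y, 1), kind)  # 50.1 interpolation

# Hypothetical x-value outside the data range
y_far, kind_far = predict(45, 2.3, 4.1, 5, 30)
print(round(y_far, 1), kind_far)  # 107.6 extrapolation
```

The code can only flag whether the \( x \)-value is in range; judging reliability still requires commenting on the strength of \( r \) as well.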
6
Interpolation vs Extrapolation
Interpolation
Predicting within the data range.
Generally reliable if \( r \) is strong.
Extrapolation
Predicting outside the data range.
Less reliable — the relationship may not hold.
Common error: Using the regression line to extrapolate far beyond the data and stating the prediction is reliable. Always check whether the \( x \)-value is within the original data range.
Common error: Confusing correlation with causation. A strong \( r \) does NOT mean one variable causes the other. There may be a third variable or coincidence.
7
Exam Reminders
Describing distributions: Comment on shape (symmetric, positively/negatively skewed), centre (mean or median), and spread (range, IQR, or standard deviation).
Comparing data sets: Compare a measure of centre AND a measure of spread. E.g. "Group A has a higher mean (25.3 vs 19.1) but a larger standard deviation (4.2 vs 2.8), so it is more spread out."
Formula booklet: The formulae for mean, standard deviation, and \( r \) are given but you should use your GDC for calculations. The formulae help you understand what the statistics measure.