Quantitative Methods – L1 Prep
Learning Module 1: Time Value of Money
Time Value of Money (TVM)
The time value of money (TVM) is one of the most fundamental concepts in finance. It reflects the principle that money today is worth more than the same amount in the future because it can be invested to earn a return. TVM explains why we charge interest when lending, why we discount future cash flows when valuing investments, and why timing matters in every financial decision.
Analogy: Think of money like seeds. If you plant them today, they grow; if you hold them in your hand, they stay the same.
Future Value
Future Value measures how much an amount of money today will grow to after earning interest for a certain number of periods. It’s the foundation of understanding compounding — earning “interest on interest.” The more time or the higher the interest rate, the larger your future value will be.
Formula: FV = PV × (1 + r)ⁿ
Example: $1,000 invested at 5% for 3 years → FV = 1,000 × 1.05³ = $1,157.63
Analogy: Imagine rolling a snowball downhill — each turn picks up more snow, just like each compounding period adds more interest.
FV = PV \times (1 + r)^n
where (r) is the required rate of return and (n) is the number of compounding periods.
Present Value
Present Value is the reverse of future value. It tells us how much a future sum of money is worth today. This helps investors compare opportunities that pay out at different times. PV “discounts” future cash flows using the required rate of return to bring them back to today’s dollars.
PV = \frac{FV}{(1 + r)^n}
Example: The present value of $1,157.63 in 3 years at 5% is $1,000.
Analogy: PV is like rewinding a movie — you’re taking future dollars and bringing them back to where the story starts.
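To make the mechanics concrete, here is a quick Python sketch (not part of the curriculum) of the FV and PV formulas, checked against the $1,000 at 5% for 3 years example:

```python
# Minimal sketch of the two core TVM formulas.

def future_value(pv: float, r: float, n: int) -> float:
    """FV = PV * (1 + r)^n"""
    return pv * (1 + r) ** n

def present_value(fv: float, r: float, n: int) -> float:
    """PV = FV / (1 + r)^n"""
    return fv / (1 + r) ** n

print(round(future_value(1_000, 0.05, 3), 2))     # 1157.63
print(round(present_value(1_157.63, 0.05, 3), 2)) # ~1000.00
```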
Required Return / Discount Rate (r)
The required return (or discount rate) represents the opportunity cost of capital — the return an investor demands to be compensated for risk and time. It’s composed of the risk-free rate plus several risk premiums for default risk, liquidity risk, maturity risk, and more. In general, the higher the risk of an investment, the higher the return investors demand.
Breakdown:
Required Return = Nominal Risk-Free Rate + Default + Liquidity + Maturity premiums
Example: If the real risk-free rate is 2%, inflation 3%, and total premiums 2%, required return = 7%.
Analogy: Lending money to a friend — if they’re trustworthy, you charge less; if you’re unsure, you demand more return.
Nominal Risk-Free Rate = Real Risk-Free Rate + Expected Inflation
Interest Rates & Compounding
Interest can be simple—calculated only on the principal—or compounded, which means interest is earned on both the principal and previously accrued interest. Because most real-world applications involve compounding, it is critical to match the rate and the number of periods to the compounding frequency (e.g., APR ÷ 4 for quarterly compounding). Timelines are a helpful visual tool to avoid errors, and cash outflows should always be entered as negative numbers when using a financial calculator.
Compound interest grows money faster because each period’s earnings start earning returns themselves.
Example:
- Simple: $1,000 × (1 + 0.05 × 3) = $1,150
- Compound: $1,000 × (1.05)³ = $1,157.63
Analogy: Simple interest is like a flat salary. Compound interest is like getting a raise each year that builds on your new, higher salary.
Compounding Frequency
The frequency of compounding (annually, quarterly, monthly) affects how quickly your money grows. Always match your inputs — the rate and the number of periods — to the compounding frequency.
Example: An APR of 8% compounded quarterly means a 2% rate per quarter (8% ÷ 4).
Calculator Tips:
- Cash outflows (investments) should be entered as negatives.
- “Error 5” means you forgot to make an outflow negative.
Analogy: Compounding more often is like watering a plant more frequently; each extra watering gives growth another small boost.
Annual Percentage Rate / Stated Rate vs Effective Rate
The stated annual rate (APR, or nominal rate) is the simple quoted yearly rate and does not reflect compounding, while the effective annual rate (EAR, or EFF) captures the true annual return once compounding is included. The more frequent the compounding, the higher the EAR.
The EAR is calculated as:
[
EAR = \left(1 + \frac{\text{APR}}{n}\right)^n - 1
]
where (n) is the number of compounding periods per year. As compounding frequency increases, the EAR rises and approaches a limit, while the APR stays the same.
Example: 12% APR compounded monthly → EAR = (1 + 0.12/12)¹² − 1 = 12.68%
Analogy: APR is the advertised price, but EAR is what you actually pay once all the hidden “compounding extras” are included.
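A quick Python sketch of the EAR formula (illustrative only), reproducing the 12% APR compounded monthly example and the 8% quarterly example:

```python
# EAR = (1 + APR/n)^n - 1
def effective_annual_rate(apr: float, periods_per_year: int) -> float:
    return (1 + apr / periods_per_year) ** periods_per_year - 1

print(round(effective_annual_rate(0.12, 12), 4))  # 0.1268 -> 12.68%
print(round(effective_annual_rate(0.08, 4), 4))   # quarterly: 0.0824 -> 8.24%
```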
Annuities & Perpetuities
An annuity is a series of equal payments made at regular intervals; a perpetuity is an annuity whose payments continue forever. The timing of the payments (end of period vs. beginning of period) determines how each is valued.
Ordinary Annuity:
- An ordinary annuity is a series of equal payments made at regular intervals, where each payment occurs at the end of the period. Examples include bond coupon payments or rent paid after living in an apartment for a month.
- Calculator Tip: The present value is calculated as of one period before the first payment.
- Analogy: Paying rent at the end of the month
Annuity Due:
- An annuity due is similar to an ordinary annuity, except payments occur at the beginning of each period. This timing difference makes each payment effectively grow for one additional period, so annuity due values are slightly higher.
- Formula Adjustment: PV (Annuity Due) = PV (Ordinary Annuity) × (1 + r)
- Calculator Tip: Use the “BEG” mode when dealing with annuity due, and switch it back afterward.
- Analogy: Paying rent at move-in day instead of after your first month
Ordinary Annuity vs Annuity Due:
- In an ordinary annuity, payments happen at the end of the period — like paying for a service you’ve already received. In an annuity due, payments happen at the beginning — like prepaying for a gym membership before you start using it. The difference may seem small, but it matters in valuation.
Perpetuity:
- A perpetuity is an infinite stream of equal cash flows that never ends. Since the payments continue forever, the formula is simple:
[
PV = \frac{PMT}{r}
]
- Example: A preferred share paying $5 annually with a 5% required return → PV = 5 / 0.05 = $100.
- Analogy: A perpetuity is like a magical ATM that keeps dispensing the same amount every year, forever.
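To tie the annuity and perpetuity formulas together, here is a small Python sketch (illustrative, not calculator keystrokes) that reproduces the worked examples later in this module ($1,000 per year for 5 years at 6%, and the $5 preferred dividend at 5%):

```python
# Present values of level payment streams.

def pv_ordinary_annuity(pmt: float, r: float, n: int) -> float:
    """PV of equal end-of-period payments: PMT * [1 - (1+r)^-n] / r."""
    return pmt * (1 - (1 + r) ** -n) / r

def pv_annuity_due(pmt: float, r: float, n: int) -> float:
    """Annuity due = ordinary annuity shifted one period earlier."""
    return pv_ordinary_annuity(pmt, r, n) * (1 + r)

def pv_perpetuity(pmt: float, r: float) -> float:
    """PV = PMT / r for an infinite stream of equal payments."""
    return pmt / r

print(round(pv_ordinary_annuity(1_000, 0.06, 5), 2))  # 4212.36
print(round(pv_annuity_due(1_000, 0.06, 5), 2))       # 4465.10
print(round(pv_perpetuity(5, 0.05), 2))               # 100.0 (preferred share example)
```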
Uneven Cash Flows
When cash flows are not equal, you can’t use the standard TVM buttons on your calculator. Instead, each payment must be discounted individually and then summed.
Formula: PV = CF₁/(1 + r)¹ + CF₂/(1 + r)² + …
PV = \frac{CF_1}{(1 + r)^1} + \frac{CF_2}{(1 + r)^2} + \cdots + \frac{CF_n}{(1 + r)^n}
Example: If you receive $100 in year 1, $200 in year 2, and $300 in year 3 at 5%, PV = 100/1.05 + 200/1.05² + 300/1.05³ ≈ $535.80.
Analogy: Imagine getting different paychecks every year — to know what that’s worth today, you evaluate each separately.
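A minimal sketch of discounting uneven cash flows one by one, using the $100 / $200 / $300 at 5% example:

```python
# Discount each cash flow at its own period and sum the results.
def pv_uneven(cash_flows, r):
    return sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows, start=1))

print(round(pv_uneven([100, 200, 300], 0.05), 2))  # ~535.80
```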
Summary Table
| Concept | Key Formula / Idea |
|---|---|
| Future Value | FV = PV(1 + r)ⁿ |
| Present Value | PV = FV / (1 + r)ⁿ |
| Required Return | Risk-free + premiums |
| Simple Interest | Interest only on principal |
| Compound Interest | Interest on interest |
| Compounding | Match rate & periods |
| EAR | (1 + APR/n)ⁿ − 1 |
| Ordinary Annuity | Payments at end |
| Annuity Due | Payments at beginning |
| Perpetuity | PV = PMT / r |
| Uneven CFs | Discount each separately |
Key Takeaways
- The Time Value of Money is the foundation of all valuation in finance.
- Compounding grows value forward, while discounting brings it back.
- The frequency of compounding affects the true return (EAR).
- Understand the timing differences between ordinary annuities, annuities due, and perpetuities.
- Always align your rate, period, and sign conventions in calculations.
Common Mistakes & Calculator Tips
- Forgetting to make cash outflows negative (causes Error 5)
- Leaving calculator in “BEG” mode after annuity due problems
- Mixing compounding periods (e.g., using annual rate with monthly periods)
- Using TVM buttons for uneven cash flows
- Forgetting to “hop one period forward” for annuity due
Mastering this Module
Practice by plugging real-life situations into these formulas: saving for a trip, paying off a loan, or calculating returns on an investment.
Draw timelines to visualize cash flows. Always ask, “Am I moving money forward in time or back?”
The more you relate these ideas to everyday money decisions, the faster they’ll click.
Worked Examples
1. FV/PV Example (Real World)
You invest $10,000 today in a bond that pays 5% annually, compounded once per year, for three years:
[
FV = 10{,}000 \times (1 + 0.05)^3 = 10{,}000 \times 1.1576 = 11{,}576
]
Your money grows to $11,576 after three years. Conversely, if you are promised $11,576 three years from now and your required rate of return is 5%, the present value is $10,000 today.
2. Effective Annual Rate (EAR)
Suppose a bank advertises a 12% APR compounded monthly. The effective annual rate is:
[
EAR = \left(1 + \frac{0.12}{12}\right)^{12} - 1 = (1.01)^{12} - 1 = 0.1268 = 12.68%
]
This shows that monthly compounding increases the actual annual return from 12% to 12.68%.
3. Ordinary Annuity Example
You will receive $1,000 at the end of each year for five years. If the discount rate is 6%, the present value is:
[
PV = 1{,}000 \times \left[\frac{1 - (1 + 0.06)^{-5}}{0.06}\right] = 1{,}000 \times 4.21236 = 4{,}212.36
]
If this were an annuity due (payments start immediately), multiply by ( (1 + 0.06) ):
[
PV_{AD} = 4{,}212.36 \times 1.06 = 4{,}465.10
]
4. Perpetuity Example
A preferred share pays a fixed dividend of $3 per year, and the required return is 8%. The value of the preferred share is:
[
PV = \frac{PMT}{r} = \frac{3}{0.08} = 37.50
]
Learning Module 2: Organizing, Visualizing, and Describing Data
Understanding Types of Data – Ordinal vs Nominal
Data comes in two broad forms: numerical and categorical.
Numerical data consists of measurable or countable values. It’s split into:
- Discrete data: results from counting and can only take certain values, like the number of cars in a lot or tickets sold for a concert.
- Continuous data: can take any value within a range, like someone’s height, weight, or how long it takes to run a marathon.
Categorical data, on the other hand, describes qualities or characteristics.
- Nominal data: consists of labels or names without order (e.g., colors, city names).
- Ordinal data: has a logical order (e.g., bronze–silver–gold medals), but differences between ranks aren’t measurable.
Analogy:
Nominal data is like sorting socks by color — there’s no ranking. Ordinal data is like sorting runners by place — there’s order, but not equal distance between them.
Organizing Data for Analysis
How data is structured affects how we analyze it:
- Time Series Data: tracks one variable for one subject over equal time intervals (e.g., a stock’s daily price).
- Cross-Sectional Data: looks at one variable across many subjects at a single point in time (e.g., student test scores).
- Panel Data: combines both — multiple variables for multiple subjects over time (e.g., household income tracked yearly across families).
- Structured Data: neatly organized, repeating patterns like market data or financial statements.
- Unstructured Data: lacks a fixed format — think tweets, news, or customer reviews.
Analogy:
Time series is watching one tree grow over years; cross-sectional is comparing many trees today; panel data is tracking several trees over time.
Frequency & Distribution
A frequency distribution summarizes how often data values occur.
To create one, divide your data’s range (max–min) into equal intervals (“bins”) and count how many values fall into each.
- Absolute frequency: number of observations per bin.
- Relative frequency: each bin’s share of the total (%).
- Cumulative frequency: running total up to each bin.
Example:
If 10 students score between 70–80, and there are 100 total, the relative frequency = 10%.
Analogy:
It’s like grouping ages at a party — “how many guests are in their 20s, 30s, 40s, etc.”
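A short Python sketch of building a frequency distribution; the bins and scores below are made-up illustrative values:

```python
# Count absolute, relative, and cumulative frequencies per bin.
scores = [62, 68, 71, 74, 75, 78, 81, 84, 88, 95]     # illustrative data
bins = [(60, 70), (70, 80), (80, 90), (90, 100)]       # equal-width intervals

total = len(scores)
cumulative = 0
for low, high in bins:
    absolute = sum(low <= s < high for s in scores)    # absolute frequency
    relative = absolute / total                        # relative frequency
    cumulative += absolute                             # running total
    print(f"{low}-{high}: abs={absolute}, rel={relative:.0%}, cum={cumulative}")
```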
Contingency & Confusion Tables
A contingency table shows how two or more variables interact.
- Joint frequencies: counts inside the table for combinations of variables.
- Marginal frequencies: totals for each row/column.
A confusion matrix is a special version used in machine learning. It compares predicted results to actual outcomes to measure accuracy.
Analogy: Think of it like comparing “what you guessed” vs. “what really happened.”
Visualizing Data
Choosing the right chart helps reveal insights quickly.
Common Visualization Types:
| Type | Description | Best For |
|---|---|---|
| Histogram | Bars show frequency of numerical data | Distribution patterns |
| Frequency Polygon | Connects midpoints of histogram bins | Comparing shapes |
| Cumulative Frequency Chart | Running total line | Cumulative trends |
| Bar Chart | Bars show categorical data | Comparing groups |
| Tree Map | Rectangles sized by category value | Hierarchies |
| Word Cloud | Common words appear larger | Text data |
| Line Chart | Connects data points over time | Trends |
| Bubble Chart | Adds a third variable via bubble size | Multi-variable time data |
| Scatter Plot | Plots relationship between two variables | Correlation |
| Heat Map | Colors represent intensity or correlation | Multi-variable comparison |
Analogy:
A histogram is like sorting candies by colour; a line chart is like watching your savings grow; a scatter plot is like seeing whether ice cream sales rise with temperature.
Choosing the Right Visualization
| Purpose | Best Visualizations |
|---|---|
| Show Relationships | Scatter plot, heat map |
| Compare Categories | Bar chart, tree map |
| Compare Over Time | Line chart, bubble chart |
| Show Numerical Distributions | Histogram, frequency polygon |
| Show Categorical Distributions | Bar chart, heat map |
| Analyze Unstructured Text | Word cloud |
Tip: Always match your visualization to your goal — don’t use a pie chart for time trends!
Measures of Central Tendency
Central tendency describes where most of your data lies — its “center.”
- Arithmetic Mean: the simple average.
- Median: middle value when data is ordered — best when outliers are present.
- Mode: most frequent value — useful for categorical data.
- Trimmed Mean: removes a set % of extreme values to reduce outlier influence.
- Winsorized Mean: reassigns outliers to the nearest remaining values.
- Weighted Mean: gives more importance to certain values, like portfolio returns.
- Geometric Mean: used for compounded growth rates, e.g., multi-year investment returns.
Formula:
[(1 + r_1)(1 + r_2)\cdots(1 + r_n)]^{1/n} - 1
Harmonic Mean: used when averaging rates, like average price per share.
Formula:
HM = \frac{N}{\sum \left( \frac{1}{x_i} \right)}
Relationship: Harmonic < Geometric < Arithmetic (when returns vary).
Analogy:
Arithmetic mean is a “standard average,” geometric is “growth over time,” harmonic is “averaging speeds or rates.”
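A small Python sketch comparing the three means; the return and price figures are illustrative, not from the reading:

```python
# Arithmetic vs geometric vs harmonic mean.
from statistics import mean, geometric_mean, harmonic_mean

returns = [0.10, 0.05, -0.02]                        # three yearly returns (illustrative)
arith = mean(returns)
geo = geometric_mean([1 + r for r in returns]) - 1   # compound growth rate
print(round(arith, 4), round(geo, 4))                # arithmetic >= geometric

prices = [20, 25, 40]                                # prices paid per share (illustrative)
print(round(harmonic_mean(prices), 2))               # average cost when equal $ invested each time
```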
Quantiles & Quartiles
Quantiles break data into equal parts:
- Quartiles: 4 equal parts
- Deciles: 10 equal parts
- Percentiles: 100 equal parts
The interquartile range (IQR) = Q3 − Q1 (the middle 50% of data).
To find a percentile’s position: (n+1) × y/100.
Visualization: Box-and-whisker plot — shows median, quartiles, and outliers.
Analogy:
Think of slicing a pizza into equal parts — quartiles cut it into four slices, percentiles into 100 tiny pieces.
Measures of Dispersion
Dispersion shows how spread out data is — the degree of variability.
| Measure | Description | Formula / Use |
|---|---|---|
| Range | Quick sense of spread (max minus min) | \( \text{Range} = \max(x_i) - \min(x_i) \) |
| Mean Absolute Deviation (MAD) | Average absolute distance from the mean | \( \mathrm{MAD} = \frac{1}{n}\sum_{i=1}^n \lvert x_i - \bar{x} \rvert \) |
| Variance | Average squared deviation | Population: \( \sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2 \); Sample: \( s^2 = \frac{1}{n - 1}\sum_{i=1}^n (x_i - \bar{x})^2 \) |
| Standard Deviation | Square root of variance — most used in finance | Population: \( \sigma = \sqrt{\sigma^2} \); Sample: \( s = \sqrt{s^2} \) |
| Target Downside Deviation | Measures only downside volatility — focuses on losses below a target \(T\) | One common form: \( \text{TDD} = \sqrt{\frac{1}{n}\sum_{i=1}^n \big(\max(0,\,T - x_i)\big)^2} \) |
| Coefficient of Variation (CV) | Risk per unit of return — useful for comparing dispersion across different means | \( CV = \frac{s}{\bar{x}} \) |
Analogy:
Range is like the distance between the tallest and shortest person in a room; standard deviation shows how tightly everyone’s height clusters around the average.
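A quick Python sketch of the dispersion measures in the table above, on an illustrative return series (the target T for downside deviation is assumed to be 3%):

```python
from statistics import mean, stdev, variance

x = [0.04, 0.07, -0.01, 0.10, 0.05]                  # illustrative returns
xbar = mean(x)

data_range = max(x) - min(x)                         # range
mad = sum(abs(xi - xbar) for xi in x) / len(x)       # mean absolute deviation
s2 = variance(x)                                     # sample variance (divides by n - 1)
s = stdev(x)                                         # sample standard deviation
cv = s / xbar                                        # coefficient of variation

target = 0.03                                        # assumed target return T
tdd = (sum(max(0, target - xi) ** 2 for xi in x) / len(x)) ** 0.5   # form shown in the table

print(round(data_range, 2), round(mad, 4), round(s2, 5), round(s, 4), round(cv, 2), round(tdd, 4))
```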
Shape of Distributions
The shape of a data distribution tells a story.
Skewness measures asymmetry:
- Positive skew: long right tail; a few very high values. (Mode < Median < Mean)
- Negative skew: long left tail; a few very low values. (Mean < Median < Mode)
Kurtosis measures “peakedness”:
- Leptokurtic: tall, narrow, frequent extremes (volatile returns).
- Platykurtic: flat, fewer extremes. (“Plat” = flat.)
- Excess Kurtosis: kurtosis minus 3; a positive value (kurtosis above 3) means more extreme events than a normal distribution.
Analogy:
A negatively skewed class grade distribution means most students did well, but a few failed badly. A leptokurtic distribution is like a market with rare but extreme crashes and spikes.
Covariance and Correlation
These show how two variables move together.
- Covariance: shows direction (positive or negative), but not strength — it’s unbounded.
- Correlation: standardizes covariance between -1 and +1.
Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}
Interpretation:
+1 = move perfectly together
0 = no relationship
−1 = move perfectly opposite
Analogy:
If temperature and ice cream sales rise together, correlation is positive. If umbrellas and sunshine move opposite, it’s negative.
Summary Table
| Concept | Key Idea / Formula |
|---|---|
| Data Types | Numerical (discrete, continuous), Categorical (nominal, ordinal) |
| Data Structures | Time series, cross-sectional, panel, structured, unstructured |
| Frequency | Absolute, relative, cumulative |
| Visualization | Histogram, bar chart, line chart, scatter, tree map, word cloud |
| Central Tendency | Mean, median, mode, trimmed, winsorized, weighted, geometric, harmonic |
| Quantiles | Quartiles, deciles, percentiles, IQR, box plot |
| Dispersion | Range, MAD, variance, stdev, CV, downside deviation |
| Shape | Skewness & kurtosis |
| Relationships | Covariance, correlation |
Key Takeaways
Data types define what analysis you can perform.
Visualization choices depend on whether you’re comparing, tracking, or exploring relationships.
Mean, median, and mode each tell a different story about “typical” values.
Variability (standard deviation, range) matters as much as averages.
Understanding skewness, kurtosis, and correlation helps interpret risk and relationships in financial data.
Real World Analogies
Discrete data: Counting the number of tickets sold for a concert.
Continuous data: Measuring the time between two train arrivals.
Nominal data: Sorting survey responses by favorite brand.
Ordinal data: Rating hotels from 1–5 stars.
Histogram: Like sorting people by age group at a party.
Box plot: Like showing the shortest, tallest, and average heights on a team.
Skewness: A market where most returns are small but a few are extreme losses.
Correlation: Ice cream sales and temperature move together — positive correlation.
Learning Module 3: Probability Concepts
Random Variables & Events
A random variable represents uncertain numerical outcomes — it’s the number you don’t know yet, like the result of a dice roll or a coin toss. Once the outcome occurs, that value becomes the observed value.
An event is one or more outcomes that share a property. For instance, rolling an even number on a die (2, 4, or 6) is a single event.
Types of Events:
- Mutually Exclusive: Two events that cannot happen together (e.g., flipping both heads and tails in one coin toss).
- Exhaustive: A complete set of all possible outcomes (for a die: {1,2,3,4,5,6}).
Analogy:
Imagine dividing a pizza into slices — each slice represents a possible event. You can’t eat two slices at once (mutually exclusive), and all slices together make the full pizza (exhaustive).
Properties of Probability
All probabilities follow two basic rules:
- Every event’s probability lies between 0 and 1.
- The sum of probabilities across all outcomes equals 1.
Analogy:
Probability is like pie slices again — you can’t have negative slices, and all slices must fill the entire pie exactly.
Types of Probability
Empirical Probability: Based on historical data.
Example: If it rained 30 of the last 100 days, the empirical probability of rain is 0.3.
A Priori Probability: Based on logical reasoning, not observation.
Example: Rolling a 3 on a fair six-sided die = 1/6.
Subjective Probability: Based on personal judgment or intuition.
Example: An analyst believes there’s a 70% chance the market will rise.
Analogy:
Empirical = “we’ve seen it happen,”
A priori = “we know it by logic,”
Subjective = “we feel it might happen.”
Odds & Probability
Odds show likelihoods in ratio form.
Probability = \frac{a}{(a+b)}
Example: 4-to-1 odds → 4 / (4 + 1) = 0.8 = 80%.
Analogy: Odds are like betting language — “4 to 1” means four chances of winning for every one chance of losing.
Conditional vs Unconditional Probability
Unconditional Probability: The chance of an event occurring, regardless of other events.
Example: The chance of rolling a six is always 1/6, no matter previous rolls.
Conditional Probability: The probability of one event given another has occurred.
Formula: P(A | B) = Probability of A given B.
P(A \mid B) = \frac{P(AB)}{P(B)}
Example: If it’s cloudy (B), the probability of rain (A) increases.
P(A|B) = Probability of A given B
P(AB) = Probability of A & B
Analogy:
Conditional probability is like narrowing your view — once you know it’s cloudy, your forecast adjusts.
Probability Rules
It helps to picture a Venn diagram: are you trying to find just A, just B, both, or either?
| Rule | Meaning / Use |
|---|---|
| Multiplication Rule (Joint Probability) | to find the probability of two or more events happening together, whether the events are independent or dependent |
| Independent Events | When A doesn’t affect B |
| Addition Rule | Either event occurs |
| Total Probability Rule | to find the unconditional probability of an event by weighting its conditional probabilities across a set of mutually exclusive, exhaustive scenarios |
Multiplication Rule (Joint Probability)
P(AB) = P(A \mid B) \times P(B)
Independent Events:
P(AB) = P(A) \times P(B)
Addition Rule:
P(A \text{ or } B) = P(A) + P(B) - P(AB)
Total Probability Rule:
P(A) = \sum_i P(A \mid B_i) \times P(B_i)
Analogy:
Multiplication is “AND,” addition is “OR.”
Tree diagrams help visualize these paths — like mapping every branch of possibilities.
Expected Value
Expected value is the probability-weighted average of all possible outcomes — your long-run average if you repeated the event infinitely.
Formula:
E(X) = \sum_i X_i P(X_i)
Example: A 50% chance of +10 and 50% chance of –10 → EV = 0.
Analogy:
If you play the same game many times, EV tells you your average win or loss per play.
Variance Using Probabilities
Variance measures how far outcomes spread from their expected value.
Formula:
Var(X) = E[(X - E(X))^2]
Compute the expected value, then average the squared deviations from it.
Analogy:
Variance is like measuring how tightly clustered or widely scattered darts are around the bullseye.
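A tiny Python sketch of probability-weighted expected value and variance, using the +10 / −10 example above:

```python
# Expected value and variance from a discrete probability distribution.
outcomes = [10, -10]       # possible payoffs (from the example above)
probs = [0.5, 0.5]         # probabilities must sum to 1

ev = sum(p * x for p, x in zip(probs, outcomes))               # E(X)
var = sum(p * (x - ev) ** 2 for p, x in zip(probs, outcomes))  # E[(X - E(X))^2]
print(ev, var)             # 0.0, 100.0
```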
Covariance & Correlation
Covariance: Measures how two variables move together.
Cov(X,Y) = E[(X - E(X))(Y - E(Y))]
Positive = move together, Negative = move opposite.
Correlation: Standardizes covariance between –1 and +1.
Corr(X,Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}
Analogy:
Covariance says “do they move in the same direction?”
Correlation says “how strong is that relationship?” — like grading a friendship: +1 = always agree, –1 = always disagree.
Portfolio Variance & Standard Deviations
Used to measure total risk when combining assets.
Formulas:
For 2 Assets:
\sigma_p^2 = w_A^2 \sigma_A^2 + w_B^2 \sigma_B^2 + 2w_A w_B Cov_{AB}
Using Correlation to find 2 assets:
\sigma_p^2 = w_A^2 \sigma_A^2 + w_B^2 \sigma_B^2 + 2w_A w_B \rho_{AB} \sigma_A \sigma_B
Portfolio Standard Deviation
\sigma_p = \sqrt{\sigma_p^2}
Analogy:
Like mixing two investments: if they move differently, the portfolio “smooths out” risk.
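A small sketch of the two-asset formula; the weights, volatilities, and correlation below are assumed for illustration:

```python
# Two-asset portfolio standard deviation using the correlation form.
def portfolio_std(w_a, w_b, sigma_a, sigma_b, rho_ab):
    var_p = (w_a**2 * sigma_a**2 + w_b**2 * sigma_b**2
             + 2 * w_a * w_b * rho_ab * sigma_a * sigma_b)
    return var_p ** 0.5

# Illustrative inputs: 60/40 mix, 20% and 10% volatilities, correlation 0.3
print(round(portfolio_std(0.6, 0.4, 0.20, 0.10, 0.3), 4))
```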
Bayes’ Theorem
Bayes’ Theorem updates probabilities when new information appears.
It combines prior probability with new evidence to calculate a revised (posterior) probability.
Conceptual Formula:
P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}
Expanded form (using multiple scenarios):
P(A \mid B) = \frac{P(B \mid A) P(A)}{\sum_i P(B \mid A_i) P(A_i)}
Real-World Example:
If a medical test is 95% accurate, Bayes’ Theorem helps find the probability you actually have the disease given a positive result.
Analogy:
It’s like adjusting your weather forecast when you see new clouds forming.
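A short Python sketch of Bayes’ Theorem for the medical-test example; the 1% prevalence and 5% false-positive rate are assumed numbers added for illustration:

```python
# Assumed inputs: 1% of people have the disease, the test catches 95% of
# true cases, and it falsely flags 5% of healthy people.
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

# Total probability of testing positive (denominator of Bayes' Theorem)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: P(disease | positive test)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # ~0.161, far below the test's 95% accuracy
```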
Counting & Probability
1. Multiplication Rule
If tasks are independent, multiply the number of ways each can occur.
N = n_1 \times n_2 \times ... \times n_k
Example: 3 shirts × 2 pants × 2 shoes = 12 outfit combinations.
2. Factorial (n!)
Total ways to arrange n items.
Formula:
n! = n \times (n - 1) \times (n - 2) \times ... \times 1
Example: 5! = 120 ways to seat 5 people in order.
3. Labeling (Multinomial):
Arranging items into subgroups.
\frac{n!}{n_1! n_2! ... n_k!}
Example: Sorting 5 players into 2 teams of 2 and 1 substitute.
4. Combinations (nCr):
Selecting items when order doesn’t matter.
{}^nC_r = \frac{n!}{r!(n - r)!}
Example: Choosing 3 out of 10 lottery numbers.
5. Permutations (nPr):
Selecting items when order does matter.
{}^nP_r = \frac{n!}{(n - r)!}
Example: Assigning gold, silver, and bronze medals to 3 out of 10 athletes.
Analogy:
Combinations are like picking players for a team (order irrelevant).
Permutations are like ranking them on the podium (order counts).
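Python’s math module has these counting rules built in; a quick sketch reproducing the examples above:

```python
from math import factorial, comb, perm

print(3 * 2 * 2)             # multiplication rule: 12 outfit combinations
print(factorial(5))          # 5! = 120 ways to seat 5 people
print(factorial(5) // (factorial(2) * factorial(2) * factorial(1)))  # labeling: 30 ways
print(comb(10, 3))           # combinations: 120 ways to choose 3 of 10 numbers
print(perm(10, 3))           # permutations: 720 ways to award gold/silver/bronze
```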
Summary Table
| Concept | Formula / Description | Real-World Analogy |
|---|---|---|
| Random Variable | Uncertain outcome | Rolling a die |
| Mutually Exclusive | Events can’t co-occur | Heads or tails |
| Exhaustive | All outcomes covered | All six sides of a die |
| Probability Rules | 0 ≤ P ≤ 1; Sum = 1 | Whole pie |
| Empirical | Based on data | Past weather |
| A Priori | Based on logic | Fair die odds |
| Subjective | Based on opinion | Investor’s hunch |
| Multiplication Rule | P(AB) = P(A\|B) × P(B) | “AND” logic |
| Addition Rule | P(A or B)=P(A)+P(B)–P(AB) | “OR” logic |
| Total Probability | Σ P(A\|Bᵢ) × P(Bᵢ) | Weighted average across scenarios |
| Expected Value | Σ X P(X) | Long-term average |
| Variance | E[(X–E(X))²] | Spread of outcomes |
| Covariance | E[(X–E(X))(Y–E(Y))] | Stocks moving together |
| Correlation | Cov(X,Y)/(σxσy) | Strength of movement |
| Portfolio Variance | See formula | Combined investment risk |
| Bayes’ Theorem | Updates P with new info | Medical test probability |
| Combination | nCr | Choosing team members |
| Permutation | nPr | Ranking winners |
Key Takeaways
Probabilities quantify uncertainty and guide investment and risk analysis.
Understand the difference between unconditional and conditional events.
Use joint, addition, and total-probability rules to combine events logically.
Expected value measures average outcome; variance and correlation measure risk and relationships.
Counting rules (factorials, combinations, permutations) simplify complex probability questions.
Real-World Tip:
In finance, probability concepts underpin portfolio diversification, option pricing, and forecasting.
Always ask: Are these events independent? Does order matter? Am I updating my beliefs with new information?
Learning Module 4: Common Probability Distributions
Probability Distributions: The Foundations
A probability distribution lists all possible outcomes of a random variable and the probabilities associated with each.
There are two main types:
- Discrete random variables: Countable outcomes, like the number of heads in 5 coin tosses.
- Continuous random variables: Infinite outcomes within a range, like interest rates or stock returns.
Analogy:
Discrete is like counting marbles in a jar — you can list them.
Continuous is like measuring milk in a jug — it can always be divided more finely.
Discrete Uniform Distributions
A discrete uniform distribution assigns equal probabilities to all outcomes. Every event has the same chance of occurring.
Example: Rolling a fair six-sided die (each side = 1/6).
Analogy:
Like a perfectly balanced spinner — each color slice is the same size, so no outcome is more likely than another.
Cumulative Distribution Function (CDF)
The CDF gives the probability that a random variable is less than or equal to a certain value.
It builds from 0 to 1 as you move through the distribution.
Example: If 70% of students scored 80 or less, then F(80) = 0.7.
Analogy:
Think of it as a “running total” of probability — like filling a glass of water until it’s full (100%).
Continuous Uniform Distribution
This distribution covers all values within a continuous range between a (minimum) and b (maximum).
The probability of an exact single value is 0%, but ranges have measurable probability.
Formula:
P(x_1 \leq X \leq x_2) = \frac{x_2 - x_1}{b - a}
Analogy:
Imagine a dartboard where any hit within the circle counts — no single exact point has weight, only areas matter.
Bernoulli Trials
A Bernoulli trial is a single random experiment with exactly two possible outcomes: success or failure. Examples include flipping a coin, default/no default, or pass/fail on an exam.
Each trial is independent, and the probability of success is constant.
Analogy:
Like a light switch — it’s either on or off, with no in-between.
Binomial Distribution
The binomial distribution gives the probability of getting a specific number of successes (x) in a fixed number of independent Bernoulli trials (n).
Formula:
P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}
Expected Value:
E(X) = np
Variance:
Var(X) = np(1 - p)
Example:
What’s the probability of flipping 3 heads in 5 fair coin tosses?
P(3) = \binom{5}{3}(0.5)^3 (0.5)^2 = 10 \times 0.125 \times 0.25 = 0.3125
Analogy:
Like counting how many baskets a player makes in 10 free throws — each shot is a Bernoulli trial.
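A minimal Python sketch of the binomial formula, checked against the 3-heads-in-5-tosses example:

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binomial_pmf(3, 5, 0.5))     # 0.3125
print(5 * 0.5, 5 * 0.5 * 0.5)      # E(X) = np = 2.5, Var(X) = np(1 - p) = 1.25
```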
Normal Distribution
The normal distribution is the bell-shaped curve that describes many real-world data sets — symmetric around the mean.
Key Properties:
- Mean = Median = Mode
- Skewness = 0 (perfect symmetry)
- Kurtosis = 3
- 68% of values fall within 1σ
- 95% within 2σ
- 99.7% within 3σ
Analogy:
Imagine test scores in a large class — most students cluster around the average, with fewer outliers at both ends.
Univariate vs Multivariate Distributions
Univariate: Describes one variable (e.g., one stock’s returns).
Multivariate: Describes multiple correlated variables (e.g., returns of two stocks).
The number of pairwise correlations among n variables = n(n − 1)/2.
Analogy:
One variable = a single track on a chart.
Multivariate = multiple tracks moving together or apart — like dancing partners.
Confidence Intervals (Normal Distribution)
A confidence interval shows the range where a population parameter is likely to lie, given a certain probability.
| Confidence Level | Z-Value | Coverage |
|---|---|---|
| 90% | 1.65σ | ±1.65 standard deviations |
| 95% | 1.96σ | ±1.96 standard deviations |
| 99% | 2.58σ | ±2.58 standard deviations |
Analogy:
It’s like saying, “I’m 95% confident the dart will land within this ring around the bullseye.”
Z-Tables
The Z-table provides the probability (area) to the left of a given Z-value under the standard normal curve.
If Z = 1.65, then about 95% of the area lies to its left.
Tip: The Z-table only applies to normal distributions.
Analogy:
It’s a probability map — Z tells you how far from the mean you are, and the table tells you how much of the population lies below that point.
Roy’s Safety First Criterion
This helps investors choose portfolios that minimize the chance of returns falling below a minimum acceptable threshold (Rᴸ).
Formula:
SFR = \frac{E(R_p) - R_L}{\sigma_p}
The higher the ratio, the safer the portfolio.
Analogy:
Like choosing the parachute that gives you the highest chance of landing safely above the danger zone.
Lognormal Distribution
A lognormal distribution arises when the natural logarithm of a variable is normally distributed.
It’s positively skewed — meaning it has a long right tail.
Used to model stock prices, since they can’t go below zero but can rise infinitely.
Formula:
If ln(X)∼N(μ,σ2), then X follows a lognormal distribution.
\ln(X) \sim N(\mu, \sigma^2)
Analogy:
Stock prices act like trees — they can grow tall (positive skew) but can’t sink below the ground (zero).
Continuously Compounded Returns
Used in finance for modeling constant growth.
Formula:
R_{cc} = \ln\left(\frac{P_t}{P_0}\right)
or
P_t = P_0 e^{R_{cc}}
Analogy:
Think of your return compounding smoothly every second instead of once per year — like continuous water flow versus monthly drips.
T-Distribution
The t-distribution is similar to the normal curve but has thicker tails and a lower peak — accounting for small sample uncertainty.
As degrees of freedom (df = n – 1) increase, it approaches the normal distribution.
| Confidence Level | t (df=29) |
|---|---|
| 90% | 1.699 |
| 95% | 2.045 |
| 99% | 2.756 |
Analogy:
When you have fewer data points, you give more room for error — the t-curve spreads out wider, like estimating an average from only a few test scores.
Chi-Square (χ²) Distribution
The chi-square distribution represents the sum of squared standard normal variables.
Used to test variance and goodness-of-fit.
It only takes positive values, since squared numbers can’t be negative.
As degrees of freedom rise, it becomes more symmetric and bell-shaped.
Analogy:
Like squaring every person’s deviation from average height — the negatives disappear, leaving only total variation.
F-Distribution
Used to compare two variances (e.g., volatility of two portfolios).
Defined by two degrees of freedom:
- Numerator (df₁)
- Denominator (df₂)
As both increase, the F-curve looks more like a normal distribution.
Analogy:
Like comparing two pitchers’ accuracy — you’re testing whether their throws vary equally.
Monte Carlo Simulation
A Monte Carlo simulation uses random sampling to model complex systems and estimate probabilities.
Common applications include:
- Estimating risk and return of portfolios
- Valuing complex securities (options, derivatives)
- Running sensitivity and scenario analyses
Limitations:
- Provides statistical estimates, not exact results
- Doesn’t explain why outcomes occur
Analogy:
Like running a video game thousands of times to see how often you win — you can estimate your odds but not predict every move exactly.
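A toy Monte Carlo sketch in Python: the normal return assumption (7% mean, 15% standard deviation) is made up purely to illustrate estimating a probability by repeated random sampling:

```python
# Simulate many one-year returns and estimate the probability of a loss.
import random

random.seed(42)
n_trials = 100_000
losses = sum(random.gauss(0.07, 0.15) < 0 for _ in range(n_trials))
print(losses / n_trials)   # estimated P(return < 0), roughly 0.32
```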
Summary Table
| Concept | Formula / Property | Real-World Analogy |
|---|---|---|
| Discrete Variable | Countable outcomes | Rolling dice |
| Continuous Variable | Infinite outcomes | Measuring time |
| Uniform (Discrete) | Equal probabilities | Fair spinner |
| Bernoulli | 2 outcomes | On/off switch |
| Binomial | \( \binom{n}{x} p^x (1 - p)^{n - x} \) | Shots made in basketball |
| Normal | Symmetrical, bell curve | Test scores |
| Lognormal | Positively skewed | Stock prices |
| Z-Score | \( Z = \frac{x - \mu}{\sigma} \) | Distance from mean |
| Roy’s Criterion | \( SFR = \frac{E(R_p) - R_L}{\sigma_p} \) | Choosing safest parachute |
| T-Distribution | Small sample version of normal | Fewer data = wider curve |
| Chi-Square | Sum of squares | Measuring variance |
| F-Distribution | Ratio of variances | Comparing volatility |
| Monte Carlo | Repeated simulation | Playing out scenarios |
Key Takeaways
Probability distributions model uncertainty — discrete for countable, continuous for measurable.
Normal and lognormal distributions form the backbone of return modeling.
Confidence intervals show precision around estimates.
T, Chi-square, and F distributions handle smaller samples or variance testing.
Monte Carlo simulation helps test “what-if” scenarios without analytical formulas.
In finance: these distributions help model returns, risk, and statistical inference — the bridge from probability theory to investment decision-making.
Learning Module 5: Sampling and Estimation
Population Parameters vs. Sample Statistics
A parameter describes an entire population (like the true mean μ or variance σ²), while a sample statistic summarizes a smaller sample (like sample mean x̄ or standard deviation s).
Since we rarely observe entire populations, we use statistics to estimate parameters.
Real Example:
The true average daily spending across all CIBC clients is a population mean (μ). The average from 1,000 sampled clients is a sample mean (x̄).
Analogy:
The population is the entire ocean; your sample is a bucket of seawater. Measuring salt in your bucket helps you estimate the ocean’s salinity.
Probability Sampling Methods
In probability sampling, every member has a known, nonzero chance of selection. This reduces bias and supports valid statistical inference.
Types:
Cluster Sample: Divide into mini-populations (clusters) and sample clusters instead of individuals.
Example: Choose several bank branches (clusters) and collect all or some customer responses.
Analogy: Testing a few classrooms to infer school-wide performance.
Simple Random Sample (SRS): Every element has an equal chance.
Example: Randomly choosing 1,000 clients using a random number generator.
Analogy: Drawing names from a hat.
Systematic Sample: Select every nᵗʰ element after a random start.
Example: Selecting every 50th transaction after a random start.
Pitfall: Hidden patterns (e.g., daily cycles) can bias results.
Analogy: Checking every 10th product on a conveyor belt.
Stratified Sample: Divide the population into subgroups (strata) and sample proportionally within each.
Example: Segment clients by region or wealth tier and sample from each.
Analogy: Picking fruit from all parts of a tree to represent different sunlight exposures.
Non-Probability Sampling
Selection depends on convenience or judgment, not randomization. It’s faster but more prone to bias.
Judgment Sample: Expert-selected sample believed to be representative.
Example: A risk analyst handpicking key corporate clients.
Analogy: A chef tasting a “typical” spoonful of soup — may not capture full variability.
Convenience Sample: Select what’s easiest to access.
Example: Using only clients who respond to a mobile survey.
Analogy: Asking whoever is nearby for directions.
Sampling Error
Definition: The difference between a sample statistic and the population parameter it estimates.
Larger, well-designed samples reduce error but never eliminate it.
Example: Sample mean ATM withdrawal = $120; true population mean = $125 → Sampling error = $5.
Analogy: Measuring the room’s temperature to estimate the whole building’s — close, but not exact.
Central Limit Theorem (CLT)
The CLT is the cornerstone of inference: regardless of population shape, the sampling distribution of the sample mean (x̄) becomes approximately normal when sample size (n) is large.
Formulas:
\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}, \quad SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
Example: Averaging 30 days of returns gives a nearly normal distribution of average returns even if daily data is skewed.
Analogy: Individual raindrops fall randomly, but the average rainfall per week forms a smooth pattern.
Properties of a Good Estimator
A good estimator has three key traits:
| Property | Description | Analogy |
|---|---|---|
| Unbiased | Its expected value equals the true parameter. | The arrows, on average, hit the bullseye. |
| Efficient | Has the smallest variance among unbiased estimators. | The arrows cluster tightly around the center. |
| Consistent | Improves as n increases (standard error ↓). | More practice = tighter grouping around the bullseye. |
Example: The sample mean x̄ is an unbiased, consistent, and efficient estimator of μ for large n.
Estimator vs. Point Estimate
An estimator is the formula used to calculate a sample statistic; a point estimate is the resulting single value.
Analogy: The estimator is the recipe; the point estimate is the cookie you bake.
Confidence Intervals
A confidence interval (CI) gives a range where the population mean likely lies, based on a sample.
Wider intervals → more confidence, less precision.
Formula (σ known):
\bar{X} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)
Example: Average mortgage processing time = 5 days, σ = 1.2, n = 64.
At 95% confidence (Z = 1.96): 5 ± 1.96 × (1.2 / √64) = 5 ± 0.294 ⇒ [4.706, 5.294]
Analogy: Guardrails around your best guess — wider rails mean more safety but less precision.
Resampling Methods
Resampling generates new samples from existing data to estimate variability when formulas aren’t practical.
Jackknife Method:
- Remove one observation at a time (n repetitions).
- Reduces bias and estimates standard error.
- Produces stable results.
Analogy: Removing one plank at a time from a bridge to test its sturdiness.
Bootstrap Method:
- Randomly resample with replacement from the original data.
- Results differ on each run; used to estimate standard errors and CIs.
Analogy: Making many mini-batches of cookies by reusing ingredients — some batches repeat the same scoops.
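A small Python sketch of the bootstrap idea; the withdrawal data are invented for illustration:

```python
# Resample with replacement many times and look at the spread of the means.
import random
from statistics import mean, stdev

random.seed(1)
data = [120, 95, 130, 110, 105, 140, 90, 125]   # e.g., sampled ATM withdrawals

boot_means = [mean(random.choices(data, k=len(data))) for _ in range(5_000)]
print(round(mean(boot_means), 2))   # centre of the bootstrap distribution
print(round(stdev(boot_means), 2))  # bootstrap estimate of the standard error
```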
Common Biases in Empirical Analysis
| Bias | Description | Example | Mitigation |
|---|---|---|---|
| Sample Selection | Certain groups excluded | Only branch clients surveyed | Use probability sampling |
| Data Snooping | Searching until “something” fits | Testing hundreds of factors until one works | Predefine hypotheses, use out-of-sample tests |
| Survivor Bias | Excluding failed entities | Studying only surviving funds | Include inactive entities |
| Look-Ahead Bias | Using future data in past tests | Backtests using revised data | Restrict to information available at the time |
| Time-Period Bias | Analysis window not representative | Strategy tested only in bull markets | Use multiple timeframes |
Analogy: Building a map from future roads or skipping closed streets — your navigation looks accurate but misleads.
Choosing Between z and t Tests
| Distribution of Data | σ Known? | n < 30 | n ≥ 30 | Use |
|---|---|---|---|---|
| Normal | Yes | z | z | z |
| Normal | No | t | t* | t (≈ z for large n) |
| Non-Normal | Yes | n/a | z | z (via CLT if large n) |
| Non-Normal | No | n/a | t* | t (via CLT if large n) |
Analogy:
Choosing z or t is like picking the right wrench:
z for when you know the bolt size (σ known),
t for when you estimate it.
Larger projects (large n) give you flexibility.
Summary Table
| Concept | Formula / Description | Analogy |
|---|---|---|
| Sampling Error | ( \bar{X} – \mu ) | Measuring one room’s temperature to estimate the whole building |
| CLT | ( Var(\bar{X}) = \frac{\sigma^2}{n} ) | Averaging chaos into order |
| Confidence Interval | ( \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} ) | Guardrails around a guess |
| Jackknife | Leave-one-out resampling method | Remove one plank to test bridge strength |
| Bootstrap | Resample with replacement to estimate sampling distribution | Reusing ingredient scoops |
| Good Estimator | Unbiased, Efficient, Consistent | Tight arrow grouping on target |
| Bias Types | Sampling, Snooping, Survivor, Look-ahead, Time | Wrong map, wrong road |
Key Takeaways
Choose z vs. t based on normality, sample size, and whether σ is known.
Probability samples yield more representative results; non-probability samples are quicker but risk bias.
The Central Limit Theorem ensures that sample means approximate normality for large n.
Good estimators are unbiased, efficient, and consistent.
Confidence intervals quantify uncertainty.
Jackknife and bootstrap methods are robust alternatives to analytical formulas.
Bias awareness is critical for valid empirical work.
Learning Module 6: Hypothesis Testing
What is Hypothesis Testing
Hypothesis testing is a structured way to make decisions about a population parameter using sample data.
You start with two competing claims:
- Null hypothesis (H₀): The “status quo” — what we assume true until proven otherwise.
- Alternative hypothesis (Hₐ): What the analyst believes or wants to prove.
Example:
You believe a portfolio’s mean return is greater than 8%.
H₀: μ ≤ 8% vs. Hₐ: μ > 8%
Analogy:
Like a courtroom: H₀ is “innocent until proven guilty.” You need enough evidence (data) to reject it.
Steps in Hypothesis Testing
| Step | Description | Example / Analogy |
|---|---|---|
| 1. State Hypotheses | Define H₀ and Hₐ. H₀ always includes equality (=, ≤, or ≥). | “The coin is fair” (H₀) vs. “The coin is biased” (Hₐ). |
| 2. Choose Test Type | Decide 1-tailed or 2-tailed. | Testing for equality → 2-tailed; Testing for > or < → 1-tailed. |
| 3. Determine Significance (α) | α = probability of Type I error. Common levels: 0.01, 0.05, 0.10. | “How strict is the judge?” |
| 4. Calculate Test Statistic | Compare sample data to hypothesized parameter. | Compute z, t, χ², or F. |
| 5. Determine Critical Value or p-value | Defines the rejection region. | If p < α, reject H₀. |
| 6. Make Decision | Reject or fail to reject H₀. | Convict or acquit the defendant. |
Errors in Hypothesis Testing
| Error | Definition | Analogy |
|---|---|---|
| Type I Error (α) | Rejecting a true H₀ | Throwing away a good apple |
| Type II Error (β) | Failing to reject a false H₀ | Eating a bad apple |
| Power of a Test | 1−β: Probability of correctly rejecting a false H₀ | Detecting the bad apple correctly |
Test Statistic Formula (for mean)
Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}
If σ unknown → use t-statistic:
t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}
Interpretation:
If your test statistic lands in the rejection zone, you have enough evidence to reject H₀.
P-Value Concept
The p-value is the smallest significance level (α) at which you can reject H₀.
If p < α, reject H₀.
If p > α, fail to reject H₀.
Analogy:
Think of α as the “bar for evidence.”
If your p-value is lower than that bar, your evidence is strong enough to convict (reject H₀).
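A quick sketch of a one-sample t-test in Python; the return data are invented, and SciPy is assumed to be available for the p-value:

```python
# t = (x̄ - μ0) / (s / sqrt(n)), two-tailed p-value from the t-distribution.
from statistics import mean, stdev
from math import sqrt
from scipy import stats

returns = [0.09, 0.11, 0.07, 0.12, 0.08, 0.10, 0.06, 0.13]   # illustrative sample
mu_0 = 0.08                                                  # hypothesized mean
n = len(returns)

t_stat = (mean(returns) - mu_0) / (stdev(returns) / sqrt(n))
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 1))       # two-tailed
print(round(t_stat, 3), round(p_value, 3))
# Reject H0 at alpha = 0.05 only if p_value < 0.05.
```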
One tailed vs Two Tailed
| Type | Hypotheses | Rejection Region | Example |
|---|---|---|---|
| Two-Tailed | H0:μ=10 Ha:μ≠10 | Both ends of distribution | Testing if returns differ from 10% |
| Right-Tailed | H0:μ≤10 Ha:μ>10 | Upper tail | Testing if returns are greater than 10% |
| Left-Tailed | H0:μ≥10 Ha:μ<10 | Lower tail | Testing if returns are below 10% |
Statistical vs Economic Significance
Even if a result is statistically significant, it may not be economically meaningful.
A small difference might be real but too tiny to matter financially.
Example:
A 0.01% increase in portfolio return may be “significant,” but transaction costs may erase the gain.
Multiple Testing Problem
When testing many hypotheses, the probability of making at least one Type I error increases.
Procedure (false discovery rate adjustment, Benjamini–Hochberg style):
- Rank p-values from smallest to largest.
- Compute each test’s adjusted significance level = α × (rank / total number of tests).
- Compare p-values ≤ adjusted α → significant.
Analogy:
The more dart throws you take, the more likely you’ll hit the bullseye by accident.
Key Tests Summary
| Test | What It Tests | Requirements | Statistic |
|---|---|---|---|
| Z-Test | Population mean (σ known) | Normal population | \( Z = \frac{\bar{X} – \mu_0}{\sigma / \sqrt{n}} \) |
| T-Test | Population mean (σ unknown) | Normal or large n | \( t = \frac{\bar{X} – \mu_0}{s / \sqrt{n}} \) |
| Chi-Square Test | Population variance | Normal data | \( \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} \) |
| F-Test | Equality of two variances | Normal, independent | \( F = \frac{s_1^2}{s_2^2} \) |
Difference in Means Tests
Independent Samples (Equal Variances):
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
where the pooled standard deviation s_p is:
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
Independent Samples (Unequal Variances):
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
Paired Comparisons (Dependent Samples):
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}
where \bar{d} = mean of the differences and s_d = standard deviation of the differences.
Analogy:
Independent = comparing two teams’ averages.
Paired = comparing the same team’s “before and after” performance.
Tests for Correlation
Pearson Correlation Test (parametric):
Tests if the population correlation coefficient (ρ) differs from 0.
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}
Spearman Rank Correlation (non-parametric):
Used when data aren’t normally distributed or contain outliers.
It tests whether rankings between two variables are correlated.
Analogy:
Pearson: comparing precise scores.
Spearman: comparing the order of finish (ranks).
Chi Square Tests for Independence
Used in contingency tables to test if two categorical variables are related.
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
where:
E_{ij} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}
Example:
Is fund style (Value, Growth, Blend) independent of fund size (Small, Medium, Large)?
Analogy:
Checking if ice cream flavor preference depends on age group — or if they’re independent.
Power of a Test
\text{Power of a Test} = 1 - \beta
Higher power means better detection of real effects — usually achieved with larger sample size or higher α.
Non-Parametric Test
Used when data don’t meet assumptions of normality or when variables are in ranks rather than raw values.
Examples:
- Spearman Rank Correlation (relationship in ranks)
- Runs Test (randomness)
- Wilcoxon Signed-Rank Test (median comparisons)
- Chi-Square Independence Test (categorical relationships)
Analogy:
When numbers are messy or skewed, switch to rank-based tests — less precise but more robust.
Summary Table
| Concept | Formula |
|---|---|
| Z-test (mean) | \( Z = \frac{\bar{X} – \mu_0}{\sigma / \sqrt{n}} \) |
| t-test (mean) | \( t = \frac{\bar{X} – \mu_0}{s / \sqrt{n}} \) |
| Chi-square | \( \chi^2 = \frac{(n – 1)s^2}{\sigma_0^2} \) |
| F-test | \( F = \frac{s_1^2}{s_2^2} \) |
| Correlation test | \( t = \frac{r\sqrt{n – 2}}{\sqrt{1 – r^2}} \) |
| Power | \( 1 – \beta \) |
| Confidence interval | \( \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \) |
| Spearman | \( r_s = 1 – \frac{6\sum d_i^2}{n(n^2 – 1)} \) |
Key Takeaways
Hypothesis testing = data-based decision-making framework.
Always define H₀ and Hₐ clearly — H₀ includes equality.
Errors are inevitable: minimize α and β trade-offs.
p-value tells you if results are significant; smaller = stronger evidence.
Choose tests based on data type, distribution, and sample size.
Statistical significance ≠ practical relevance.
Non-parametric tests are your backup when assumptions break.
Learning Module 7: Intro to Linear Regression
What is Linear Regression
Linear regression explains how one variable (dependent, Y) changes in response to another (independent, X).
It draws a “best fit” line through data points to predict Y based on X.
Simple Linear Regression: one X variable.
Y_i = b_0 + b_1 X_i + \varepsilon_i
- Yi: actual observed value
- b0: intercept; b1: slope coefficient
- εi: residual (error), where \varepsilon_i = Y_i - \hat{Y_i}
Predicted Value on Regression Line:
\hat{Y_i} = b_0 + b_1 X_i
Goal: minimize squared residuals (Least Squares Method).
Analogy:
Think of trying to balance a tightrope perfectly through scattered data points — the line should stay as close as possible to every point without tilting too much.
Least Squares Criterion
Regression chooses b0 and b1 to minimize the sum of squared errors:
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y_i})^2
This ensures the line captures the general trend while limiting large deviations.
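A short Python sketch of the least squares calculation (slope, intercept, R², and SEE) from the definitional formulas, on made-up data:

```python
# Simple linear regression by least squares, from the definitional formulas.
from statistics import mean
from math import sqrt

x = [1, 2, 3, 4, 5, 6]                 # illustrative X values
y = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1]     # illustrative Y values

x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)          # slope
b0 = y_bar - b1 * x_bar                          # intercept

y_hat = [b0 + b1 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
r_squared = 1 - sse / sst                               # = SSR / SST
see = sqrt(sse / (len(x) - 2))                          # standard error of estimate

print(round(b1, 3), round(b0, 3), round(r_squared, 3), round(see, 3))
```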
Regression Assumptions
| Assumption | Description | Analogy |
|---|---|---|
| 1. Linearity | Relationship between X and Y must be linear. Residuals should appear random. | Drawing a straight path through scattered dots. |
| 2. Homoscedasticity | Variance of residuals is constant across X values. | Raindrops evenly spread on a windshield. |
| 3. Independence | Observations and residuals are independent (no autocorrelation). | Each data point acts alone — no “copying neighbors.” |
| 4. Normality | Residuals are normally distributed. | The errors form a nice bell curve centered around zero. |
Violating these leads to biased or inefficient results.
Decomposing Variance
Regression breaks total variation in Y into explained and unexplained parts:
SST = SSR + SSE
| Component | Meaning | Formula |
|---|---|---|
| SST | Total Sum of Squares | \( SST = \sum (Y_i – \bar{Y})^2 \) |
| SSR | Regression Sum of Squares (explained) | \( SSR = \sum (\hat{Y_i} – \bar{Y})^2 \) |
| SSE | Error Sum of Squares (unexplained) | \( SSE = \sum (Y_i – \hat{Y_i})^2 \) |
Analogy:
Think of SST as total “scatter” of your darts around the bullseye. Regression (SSR) explains part of it; SSE is your leftover miss.
Goodness of Fit – R² and F-Statistic
Coefficient of Determination (R²):
R^2 = \frac{SSR}{SST}
- Measures how much variation in Y is explained by X.
- Ranges from 0 → 1 (higher = better).
Example: R²=0.90 → 90% of variation in Y explained by the model.
Analogy:
R² is like how much of a movie plot you can explain with a single character’s actions — higher R² means that character (X) drives more of the story (Y).
F-Statistic (Model Significance):
Tests if the model explains a significant portion of Y’s variation.
F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - (k + 1))}
- k: number of independent variables
- n: number of observations
A high F means the model is statistically useful.
Analogy:
Like testing if your overall recipe actually tastes better than random mixing.
ANOVA Table – Regression Breakdown
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Regression | SSR | k | MSR = SSR/k | MSR/MSE |
| Error | SSE | n-(k+1) | MSE = SSE/(n-(k+1)) | — |
| Total | SST | n-1 | — | — |
Interpretation:
The F-statistic in ANOVA tells you whether the regression model significantly improves prediction compared to using just the mean.
Standard Error of Estimate
Measures average distance between actual and predicted Y values:
SEE = \sqrt{\frac{SSE}{n - 2}}
Lower SEE = better model fit (closer predictions).
Analogy:
Like measuring how much your aim misses the bullseye — smaller SEE = more accurate throws.
Testing the Slope
To test if the relationship between X and Y is statistically significant:
t = \frac{b_1 - \beta_1}{S_{b_1}}
- b1: estimated slope
- β1: hypothesized slope (usually 0)
- Sb1: standard error of slope
- df = n − 2
where:
S_{b_1} = \frac{SEE}{\sqrt{\sum (X_i - \bar{X})^2}}
If |t| > t₍critical₎ → Reject H₀ (slope ≠ 0).
Analogy:
If the line’s slope is meaningfully tilted (not flat), X actually predicts Y.
Testing the Intercept (b0)
t = \frac{b_0 - \beta_0}{S_{b_0}}
with:
S_{b_0} = SEE \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}}
Usually less important unless the intercept itself has theoretical meaning (e.g., expected Y when X = 0).
Pearson Correlation Coefficient
Measures the strength and direction of linear relationship between X and Y:
r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}
Can also be tested with:
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}
df = n − 2
Interpretation:
- r = +1 → perfect positive linear relationship
- r = 0 → no linear relationship
- r = −1 → perfect negative linear relationship
Analogy:
If X and Y dance in sync, r ≈ 1; if one moves left while the other moves right, r ≈ −1.
Predicting Values of Y
Predicted value:
\hat{Y_i} = b_0 + b_1 X_i
Confidence Interval for Prediction:
\hat{Y} \pm t_{critical} \times S_f
Where:
S_f = SEE \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X_i - \bar{X})^2}}
Sf is the standard error of forecast.
Actual Y values will likely fall within this interval.
Analogy:
The regression line gives the “best guess,” but the confidence band is your safety net around it.
Functional (Non-Linear) Forms
If the X–Y relationship is not linear, transform variables into a linearizable form.
| Model | Formula | Interpretation |
|---|---|---|
| Log-Lin | \( \ln(Y) = b_0 + b_1 X \) | 1-unit change in X → % change in Y |
| Lin-Log | \( Y = b_0 + b_1 \ln(X) \) | 1% change in X → \( \frac{b_1}{100} \) change in Y |
| Log-Log | \( \ln(Y) = b_0 + b_1 \ln(X) \) | % change in X → % change in Y (elasticity = \( b_1 \)) |
Analogy:
You’re reshaping the data until it fits a straight line — like adjusting camera angles to see a straight horizon.
Evaluating the Model
A good regression model has:
✅ High R²
✅ High F-statistic
✅ Low SEE
✅ Uncorrelated residuals
✅ Normally distributed residuals
Analogy:
A “good fit” is like a well-tailored suit — it follows the shape closely, with minimal wrinkles (errors).
Summary Table
| Concept | Formula | Interpretation |
|---|---|---|
| Regression Equation | \( Y_i = b_0 + b_1 X_i + \varepsilon_i \) | Line of best fit |
| Total Variation | \( SST = SSR + SSE \) | Total = explained + unexplained |
| R² | \( R^2 = \frac{SSR}{SST} \) | % of variation explained |
| F-statistic | \( F = \frac{MSR}{MSE} \) | Overall model significance |
| SEE | \( SEE = \sqrt{MSE} \) | Prediction accuracy |
| t-stat (slope) | \( t = \frac{b_1 – \beta_1}{S_{b_1}} \) | Tests slope significance |
| Pearson r | \( r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} \) | Linear relationship strength |
| Prediction Interval | \( \hat{Y} \pm t_{critical} S_f \) | Range of likely Y values |
Key Takeaways
Regression quantifies and predicts relationships between variables.
Assumptions (linearity, independence, constant variance, normality) must hold for valid inference.
R² shows explanatory power; F-tests show model relevance.
t-tests validate whether slope(s) matter.
Always check residuals — they tell you if the model is honest.
Nonlinear relationships can often be made linear through log transformations.
