Quantitative Methods – L1 Prep
Learning Module 1: Time Value of Money
Time Value of Money (TVM)
The time value of money (TVM) is one of the most fundamental concepts in finance. It reflects the principle that money today is worth more than the same amount in the future because it can be invested to earn a return. TVM explains why we charge interest when lending, why we discount future cash flows when valuing investments, and why timing matters in every financial decision.
Analogy: Think of money like seeds. If you plant them today, they grow; if you hold them in your hand, they stay the same.
Future Value
Future Value measures how much an amount of money today will grow to after earning interest for a certain number of periods. It’s the foundation of understanding compounding — earning “interest on interest.” The more time or the higher the interest rate, the larger your future value will be.
Formula: FV = PV × (1 + r)ⁿ
Example: $1,000 invested at 5% for 3 years → FV = 1,000 × 1.05³ = $1,157.63
Analogy: Imagine rolling a snowball downhill — each turn picks up more snow, just like each compounding period adds more interest.
FV = PV \times (1 + r)^n
where (r) is the required rate of return and (n) is the number of compounding periods.
Present Value
Present Value is the reverse of future value. It tells us how much a future sum of money is worth today. This helps investors compare opportunities that pay out at different times. PV “discounts” future cash flows using the required rate of return to bring them back to today’s dollars.
PV = \frac{FV}{(1 + r)^n}
Example: The present value of $1,157.63 in 3 years at 5% is $1,000.
Analogy: PV is like rewinding a movie — you’re taking future dollars and bringing them back to where the story starts.
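To make the mechanics concrete, here is a quick Python sketch (not part of the curriculum) of the FV and PV formulas, checked against the $1,000 at 5% for 3 years example:

```python
# Minimal sketch of the two core TVM formulas.

def future_value(pv: float, r: float, n: int) -> float:
    """FV = PV * (1 + r)^n"""
    return pv * (1 + r) ** n

def present_value(fv: float, r: float, n: int) -> float:
    """PV = FV / (1 + r)^n"""
    return fv / (1 + r) ** n

print(round(future_value(1_000, 0.05, 3), 2))     # 1157.63
print(round(present_value(1_157.63, 0.05, 3), 2)) # ~1000.00
```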
Required Return / Discount Rate (r)
The required return (or discount rate) represents the opportunity cost of capital — the return an investor demands to be compensated for risk and time. It’s composed of the risk-free rate plus several risk premiums for default risk, liquidity risk, maturity risk, and more. In general, the higher the risk of an investment, the higher the return investors demand.
Breakdown:
Required Return = Nominal Risk-Free Rate + Default + Liquidity + Maturity premiums
Example: If the real risk-free rate is 2%, inflation 3%, and total premiums 2%, required return = 7%.
Analogy: Lending money to a friend — if they’re trustworthy, you charge less; if you’re unsure, you demand more return.
Nominal Risk-Free Rate = Real Risk-Free Rate + Expected Inflation
Interest Rates & Compounding
Interest can be simple—calculated only on the principal—or compounded, which means interest is earned on both the principal and previously accrued interest. Because most real-world applications involve compounding, it is critical to match the rate and the number of periods to the compounding frequency (e.g., APR ÷ 4 for quarterly compounding). Timelines are a helpful visual tool to avoid errors, and cash outflows should always be entered as negative numbers when using a financial calculator.
Compound interest grows money faster because each period’s earnings start earning returns themselves.
Example:
- Simple: $1,000 × (1 + 0.05 × 3) = $1,150
- Compound: $1,000 × (1.05)³ = $1,157.63
Analogy: Simple interest is like a flat salary. Compound interest is like getting a raise each year that builds on your new, higher salary.
Compounding Frequency
The frequency of compounding (annually, quarterly, monthly) affects how quickly your money grows. Always match your inputs — the rate and the number of periods — to the compounding frequency.
Example: An APR of 8% compounded quarterly means a 2% rate per quarter (8% ÷ 4).
Calculator Tips:
- Cash outflows (investments) should be entered as negatives.
- “Error 5” means you forgot to make an outflow negative.
Analogy: Compounding more often is like watering a plant more frequently; each extra watering gives growth another small boost.
Annual Percentage Rate / Stated Rate vs Effective Rate
The stated annual rate (APR, or nominal rate) is the simple quoted yearly rate and does not reflect compounding, while the effective annual rate (EAR, or EFF) captures the true annual return once compounding is included. The more frequent the compounding, the higher the EAR.
The EAR is calculated as:
[
EAR = \left(1 + \frac{\text{APR}}{n}\right)^n - 1
]
where (n) is the number of compounding periods per year. As compounding frequency increases, the EAR rises and approaches a limit, while the APR stays the same.
Example: 12% APR compounded monthly → EAR = (1 + 0.12/12)¹² − 1 = 12.68%
Analogy: APR is the advertised price, but EAR is what you actually pay once all the hidden “compounding extras” are included.
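A quick Python sketch of the EAR formula (illustrative only), reproducing the 12% APR compounded monthly example and the 8% quarterly example:

```python
# EAR = (1 + APR/n)^n - 1
def effective_annual_rate(apr: float, periods_per_year: int) -> float:
    return (1 + apr / periods_per_year) ** periods_per_year - 1

print(round(effective_annual_rate(0.12, 12), 4))  # 0.1268 -> 12.68%
print(round(effective_annual_rate(0.08, 4), 4))   # quarterly: 0.0824 -> 8.24%
```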
Annuities & Perpetuities
An annuity is a series of equal payments made at regular intervals; a perpetuity is an annuity whose payments continue forever. The timing of the payments (end of period vs. beginning of period) determines how each is valued.
Ordinary Annuity:
- An ordinary annuity is a series of equal payments made at regular intervals, where each payment occurs at the end of the period. Examples include bond coupon payments or rent paid after living in an apartment for a month.
- Calculator Tip: The present value is calculated as of one period before the first payment.
- Analogy: Paying rent at the end of the month
Annuity Due:
- An annuity due is similar to an ordinary annuity, except payments occur at the beginning of each period. This timing difference makes each payment effectively grow for one additional period, so annuity due values are slightly higher.
- Formula Adjustment: PV (Annuity Due) = PV (Ordinary Annuity) × (1 + r)
- Calculator Tip: Use the “BEG” mode when dealing with annuity due, and switch it back afterward.
- Analogy: Paying rent at move-in day instead of after your first month
Ordinary Annuity vs Annuity Due:
- In an ordinary annuity, payments happen at the end of the period — like paying for a service you’ve already received. In an annuity due, payments happen at the beginning — like prepaying for a gym membership before you start using it. The difference may seem small, but it matters in valuation.
Perpetuity:
- A perpetuity is an infinite stream of equal cash flows that never ends. Since the payments continue forever, the formula is simple:
[
PV = \frac{PMT}{r}
]
- Example: A preferred share paying $5 annually with a 5% required return → PV = 5 / 0.05 = $100.
- Analogy: A perpetuity is like a magical ATM that keeps dispensing the same amount every year, forever.
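To tie the annuity and perpetuity formulas together, here is a small Python sketch (illustrative, not calculator keystrokes) that reproduces the worked examples later in this module ($1,000 per year for 5 years at 6%, and the $5 preferred dividend at 5%):

```python
# Present values of level payment streams.

def pv_ordinary_annuity(pmt: float, r: float, n: int) -> float:
    """PV of equal end-of-period payments: PMT * [1 - (1+r)^-n] / r."""
    return pmt * (1 - (1 + r) ** -n) / r

def pv_annuity_due(pmt: float, r: float, n: int) -> float:
    """Annuity due = ordinary annuity shifted one period earlier."""
    return pv_ordinary_annuity(pmt, r, n) * (1 + r)

def pv_perpetuity(pmt: float, r: float) -> float:
    """PV = PMT / r for an infinite stream of equal payments."""
    return pmt / r

print(round(pv_ordinary_annuity(1_000, 0.06, 5), 2))  # 4212.36
print(round(pv_annuity_due(1_000, 0.06, 5), 2))       # 4465.10
print(round(pv_perpetuity(5, 0.05), 2))               # 100.0 (preferred share example)
```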
Uneven Cash Flows
When cash flows are not equal, you can’t use the standard TVM buttons on your calculator. Instead, each payment must be discounted individually and then summed.
Formula: PV = CF₁/(1 + r)¹ + CF₂/(1 + r)² + …
PV = \frac{CF_1}{(1 + r)^1} + \frac{CF_2}{(1 + r)^2} + \cdots + \frac{CF_n}{(1 + r)^n}
Example: If you receive $100 in year 1, $200 in year 2, and $300 in year 3 at 5%, PV = 100/1.05 + 200/1.05² + 300/1.05³ ≈ $535.80.
Analogy: Imagine getting different paychecks every year — to know what that’s worth today, you evaluate each separately.
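A minimal sketch of discounting uneven cash flows one by one, using the $100 / $200 / $300 at 5% example:

```python
# Discount each cash flow at its own period and sum the results.
def pv_uneven(cash_flows, r):
    return sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows, start=1))

print(round(pv_uneven([100, 200, 300], 0.05), 2))  # ~535.80
```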
Summary Table
| Concept | Key Formula / Idea |
|---|---|
| Future Value | FV = PV(1 + r)ⁿ |
| Present Value | PV = FV / (1 + r)ⁿ |
| Required Return | Risk-free + premiums |
| Simple Interest | Interest only on principal |
| Compound Interest | Interest on interest |
| Compounding | Match rate & periods |
| EAR | (1 + APR/n)ⁿ − 1 |
| Ordinary Annuity | Payments at end |
| Annuity Due | Payments at beginning |
| Perpetuity | PV = PMT / r |
| Uneven CFs | Discount each separately |
Key Takeaways
- The Time Value of Money is the foundation of all valuation in finance.
- Compounding grows value forward, while discounting brings it back.
- The frequency of compounding affects the true return (EAR).
- Understand the timing differences between ordinary annuities, annuities due, and perpetuities.
- Always align your rate, period, and sign conventions in calculations.
Common Mistakes & Calculator Tips
- Forgetting to make cash outflows negative (causes Error 5)
- Leaving calculator in “BEG” mode after annuity due problems
- Mixing compounding periods (e.g., using annual rate with monthly periods)
- Using TVM buttons for uneven cash flows
- Forgetting to “hop one period forward” for annuity due
Mastering this Module
Practice by plugging real-life situations into these formulas: saving for a trip, paying off a loan, or calculating returns on an investment.
Draw timelines to visualize cash flows. Always ask, “Am I moving money forward in time or back?”
The more you relate these ideas to everyday money decisions, the faster they’ll click.
Worked Examples
1. FV/PV Example (Real World)
You invest $10,000 today in a bond that pays 5% annually, compounded once per year, for three years:
[
FV = 10{,}000 \times (1 + 0.05)^3 = 10{,}000 \times 1.1576 = 11{,}576
]
Your money grows to $11,576 after three years. Conversely, if you are promised $11,576 three years from now and your required rate of return is 5%, the present value is $10,000 today.
2. Effective Annual Rate (EAR)
Suppose a bank advertises a 12% APR compounded monthly. The effective annual rate is:
[
EAR = \left(1 + \frac{0.12}{12}\right)^{12} - 1 = (1.01)^{12} - 1 = 0.1268 = 12.68%
]
This shows that monthly compounding increases the actual annual return from 12% to 12.68%.
3. Ordinary Annuity Example
You will receive $1,000 at the end of each year for five years. If the discount rate is 6%, the present value is:
[
PV = 1{,}000 \times \left[\frac{1 - (1 + 0.06)^{-5}}{0.06}\right] = 1{,}000 \times 4.21236 = 4{,}212.36
]
If this were an annuity due (payments start immediately), multiply by ( (1 + 0.06) ):
[
PV_{AD} = 4{,}212.36 \times 1.06 = 4{,}465.10
]
4. Perpetuity Example
A preferred share pays a fixed dividend of $3 per year, and the required return is 8%. The value of the preferred share is:
[
PV = \frac{PMT}{r} = \frac{3}{0.08} = 37.50
]
Learning Module 2: Organizing, Visualizing, and Describing Data
Understanding Types of Data – Ordinal vs Nominal
Data comes in two broad forms: numerical and categorical.
Numerical data consists of measurable or countable values. It’s split into:
- Discrete data: results from counting and can only take certain values, like the number of cars in a lot or tickets sold for a concert.
- Continuous data: can take any value within a range, like someone’s height, weight, or how long it takes to run a marathon.
Categorical data, on the other hand, describes qualities or characteristics.
- Nominal data: consists of labels or names without order (e.g., colors, city names).
- Ordinal data: has a logical order (e.g., bronze–silver–gold medals), but differences between ranks aren’t measurable.
Analogy:
Nominal data is like sorting socks by color — there’s no ranking. Ordinal data is like sorting runners by place — there’s order, but not equal distance between them.
Organizing Data for Analysis
How data is structured affects how we analyze it:
- Time Series Data: tracks one variable for one subject over equal time intervals (e.g., a stock’s daily price).
- Cross-Sectional Data: looks at one variable across many subjects at a single point in time (e.g., student test scores).
- Panel Data: combines both — multiple variables for multiple subjects over time (e.g., household income tracked yearly across families).
- Structured Data: neatly organized, repeating patterns like market data or financial statements.
- Unstructured Data: lacks a fixed format — think tweets, news, or customer reviews.
Analogy:
Time series is watching one tree grow over years; cross-sectional is comparing many trees today; panel data is tracking several trees over time.
Frequency & Distribution
A frequency distribution summarizes how often data values occur.
To create one, divide your data’s range (max–min) into equal intervals (“bins”) and count how many values fall into each.
- Absolute frequency: number of observations per bin.
- Relative frequency: each bin’s share of the total (%).
- Cumulative frequency: running total up to each bin.
Example:
If 10 students score between 70–80, and there are 100 total, the relative frequency = 10%.
Analogy:
It’s like grouping ages at a party — “how many guests are in their 20s, 30s, 40s, etc.”
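A short Python sketch of building a frequency distribution; the bins and scores below are made-up illustrative values:

```python
# Count absolute, relative, and cumulative frequencies per bin.
scores = [62, 68, 71, 74, 75, 78, 81, 84, 88, 95]     # illustrative data
bins = [(60, 70), (70, 80), (80, 90), (90, 100)]       # equal-width intervals

total = len(scores)
cumulative = 0
for low, high in bins:
    absolute = sum(low <= s < high for s in scores)    # absolute frequency
    relative = absolute / total                        # relative frequency
    cumulative += absolute                             # running total
    print(f"{low}-{high}: abs={absolute}, rel={relative:.0%}, cum={cumulative}")
```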
Contingency & Confusion Tables
A contingency table shows how two or more variables interact.
- Joint frequencies: counts inside the table for combinations of variables.
- Marginal frequencies: totals for each row/column.
A confusion matrix is a special version used in machine learning. It compares predicted results to actual outcomes to measure accuracy.
Analogy: Think of it like comparing “what you guessed” vs. “what really happened.”
Visualizing Data
Choosing the right chart helps reveal insights quickly.
Common Visualization Types:
| Type | Description | Best For |
|---|---|---|
| Histogram | Bars show frequency of numerical data | Distribution patterns |
| Frequency Polygon | Connects midpoints of histogram bins | Comparing shapes |
| Cumulative Frequency Chart | Running total line | Cumulative trends |
| Bar Chart | Bars show categorical data | Comparing groups |
| Tree Map | Rectangles sized by category value | Hierarchies |
| Word Cloud | Common words appear larger | Text data |
| Line Chart | Connects data points over time | Trends |
| Bubble Chart | Adds a third variable via bubble size | Multi-variable time data |
| Scatter Plot | Plots relationship between two variables | Correlation |
| Heat Map | Colors represent intensity or correlation | Multi-variable comparison |
Analogy:
A histogram is like sorting candies by colour; a line chart is like watching your savings grow; a scatter plot is like seeing whether ice cream sales rise with temperature.
Choosing the Right Visualization
| Purpose | Best Visualizations |
|---|---|
| Show Relationships | Scatter plot, heat map |
| Compare Categories | Bar chart, tree map |
| Compare Over Time | Line chart, bubble chart |
| Show Numerical Distributions | Histogram, frequency polygon |
| Show Categorical Distributions | Bar chart, heat map |
| Analyze Unstructured Text | Word cloud |
Tip: Always match your visualization to your goal — don’t use a pie chart for time trends!
Measures of Central Tendency
Central tendency describes where most of your data lies — its “center.”
- Arithmetic Mean: the simple average.
- Median: middle value when data is ordered — best when outliers are present.
- Mode: most frequent value — useful for categorical data.
- Trimmed Mean: removes a set % of extreme values to reduce outlier influence.
- Winsorized Mean: reassigns outliers to the nearest remaining values.
- Weighted Mean: gives more importance to certain values, like portfolio returns.
- Geometric Mean: used for compounded growth rates, e.g., multi-year investment returns.
Formula:
[(1 + r_1)(1 + r_2)\cdots(1 + r_n)]^{1/n} - 1
Harmonic Mean: used when averaging rates, like average price per share.
Formula:
HM = \frac{N}{\sum \left( \frac{1}{x_i} \right)}
Relationship: Harmonic < Geometric < Arithmetic (when returns vary).
Analogy:
Arithmetic mean is a “standard average,” geometric is “growth over time,” harmonic is “averaging speeds or rates.”
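A small Python sketch comparing the three means; the return and price figures are illustrative, not from the reading:

```python
# Arithmetic vs geometric vs harmonic mean.
from statistics import mean, geometric_mean, harmonic_mean

returns = [0.10, 0.05, -0.02]                        # three yearly returns (illustrative)
arith = mean(returns)
geo = geometric_mean([1 + r for r in returns]) - 1   # compound growth rate
print(round(arith, 4), round(geo, 4))                # arithmetic >= geometric

prices = [20, 25, 40]                                # prices paid per share (illustrative)
print(round(harmonic_mean(prices), 2))               # average cost when equal $ invested each time
```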
Quantiles & Quartiles
Quantiles break data into equal parts:
- Quartiles: 4 equal parts
- Deciles: 10 equal parts
- Percentiles: 100 equal parts
The interquartile range (IQR) = Q3 − Q1 (the middle 50% of data).
To find a percentile’s position: (n+1) × y/100.
Visualization: Box-and-whisker plot — shows median, quartiles, and outliers.
Analogy:
Think of slicing a pizza into equal parts — quartiles cut it into four slices, percentiles into 100 tiny pieces.
Measures of Dispersion
Dispersion shows how spread out data is — the degree of variability.
| Measure | Description | Formula / Use |
|---|---|---|
| Range | Quick sense of spread (max minus min) | \( \text{Range} = \max(x_i) - \min(x_i) \) |
| Mean Absolute Deviation (MAD) | Average absolute distance from the mean | \( \mathrm{MAD} = \frac{1}{n}\sum_{i=1}^n \lvert x_i - \bar{x} \rvert \) |
| Variance | Average squared deviation | Population: \( \sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2 \); Sample: \( s^2 = \frac{1}{n - 1}\sum_{i=1}^n (x_i - \bar{x})^2 \) |
| Standard Deviation | Square root of variance — most used in finance | Population: \( \sigma = \sqrt{\sigma^2} \); Sample: \( s = \sqrt{s^2} \) |
| Target Downside Deviation | Measures only downside volatility — focuses on losses below a target \(T\) | One common form: \( \text{TDD} = \sqrt{\frac{1}{n}\sum_{i=1}^n \big(\max(0,\,T - x_i)\big)^2} \) |
| Coefficient of Variation (CV) | Risk per unit of return — useful for comparing dispersion across different means | \( CV = \frac{s}{\bar{x}} \) |
Analogy:
Range is like the distance between the tallest and shortest person in a room; standard deviation shows how tightly everyone’s height clusters around the average.
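A quick Python sketch of the dispersion measures in the table above, on an illustrative return series (the target T for downside deviation is assumed to be 3%):

```python
from statistics import mean, stdev, variance

x = [0.04, 0.07, -0.01, 0.10, 0.05]                  # illustrative returns
xbar = mean(x)

data_range = max(x) - min(x)                         # range
mad = sum(abs(xi - xbar) for xi in x) / len(x)       # mean absolute deviation
s2 = variance(x)                                     # sample variance (divides by n - 1)
s = stdev(x)                                         # sample standard deviation
cv = s / xbar                                        # coefficient of variation

target = 0.03                                        # assumed target return T
tdd = (sum(max(0, target - xi) ** 2 for xi in x) / len(x)) ** 0.5   # form shown in the table

print(round(data_range, 2), round(mad, 4), round(s2, 5), round(s, 4), round(cv, 2), round(tdd, 4))
```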
Shape of Distributions
The shape of a data distribution tells a story.
Skewness measures asymmetry:
- Positive skew: long right tail; a few very high values. (Mode < Median < Mean)
- Negative skew: long left tail; a few very low values. (Mean < Median < Mode)
Kurtosis measures “peakedness”:
- Leptokurtic: tall, narrow, frequent extremes (volatile returns).
- Platykurtic: flat, fewer extremes. (“Plat” = flat.)
- Excess Kurtosis: kurtosis minus 3; a positive value (kurtosis above 3) means more extreme events than a normal distribution.
Analogy:
A negatively skewed class grade distribution means most students did well, but a few failed badly. A leptokurtic distribution is like a market with rare but extreme crashes and spikes.
Covariance and Correlation
These show how two variables move together.
- Covariance: shows direction (positive or negative), but not strength — it’s unbounded.
- Correlation: standardizes covariance between -1 and +1.
Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}
Interpretation:
+1 = move perfectly together
0 = no relationship
−1 = move perfectly opposite
Analogy:
If temperature and ice cream sales rise together, correlation is positive. If umbrellas and sunshine move opposite, it’s negative.
Summary Table
| Concept | Key Idea / Formula |
|---|---|
| Data Types | Numerical (discrete, continuous), Categorical (nominal, ordinal) |
| Data Structures | Time series, cross-sectional, panel, structured, unstructured |
| Frequency | Absolute, relative, cumulative |
| Visualization | Histogram, bar chart, line chart, scatter, tree map, word cloud |
| Central Tendency | Mean, median, mode, trimmed, winsorized, weighted, geometric, harmonic |
| Quantiles | Quartiles, deciles, percentiles, IQR, box plot |
| Dispersion | Range, MAD, variance, stdev, CV, downside deviation |
| Shape | Skewness & kurtosis |
| Relationships | Covariance, correlation |
Key Takeaways
Data types define what analysis you can perform.
Visualization choices depend on whether you’re comparing, tracking, or exploring relationships.
Mean, median, and mode each tell a different story about “typical” values.
Variability (standard deviation, range) matters as much as averages.
Understanding skewness, kurtosis, and correlation helps interpret risk and relationships in financial data.
Real World Analogies
Discrete data: Counting the number of tickets sold for a concert.
Continuous data: Measuring the time between two train arrivals.
Nominal data: Sorting survey responses by favorite brand.
Ordinal data: Rating hotels from 1–5 stars.
Histogram: Like sorting people by age group at a party.
Box plot: Like showing the shortest, tallest, and average heights on a team.
Skewness: A market where most returns are small but a few are extreme losses.
Correlation: Ice cream sales and temperature move together — positive correlation.
Learning Module 3: Probability Concepts
Random Variables & Events
A random variable represents uncertain numerical outcomes — it’s the number you don’t know yet, like the result of a dice roll or a coin toss. Once the outcome occurs, that value becomes the observed value.
An event is one or more outcomes that share a property. For instance, rolling an even number on a die (2, 4, or 6) is a single event.
Types of Events:
- Mutually Exclusive: Two events that cannot happen together (e.g., flipping both heads and tails in one coin toss).
- Exhaustive: A complete set of all possible outcomes (for a die: {1,2,3,4,5,6}).
Analogy:
Imagine dividing a pizza into slices — each slice represents a possible event. You can’t eat two slices at once (mutually exclusive), and all slices together make the full pizza (exhaustive).
Properties of Probability
All probabilities follow two basic rules:
- Every event’s probability lies between 0 and 1.
- The sum of probabilities across all outcomes equals 1.
Analogy:
Probability is like pie slices again — you can’t have negative slices, and all slices must fill the entire pie exactly.
Types of Probability
Empirical Probability: Based on historical data.
Example: If it rained 30 of the last 100 days, the empirical probability of rain is 0.3.
A Priori Probability: Based on logical reasoning, not observation.
Example: Rolling a 3 on a fair six-sided die = 1/6.
Subjective Probability: Based on personal judgment or intuition.
Example: An analyst believes there’s a 70% chance the market will rise.
Analogy:
Empirical = “we’ve seen it happen,”
A priori = “we know it by logic,”
Subjective = “we feel it might happen.”
Odds & Probability
Odds show likelihoods in ratio form.
Probability = \frac{a}{(a+b)}
Example: 4-to-1 odds → 4 / (4 + 1) = 0.8 = 80%.
Analogy: Odds are like betting language — “4 to 1” means four chances of winning for every one chance of losing.
Conditional vs Unconditional Probability
Unconditional Probability: The chance of an event occurring, regardless of other events.
Example: The chance of rolling a six is always 1/6, no matter previous rolls.
Conditional Probability: The probability of one event given another has occurred.
Formula: P(A | B) = Probability of A given B.
P(A \mid B) = \frac{P(AB)}{P(B)}
Example: If it’s cloudy (B), the probability of rain (A) increases.
P(A|B) = Probability of A given B
P(AB) = Probability of A & B
Analogy:
Conditional probability is like narrowing your view — once you know it’s cloudy, your forecast adjusts.
Probability Rules
It helps to picture a Venn diagram: are you trying to find just A, just B, both, or either?
| Rule | Meaning / Use |
|---|---|
| Multiplication Rule (Joint Probability) | to find the probability of two or more events happening together, whether the events are independent or dependent |
| Independent Events | When A doesn’t affect B |
| Addition Rule | Either event occurs |
| Total Probability Rule | to find the unconditional probability of an event by weighting its conditional probabilities across a set of mutually exclusive, exhaustive scenarios |
Multiplication Rule (Joint Probability)
P(AB) = P(A \mid B) \times P(B)
Independent Events:
P(AB) = P(A) \times P(B)
Addition Rule:
P(A \text{ or } B) = P(A) + P(B) - P(AB)
Total Probability Rule:
P(A) = \sum_i P(A \mid B_i) \times P(B_i)
Analogy:
Multiplication is “AND,” addition is “OR.”
Tree diagrams help visualize these paths — like mapping every branch of possibilities.
Expected Value
Expected value is the probability-weighted average of all possible outcomes — your long-run average if you repeated the event infinitely.
Formula:
E(X) = \sum_i X_i P(X_i)
Example: A 50% chance of +10 and 50% chance of –10 → EV = 0.
Analogy:
If you play the same game many times, EV tells you your average win or loss per play.
Variance Using Probabilities
Variance measures how far outcomes spread from their expected value.
Formula:
Var(X) = E[(X - E(X))^2]
Compute the expected value, then average the squared deviations from it.
Analogy:
Variance is like measuring how tightly clustered or widely scattered darts are around the bullseye.
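A tiny Python sketch of probability-weighted expected value and variance, using the +10 / −10 example above:

```python
# Expected value and variance from a discrete probability distribution.
outcomes = [10, -10]       # possible payoffs (from the example above)
probs = [0.5, 0.5]         # probabilities must sum to 1

ev = sum(p * x for p, x in zip(probs, outcomes))               # E(X)
var = sum(p * (x - ev) ** 2 for p, x in zip(probs, outcomes))  # E[(X - E(X))^2]
print(ev, var)             # 0.0, 100.0
```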
Covariance & Correlation
Covariance: Measures how two variables move together.
Cov(X,Y) = E[(X - E(X))(Y - E(Y))]
Positive = move together, Negative = move opposite.
Correlation: Standardizes covariance between –1 and +1.
Corr(X,Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}
Analogy:
Covariance says “do they move in the same direction?”
Correlation says “how strong is that relationship?” — like grading a friendship: +1 = always agree, –1 = always disagree.
Portfolio Variance & Standard Deviations
Used to measure total risk when combining assets.
Formulas:
For 2 Assets:
\sigma_p^2 = w_A^2 \sigma_A^2 + w_B^2 \sigma_B^2 + 2w_A w_B Cov_{AB}
Using Correlation to find 2 assets:
\sigma_p^2 = w_A^2 \sigma_A^2 + w_B^2 \sigma_B^2 + 2w_A w_B \rho_{AB} \sigma_A \sigma_B
Portfolio Standard Deviation
\sigma_p = \sqrt{\sigma_p^2}
Analogy:
Like mixing two investments: if they move differently, the portfolio “smooths out” risk.
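A small sketch of the two-asset formula; the weights, volatilities, and correlation below are assumed for illustration:

```python
# Two-asset portfolio standard deviation using the correlation form.
def portfolio_std(w_a, w_b, sigma_a, sigma_b, rho_ab):
    var_p = (w_a**2 * sigma_a**2 + w_b**2 * sigma_b**2
             + 2 * w_a * w_b * rho_ab * sigma_a * sigma_b)
    return var_p ** 0.5

# Illustrative inputs: 60/40 mix, 20% and 10% volatilities, correlation 0.3
print(round(portfolio_std(0.6, 0.4, 0.20, 0.10, 0.3), 4))
```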
Bayes’ Theorem
Bayes’ Theorem updates probabilities when new information appears.
It combines prior probability with new evidence to calculate a revised (posterior) probability.
Conceptual Formula:
P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}
Expanded form (using multiple scenarios):
P(A \mid B) = \frac{P(B \mid A) P(A)}{\sum_i P(B \mid A_i) P(A_i)}
Real-World Example:
If a medical test is 95% accurate, Bayes’ Theorem helps find the probability you actually have the disease given a positive result.
Analogy:
It’s like adjusting your weather forecast when you see new clouds forming.
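A short Python sketch of Bayes’ Theorem for the medical-test example; the 1% prevalence and 5% false-positive rate are assumed numbers added for illustration:

```python
# Assumed inputs: 1% of people have the disease, the test catches 95% of
# true cases, and it falsely flags 5% of healthy people.
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

# Total probability of testing positive (denominator of Bayes' Theorem)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: P(disease | positive test)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # ~0.161, far below the test's 95% accuracy
```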
Counting & Probability
1. Multiplication Rule
If tasks are independent, multiply the number of ways each can occur.
N = n_1 \times n_2 \times ... \times n_k
Example: 3 shirts × 2 pants × 2 shoes = 12 outfit combinations.
2. Factorial (n!)
Total ways to arrange n items.
Formula:
n! = n \times (n - 1) \times (n - 2) \times ... \times 1
Example: 5! = 120 ways to seat 5 people in order.
3. Labeling (Multinomial):
Arranging items into subgroups.
\frac{n!}{n_1! n_2! ... n_k!}
Example: Sorting 5 players into 2 teams of 2 and 1 substitute.
4. Combinations (nCr):
Selecting items when order doesn’t matter.
{}^nC_r = \frac{n!}{r!(n - r)!}
Example: Choosing 3 out of 10 lottery numbers.
5. Permutations (nPr):
Selecting items when order does matter.
{}^nP_r = \frac{n!}{(n - r)!}
Example: Assigning gold, silver, and bronze medals to 3 out of 10 athletes.
Analogy:
Combinations are like picking players for a team (order irrelevant).
Permutations are like ranking them on the podium (order counts).
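Python’s math module has these counting rules built in; a quick sketch reproducing the examples above:

```python
from math import factorial, comb, perm

print(3 * 2 * 2)             # multiplication rule: 12 outfit combinations
print(factorial(5))          # 5! = 120 ways to seat 5 people
print(factorial(5) // (factorial(2) * factorial(2) * factorial(1)))  # labeling: 30 ways
print(comb(10, 3))           # combinations: 120 ways to choose 3 of 10 numbers
print(perm(10, 3))           # permutations: 720 ways to award gold/silver/bronze
```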
Summary Table
| Concept | Formula / Description | Real-World Analogy |
|---|---|---|
| Random Variable | Uncertain outcome | Rolling a die |
| Mutually Exclusive | Events can’t co-occur | Heads or tails |
| Exhaustive | All outcomes covered | All six sides of a die |
| Probability Rules | 0 ≤ P ≤ 1; Sum = 1 | Whole pie |
| Empirical | Based on data | Past weather |
| A Priori | Based on logic | Fair die odds |
| Subjective | Based on opinion | Investor’s hunch |
| Multiplication Rule | P(AB) = P(A\|B) × P(B) | “AND” logic |
| Addition Rule | P(A or B)=P(A)+P(B)–P(AB) | “OR” logic |
| Total Probability | Σ P(A\|Bᵢ) × P(Bᵢ) | Weighted average across scenarios |
| Expected Value | Σ X P(X) | Long-term average |
| Variance | E[(X–E(X))²] | Spread of outcomes |
| Covariance | E[(X–E(X))(Y–E(Y))] | Stocks moving together |
| Correlation | Cov(X,Y)/(σxσy) | Strength of movement |
| Portfolio Variance | See formula | Combined investment risk |
| Bayes’ Theorem | Updates P with new info | Medical test probability |
| Combination | nCr | Choosing team members |
| Permutation | nPr | Ranking winners |
Key Takeaways
Probabilities quantify uncertainty and guide investment and risk analysis.
Understand the difference between unconditional and conditional events.
Use joint, addition, and total-probability rules to combine events logically.
Expected value measures average outcome; variance and correlation measure risk and relationships.
Counting rules (factorials, combinations, permutations) simplify complex probability questions.
Real-World Tip:
In finance, probability concepts underpin portfolio diversification, option pricing, and forecasting.
Always ask: Are these events independent? Does order matter? Am I updating my beliefs with new information?
Learning Module 4: Common Probability Distributions
Probability Distributions: The Foundations
A probability distribution lists all possible outcomes of a random variable and the probabilities associated with each.
There are two main types:
- Discrete random variables: Countable outcomes, like the number of heads in 5 coin tosses.
- Continuous random variables: Infinite outcomes within a range, like interest rates or stock returns.
Analogy:
Discrete is like counting marbles in a jar — you can list them.
Continuous is like measuring milk in a jug — it can always be divided more finely.
Discrete Uniform Distributions
A discrete uniform distribution assigns equal probabilities to all outcomes. Every event has the same chance of occurring.
Example: Rolling a fair six-sided die (each side = 1/6).
Analogy:
Like a perfectly balanced spinner — each color slice is the same size, so no outcome is more likely than another.
Cumulative Distribution Function (CDF)
The CDF gives the probability that a random variable is less than or equal to a certain value.
It builds from 0 to 1 as you move through the distribution.
Example: If 70% of students scored 80 or less, then F(80) = 0.7.
Analogy:
Think of it as a “running total” of probability — like filling a glass of water until it’s full (100%).
Continuous Uniform Distribution
This distribution covers all values within a continuous range between a (minimum) and b (maximum).
The probability of an exact single value is 0%, but ranges have measurable probability.
Formula:
P(x_1 \leq X \leq x_2) = \frac{x_2 - x_1}{b - a}
Analogy:
Imagine a dartboard where any hit within the circle counts — no single exact point has weight, only areas matter.
Bernoulli Trials
A Bernoulli trial is a single random experiment with exactly two possible outcomes: success or failure. Examples include flipping a coin, default/no default, or pass/fail on an exam.
Each trial is independent, and the probability of success is constant.
Analogy:
Like a light switch — it’s either on or off, with no in-between.
Binomial Distribution
The binomial distribution gives the probability of getting a specific number of successes (x) in a fixed number of independent Bernoulli trials (n).
Formula:
P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}
Expected Value:
E(X) = np
Variance:
Var(X) = np(1 - p)
Example:
What’s the probability of flipping 3 heads in 5 fair coin tosses?
P(3) = \binom{5}{3}(0.5)^3 (0.5)^2 = 10 \times 0.125 \times 0.25 = 0.3125
Analogy:
Like counting how many baskets a player makes in 10 free throws — each shot is a Bernoulli trial.
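A minimal Python sketch of the binomial formula, checked against the 3-heads-in-5-tosses example:

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binomial_pmf(3, 5, 0.5))     # 0.3125
print(5 * 0.5, 5 * 0.5 * 0.5)      # E(X) = np = 2.5, Var(X) = np(1 - p) = 1.25
```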
Normal Distribution
The normal distribution is the bell-shaped curve that describes many real-world data sets — symmetric around the mean.
Key Properties:
- Mean = Median = Mode
- Skewness = 0 (perfect symmetry)
- Kurtosis = 3
- 68% of values fall within 1σ
- 95% within 2σ
- 99.7% within 3σ
Analogy:
Imagine test scores in a large class — most students cluster around the average, with fewer outliers at both ends.
Univariate vs Multivariate Distributions
Univariate: Describes one variable (e.g., one stock’s returns).
Multivariate: Describes multiple correlated variables (e.g., returns of two stocks).
The number of pairwise correlations among n variables = n(n − 1)/2.
Analogy:
One variable = a single track on a chart.
Multivariate = multiple tracks moving together or apart — like dancing partners.
Confidence Intervals (Normal Distribution)
A confidence interval shows the range where a population parameter is likely to lie, given a certain probability.
| Confidence Level | Z-Value | Coverage |
|---|---|---|
| 90% | 1.65σ | ±1.65 standard deviations |
| 95% | 1.96σ | ±1.96 standard deviations |
| 99% | 2.58σ | ±2.58 standard deviations |
Analogy:
It’s like saying, “I’m 95% confident the dart will land within this ring around the bullseye.”
Z-Tables
The Z-table provides the probability (area) to the left of a given Z-value under the standard normal curve.
If Z = 1.65, then about 95% of the area lies to its left.
Tip: The Z-table only applies to normal distributions.
Analogy:
It’s a probability map — Z tells you how far from the mean you are, and the table tells you how much of the population lies below that point.
Roy’s Safety First Criterion
This helps investors choose portfolios that minimize the chance of returns falling below a minimum acceptable threshold (Rᴸ).
Formula:
SFR = \frac{E(R_p) - R_L}{\sigma_p}
The higher the ratio, the safer the portfolio.
Analogy:
Like choosing the parachute that gives you the highest chance of landing safely above the danger zone.
Lognormal Distribution
A lognormal distribution arises when the natural logarithm of a variable is normally distributed.
It’s positively skewed — meaning it has a long right tail.
Used to model stock prices, since they can’t go below zero but can rise infinitely.
Formula:
If ln(X)∼N(μ,σ2), then X follows a lognormal distribution.
\ln(X) \sim N(\mu, \sigma^2)
Analogy:
Stock prices act like trees — they can grow tall (positive skew) but can’t sink below the ground (zero).
Continuously Compounded Returns
Used in finance for modeling constant growth.
Formula:
R_{cc} = \ln\left(\frac{P_t}{P_0}\right)
or
P_t = P_0 e^{R_{cc}}
Analogy:
Think of your return compounding smoothly every second instead of once per year — like continuous water flow versus monthly drips.
T-Distribution
The t-distribution is similar to the normal curve but has thicker tails and a lower peak — accounting for small sample uncertainty.
As degrees of freedom (df = n – 1) increase, it approaches the normal distribution.
| Confidence Level | t (df=29) |
|---|---|
| 90% | 1.699 |
| 95% | 2.045 |
| 99% | 2.756 |
Analogy:
When you have fewer data points, you give more room for error — the t-curve spreads out wider, like estimating an average from only a few test scores.
Chi-Square (χ²) Distribution
The chi-square distribution represents the sum of squared standard normal variables.
Used to test variance and goodness-of-fit.
It only takes positive values, since squared numbers can’t be negative.
As degrees of freedom rise, it becomes more symmetric and bell-shaped.
Analogy:
Like squaring every person’s deviation from average height — the negatives disappear, leaving only total variation.
F-Distribution
Used to compare two variances (e.g., volatility of two portfolios).
Defined by two degrees of freedom:
- Numerator (df₁)
- Denominator (df₂)
As both increase, the F-curve looks more like a normal distribution.
Analogy:
Like comparing two pitchers’ accuracy — you’re testing whether their throws vary equally.
Monte Carlo Simulation
A Monte Carlo simulation uses random sampling to model complex systems and estimate probabilities.
Common applications include:
- Estimating risk and return of portfolios
- Valuing complex securities (options, derivatives)
- Running sensitivity and scenario analyses
Limitations:
- Provides statistical estimates, not exact results
- Doesn’t explain why outcomes occur
Analogy:
Like running a video game thousands of times to see how often you win — you can estimate your odds but not predict every move exactly.
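A toy Monte Carlo sketch in Python: the normal return assumption (7% mean, 15% standard deviation) is made up purely to illustrate estimating a probability by repeated random sampling:

```python
# Simulate many one-year returns and estimate the probability of a loss.
import random

random.seed(42)
n_trials = 100_000
losses = sum(random.gauss(0.07, 0.15) < 0 for _ in range(n_trials))
print(losses / n_trials)   # estimated P(return < 0), roughly 0.32
```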
Summary Table
| Concept | Formula / Property | Real-World Analogy |
|---|---|---|
| Discrete Variable | Countable outcomes | Rolling dice |
| Continuous Variable | Infinite outcomes | Measuring time |
| Uniform (Discrete) | Equal probabilities | Fair spinner |
| Bernoulli | 2 outcomes | On/off switch |
| Binomial | \( \binom{n}{x} p^x (1 - p)^{n - x} \) | Shots made in basketball |
| Normal | Symmetrical, bell curve | Test scores |
| Lognormal | Positively skewed | Stock prices |
| Z-Score | \( Z = \frac{x - \mu}{\sigma} \) | Distance from mean |
| Roy’s Criterion | \( SFR = \frac{E(R_p) - R_L}{\sigma_p} \) | Choosing safest parachute |
| T-Distribution | Small sample version of normal | Fewer data = wider curve |
| Chi-Square | Sum of squares | Measuring variance |
| F-Distribution | Ratio of variances | Comparing volatility |
| Monte Carlo | Repeated simulation | Playing out scenarios |
Key Takeaways
Probability distributions model uncertainty — discrete for countable, continuous for measurable.
Normal and lognormal distributions form the backbone of return modeling.
Confidence intervals show precision around estimates.
T, Chi-square, and F distributions handle smaller samples or variance testing.
Monte Carlo simulation helps test “what-if” scenarios without analytical formulas.
In finance: these distributions help model returns, risk, and statistical inference — the bridge from probability theory to investment decision-making.
Learning Module 5: Sampling and Estimation
Population Parameters vs. Sample Statistics
A parameter describes an entire population (like the true mean μ or variance σ²), while a sample statistic summarizes a smaller sample (like sample mean x̄ or standard deviation s).
Since we rarely observe entire populations, we use statistics to estimate parameters.
Real Example:
The true average daily spending across all CIBC clients is a population mean (μ). The average from 1,000 sampled clients is a sample mean (x̄).
Analogy:
The population is the entire ocean; your sample is a bucket of seawater. Measuring salt in your bucket helps you estimate the ocean’s salinity.
Probability Sampling Methods
In probability sampling, every member has a known, nonzero chance of selection. This reduces bias and supports valid statistical inference.
Types:
Cluster Sample: Divide into mini-populations (clusters) and sample clusters instead of individuals.
Example: Choose several bank branches (clusters) and collect all or some customer responses.
Analogy: Testing a few classrooms to infer school-wide performance.
Simple Random Sample (SRS): Every element has an equal chance.
Example: Randomly choosing 1,000 clients using a random number generator.
Analogy: Drawing names from a hat.
Systematic Sample: Select every nᵗʰ element after a random start.
Example: Selecting every 50th transaction after a random start.
Pitfall: Hidden patterns (e.g., daily cycles) can bias results.
Analogy: Checking every 10th product on a conveyor belt.
Stratified Sample: Divide the population into subgroups (strata) and sample proportionally within each.
Example: Segment clients by region or wealth tier and sample from each.
Analogy: Picking fruit from all parts of a tree to represent different sunlight exposures.
Non-Probability Sampling
Selection depends on convenience or judgment, not randomization. It’s faster but more prone to bias.
Judgment Sample: Expert-selected sample believed to be representative.
Example: A risk analyst handpicking key corporate clients.
Analogy: A chef tasting a “typical” spoonful of soup — may not capture full variability.
Convenience Sample: Select what’s easiest to access.
Example: Using only clients who respond to a mobile survey.
Analogy: Asking whoever is nearby for directions.
Sampling Error
Definition: The difference between a sample statistic and the population parameter it estimates.
Larger, well-designed samples reduce error but never eliminate it.
Example: Sample mean ATM withdrawal = $120; true population mean = $125 → Sampling error = $5.
Analogy: Measuring the room’s temperature to estimate the whole building’s — close, but not exact.
Central Limit Theorem (CLT)
The CLT is the cornerstone of inference: regardless of population shape, the sampling distribution of the sample mean (x̄) becomes approximately normal when sample size (n) is large.
Formulas:
\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}, \quad SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
Example: Averaging 30 days of returns gives a nearly normal distribution of average returns even if daily data is skewed.
Analogy: Individual raindrops fall randomly, but the average rainfall per week forms a smooth pattern.
Properties of a Good Estimator
A good estimator has three key traits:
| Property | Description | Analogy |
|---|---|---|
| Unbiased | Its expected value equals the true parameter. | The arrows, on average, hit the bullseye. |
| Efficient | Has the smallest variance among unbiased estimators. | The arrows cluster tightly around the center. |
| Consistent | Improves as n increases (standard error ↓). | More practice = tighter grouping around the bullseye. |
Example: The sample mean x̄ is an unbiased, consistent, and efficient estimator of μ for large n.
Estimator vs. Point Estimate
An estimator is the formula used to calculate a sample statistic; a point estimate is the resulting single value.
Analogy: The estimator is the recipe; the point estimate is the cookie you bake.
Confidence Intervals
A confidence interval (CI) gives a range where the population mean likely lies, based on a sample.
Wider intervals → more confidence, less precision.
Formula (σ known):
\bar{X} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)
Example: Average mortgage processing time = 5 days, σ = 1.2, n = 64.
At 95% confidence (Z = 1.96): 5 ± 1.96 × (1.2 / √64) = 5 ± 0.294 ⇒ [4.706, 5.294]
Analogy: Guardrails around your best guess — wider rails mean more safety but less precision.
Resampling Methods
Resampling generates new samples from existing data to estimate variability when formulas aren’t practical.
Jackknife Method:
- Remove one observation at a time (n repetitions).
- Reduces bias and estimates standard error.
- Produces stable results.
Analogy: Removing one plank at a time from a bridge to test its sturdiness.
Bootstrap Method:
- Randomly resample with replacement from the original data.
- Results differ on each run; used to estimate standard errors and CIs.
Analogy: Making many mini-batches of cookies by reusing ingredients — some batches repeat the same scoops.
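A small Python sketch of the bootstrap idea; the withdrawal data are invented for illustration:

```python
# Resample with replacement many times and look at the spread of the means.
import random
from statistics import mean, stdev

random.seed(1)
data = [120, 95, 130, 110, 105, 140, 90, 125]   # e.g., sampled ATM withdrawals

boot_means = [mean(random.choices(data, k=len(data))) for _ in range(5_000)]
print(round(mean(boot_means), 2))   # centre of the bootstrap distribution
print(round(stdev(boot_means), 2))  # bootstrap estimate of the standard error
```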
Common Biases in Empirical Analysis
| Bias | Description | Example | Mitigation |
|---|---|---|---|
| Sample Selection | Certain groups excluded | Only branch clients surveyed | Use probability sampling |
| Data Snooping | Searching until “something” fits | Testing hundreds of factors until one works | Predefine hypotheses, use out-of-sample tests |
| Survivor Bias | Excluding failed entities | Studying only surviving funds | Include inactive entities |
| Look-Ahead Bias | Using future data in past tests | Backtests using revised data | Restrict to information available at the time |
| Time-Period Bias | Analysis window not representative | Strategy tested only in bull markets | Use multiple timeframes |
Analogy: Building a map from future roads or skipping closed streets — your navigation looks accurate but misleads.
Choosing Between z and t Tests
| Distribution of Data | σ Known? | n < 30 | n ≥ 30 | Use |
|---|---|---|---|---|
| Normal | Yes | z | z | z |
| Normal | No | t | t* | t (≈ z for large n) |
| Non-Normal | Yes | n/a | z | z (via CLT if large n) |
| Non-Normal | No | n/a | t* | t (via CLT if large n) |
Analogy:
Choosing z or t is like picking the right wrench:
z for when you know the bolt size (σ known),
t for when you estimate it.
Larger projects (large n) give you flexibility.
Summary Table
| Concept | Formula / Description | Analogy |
|---|---|---|
| Sampling Error | ( \bar{X} – \mu ) | Measuring one room’s temperature to estimate the whole building |
| CLT | ( Var(\bar{X}) = \frac{\sigma^2}{n} ) | Averaging chaos into order |
| Confidence Interval | ( \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} ) | Guardrails around a guess |
| Jackknife | Leave-one-out resampling method | Remove one plank to test bridge strength |
| Bootstrap | Resample with replacement to estimate sampling distribution | Reusing ingredient scoops |
| Good Estimator | Unbiased, Efficient, Consistent | Tight arrow grouping on target |
| Bias Types | Sampling, Snooping, Survivor, Look-ahead, Time | Wrong map, wrong road |
Key Takeaways
Choose z vs. t based on normality, sample size, and whether σ is known.
Probability samples yield more representative results; non-probability samples are quicker but risk bias.
The Central Limit Theorem ensures that sample means approximate normality for large n.
Good estimators are unbiased, efficient, and consistent.
Confidence intervals quantify uncertainty.
Jackknife and bootstrap methods are robust alternatives to analytical formulas.
Bias awareness is critical for valid empirical work.
Learning Module 6: Hypothesis Testing
What is Hypothesis Testing
Hypothesis testing is a structured way to make decisions about a population parameter using sample data.
You start with two competing claims:
- Null hypothesis (H₀): The “status quo” — what we assume true until proven otherwise.
- Alternative hypothesis (Hₐ): What the analyst believes or wants to prove.
Example:
You believe a portfolio’s mean return is greater than 8%.
H₀: μ ≤ 8% vs. Hₐ: μ > 8%
Analogy:
Like a courtroom: H₀ is “innocent until proven guilty.” You need enough evidence (data) to reject it.
Steps in Hypothesis Testing
| Step | Description | Example / Analogy |
|---|---|---|
| 1. State Hypotheses | Define H₀ and Hₐ. H₀ always includes equality (=, ≤, or ≥). | “The coin is fair” (H₀) vs. “The coin is biased” (Hₐ). |
| 2. Choose Test Type | Decide 1-tailed or 2-tailed. | Testing for equality → 2-tailed; Testing for > or < → 1-tailed. |
| 3. Determine Significance (α) | α = probability of Type I error. Common levels: 0.01, 0.05, 0.10. | “How strict is the judge?” |
| 4. Calculate Test Statistic | Compare sample data to hypothesized parameter. | Compute z, t, χ², or F. |
| 5. Determine Critical Value or p-value | Defines the rejection region. | If p < α, reject H₀. |
| 6. Make Decision | Reject or fail to reject H₀. | Convict or acquit the defendant. |
Errors in Hypothesis Testing
| Error | Definition | Analogy |
|---|---|---|
| Type I Error (α) | Rejecting a true H₀ | Throwing away a good apple |
| Type II Error (β) | Failing to reject a false H₀ | Eating a bad apple |
| Power of a Test | 1−β: Probability of correctly rejecting a false H₀ | Detecting the bad apple correctly |
Test Statistic Formula (for mean)
Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}
If σ unknown → use t-statistic:
t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}
Interpretation:
If your test statistic lands in the rejection zone, you have enough evidence to reject H₀.
P-Value Concept
The p-value is the smallest significance level (α) at which you can reject H₀.
If p < α, reject H₀.
If p > α, fail to reject H₀.
Analogy:
Think of α as the “bar for evidence.”
If your p-value is lower than that bar, your evidence is strong enough to convict (reject H₀).
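A quick sketch of a one-sample t-test in Python; the return data are invented, and SciPy is assumed to be available for the p-value:

```python
# t = (x̄ - μ0) / (s / sqrt(n)), two-tailed p-value from the t-distribution.
from statistics import mean, stdev
from math import sqrt
from scipy import stats

returns = [0.09, 0.11, 0.07, 0.12, 0.08, 0.10, 0.06, 0.13]   # illustrative sample
mu_0 = 0.08                                                  # hypothesized mean
n = len(returns)

t_stat = (mean(returns) - mu_0) / (stdev(returns) / sqrt(n))
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 1))       # two-tailed
print(round(t_stat, 3), round(p_value, 3))
# Reject H0 at alpha = 0.05 only if p_value < 0.05.
```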
One tailed vs Two Tailed
| Type | Hypotheses | Rejection Region | Example |
|---|---|---|---|
| Two-Tailed | H0:μ=10 Ha:μ≠10 | Both ends of distribution | Testing if returns differ from 10% |
| Right-Tailed | H0:μ≤10 Ha:μ>10 | Upper tail | Testing if returns are greater than 10% |
| Left-Tailed | H0:μ≥10 Ha:μ<10 | Lower tail | Testing if returns are below 10% |
Statistical vs Economic Significance
Even if a result is statistically significant, it may not be economically meaningful.
A small difference might be real but too tiny to matter financially.
Example:
A 0.01% increase in portfolio return may be “significant,” but transaction costs may erase the gain.
Multiple Testing Problem
When testing many hypotheses, the probability of making at least one Type I error increases.
Procedure (false discovery rate adjustment, Benjamini–Hochberg style):
- Rank p-values from smallest to largest.
- Compute each test’s adjusted significance level = α × (rank / total number of tests).
- Compare p-values ≤ adjusted α → significant.
Analogy:
The more dart throws you take, the more likely you’ll hit the bullseye by accident.
Key Tests Summary
| Test | What It Tests | Requirements | Statistic |
|---|---|---|---|
| Z-Test | Population mean (σ known) | Normal population | \( Z = \frac{\bar{X} – \mu_0}{\sigma / \sqrt{n}} \) |
| T-Test | Population mean (σ unknown) | Normal or large n | \( t = \frac{\bar{X} – \mu_0}{s / \sqrt{n}} \) |
| Chi-Square Test | Population variance | Normal data | \( \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} \) |
| F-Test | Equality of two variances | Normal, independent | \( F = \frac{s_1^2}{s_2^2} \) |
Difference in Means Tests
Independent Samples (Equal Variances):
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
where the pooled standard deviation s_p is:
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
Independent Samples (Unequal Variances):
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
Paired Comparisons (Dependent Samples):
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}
where \bar{d} = mean of the differences and s_d = standard deviation of the differences.
Analogy:
Independent = comparing two teams’ averages.
Paired = comparing the same team’s “before and after” performance.
Tests for Correlation
Pearson Correlation Test (parametric):
Tests if the population correlation coefficient (ρ) differs from 0.
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}
Spearman Rank Correlation (non-parametric):
Used when data aren’t normally distributed or contain outliers.
It tests whether rankings between two variables are correlated.
Analogy:
Pearson: comparing precise scores.
Spearman: comparing the order of finish (ranks).
Chi Square Tests for Independence
Used in contingency tables to test if two categorical variables are related.
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
where:
E_{ij} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}
Example:
Is fund style (Value, Growth, Blend) independent of fund size (Small, Medium, Large)?
Analogy:
Checking if ice cream flavor preference depends on age group — or if they’re independent.
Power of a Test
\text{Power of a Test} = 1 - \beta
Higher power means better detection of real effects — usually achieved with larger sample size or higher α.
Non-Parametric Test
Used when data don’t meet assumptions of normality or when variables are in ranks rather than raw values.
Examples:
- Spearman Rank Correlation (relationship in ranks)
- Runs Test (randomness)
- Wilcoxon Signed-Rank Test (median comparisons)
- Chi-Square Independence Test (categorical relationships)
Analogy:
When numbers are messy or skewed, switch to rank-based tests — less precise but more robust.
Summary Table
| Concept | Formula |
|---|---|
| Z-test (mean) | \( Z = \frac{\bar{X} – \mu_0}{\sigma / \sqrt{n}} \) |
| t-test (mean) | \( t = \frac{\bar{X} – \mu_0}{s / \sqrt{n}} \) |
| Chi-square | \( \chi^2 = \frac{(n – 1)s^2}{\sigma_0^2} \) |
| F-test | \( F = \frac{s_1^2}{s_2^2} \) |
| Correlation test | \( t = \frac{r\sqrt{n – 2}}{\sqrt{1 – r^2}} \) |
| Power | \( 1 – \beta \) |
| Confidence interval | \( \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \) |
| Spearman | \( r_s = 1 – \frac{6\sum d_i^2}{n(n^2 – 1)} \) |
Key Takeaways
Hypothesis testing = data-based decision-making framework.
Always define H₀ and Hₐ clearly — H₀ includes equality.
Errors are inevitable: minimize α and β trade-offs.
p-value tells you if results are significant; smaller = stronger evidence.
Choose tests based on data type, distribution, and sample size.
Statistical significance ≠ practical relevance.
Non-parametric tests are your backup when assumptions break.
Learning Module 7: Intro to Linear Regression
What is Linear Regression
Linear regression explains how one variable (dependent, Y) changes in response to another (independent, X).
It draws a “best fit” line through data points to predict Y based on X.
Simple Linear Regression: one X variable.
Y_i = b_0 + b_1 X_i + \varepsilon_i
- Yi: actual observed value
- b0: intercept; b1: slope coefficient
- εi: residual (error), where \varepsilon_i = Y_i - \hat{Y_i}
Predicted Value on Regression Line:
\hat{Y_i} = b_0 + b_1 X_i
Goal: minimize squared residuals (Least Squares Method).
Analogy:
Think of trying to balance a tightrope perfectly through scattered data points — the line should stay as close as possible to every point without tilting too much.
Least Squares Criterion
Regression chooses b0 and b1 to minimize the sum of squared errors:
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y_i})^2
This ensures the line captures the general trend while limiting large deviations.
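A short Python sketch of the least squares calculation (slope, intercept, R², and SEE) from the definitional formulas, on made-up data:

```python
# Simple linear regression by least squares, from the definitional formulas.
from statistics import mean
from math import sqrt

x = [1, 2, 3, 4, 5, 6]                 # illustrative X values
y = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1]     # illustrative Y values

x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)          # slope
b0 = y_bar - b1 * x_bar                          # intercept

y_hat = [b0 + b1 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
r_squared = 1 - sse / sst                               # = SSR / SST
see = sqrt(sse / (len(x) - 2))                          # standard error of estimate

print(round(b1, 3), round(b0, 3), round(r_squared, 3), round(see, 3))
```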
Regression Assumptions
| Assumption | Description | Analogy |
|---|---|---|
| 1. Linearity | Relationship between X and Y must be linear. Residuals should appear random. | Drawing a straight path through scattered dots. |
| 2. Homoscedasticity | Variance of residuals is constant across X values. | Raindrops evenly spread on a windshield. |
| 3. Independence | Observations and residuals are independent (no autocorrelation). | Each data point acts alone — no “copying neighbors.” |
| 4. Normality | Residuals are normally distributed. | The errors form a nice bell curve centered around zero. |
Violating these leads to biased or inefficient results.
Decomposing Variance
Regression breaks total variation in Y into explained and unexplained parts:
SST = SSR + SSE
| Component | Meaning | Formula |
|---|---|---|
| SST | Total Sum of Squares | \( SST = \sum (Y_i – \bar{Y})^2 \) |
| SSR | Regression Sum of Squares (explained) | \( SSR = \sum (\hat{Y_i} – \bar{Y})^2 \) |
| SSE | Error Sum of Squares (unexplained) | \( SSE = \sum (Y_i – \hat{Y_i})^2 \) |
Analogy:
Think of SST as total “scatter” of your darts around the bullseye. Regression (SSR) explains part of it; SSE is your leftover miss.
Goodness of Fit – R² and F-Statistic
Coefficient of Determination (R²):
R^2 = \frac{SSR}{SST}
- Measures how much variation in Y is explained by X.
- Ranges from 0 → 1 (higher = better).
Example: R²=0.90 → 90% of variation in Y explained by the model.
Analogy:
R² is like how much of a movie plot you can explain with a single character’s actions — higher R² means that character (X) drives more of the story (Y).
F-Statistic (Model Significance):
Tests if the model explains a significant portion of Y’s variation.
F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - (k + 1))}
- k: number of independent variables
- n: number of observations
A high F means the model is statistically useful.
Analogy:
Like testing if your overall recipe actually tastes better than random mixing.
ANOVA Table – Regression Breakdown
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Regression | SSR | k | MSR = SSR/k | MSR/MSE |
| Error | SSE | n-(k+1) | MSE = SSE/(n-(k+1)) | — |
| Total | SST | n-1 | — | — |
Interpretation:
The F-statistic in ANOVA tells you whether the regression model significantly improves prediction compared to using just the mean.
Standard Error of Estimate
Measures average distance between actual and predicted Y values:
SEE = \sqrt{\frac{SSE}{n - 2}}
Lower SEE = better model fit (closer predictions).
Analogy:
Like measuring how much your aim misses the bullseye — smaller SEE = more accurate throws.
Testing the Slope
To test if the relationship between X and Y is statistically significant:
t = \frac{b_1 - \beta_1}{S_{b_1}}
- b1: estimated slope
- β1: hypothesized slope (usually 0)
- Sb1: standard error of slope
- df = n − 2
where:
S_{b_1} = \frac{SEE}{\sqrt{\sum (X_i - \bar{X})^2}}
If |t| > t₍critical₎ → Reject H₀ (slope ≠ 0).
Analogy:
If the line’s slope is meaningfully tilted (not flat), X actually predicts Y.
Testing the Intercept (b0)
t = \frac{b_0 - \beta_0}{S_{b_0}}
with:
S_{b_0} = SEE \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}}
Usually less important unless the intercept itself has theoretical meaning (e.g., expected Y when X = 0).
Pearson Correlation Coefficient
Measures the strength and direction of linear relationship between X and Y:
r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}
Can also be tested with:
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}
df = n − 2
Interpretation:
- r = +1 → perfect positive linear relationship
- r = 0 → no linear relationship
- r = −1 → perfect negative linear relationship
Analogy:
If X and Y dance in sync, r ≈ 1; if one moves left while the other moves right, r ≈ −1.
Predicting Values of Y
Predicted value:
\hat{Y_i} = b_0 + b_1 X_i
Confidence Interval for Prediction:
\hat{Y} \pm t_{critical} \times S_f
Where:
S_f = SEE \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X_i - \bar{X})^2}}
Sf is the standard error of forecast.
Actual Y values will likely fall within this interval.
Analogy:
The regression line gives the “best guess,” but the confidence band is your safety net around it.
Functional (Non-Linear) Forms
If the X–Y relationship is not linear, transform variables into a linearizable form.
| Model | Formula | Interpretation |
|---|---|---|
| Log-Lin | \( \ln(Y) = b_0 + b_1 X \) | 1-unit change in X → % change in Y |
| Lin-Log | \( Y = b_0 + b_1 \ln(X) \) | 1% change in X → \( \frac{b_1}{100} \) change in Y |
| Log-Log | \( \ln(Y) = b_0 + b_1 \ln(X) \) | % change in X → % change in Y (elasticity = \( b_1 \)) |
Analogy:
You’re reshaping the data until it fits a straight line — like adjusting camera angles to see a straight horizon.
Evaluating the Model
A good regression model has:
✅ High R²
✅ High F-statistic
✅ Low SEE
✅ Uncorrelated residuals
✅ Normally distributed residuals
Analogy:
A “good fit” is like a well-tailored suit — it follows the shape closely, with minimal wrinkles (errors).
Summary Table
| Concept | Formula | Interpretation |
|---|---|---|
| Regression Equation | \( Y_i = b_0 + b_1 X_i + \varepsilon_i \) | Line of best fit |
| Total Variation | \( SST = SSR + SSE \) | Total = explained + unexplained |
| R² | \( R^2 = \frac{SSR}{SST} \) | % of variation explained |
| F-statistic | \( F = \frac{MSR}{MSE} \) | Overall model significance |
| SEE | \( SEE = \sqrt{MSE} \) | Prediction accuracy |
| t-stat (slope) | \( t = \frac{b_1 – \beta_1}{S_{b_1}} \) | Tests slope significance |
| Pearson r | \( r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} \) | Linear relationship strength |
| Prediction Interval | \( \hat{Y} \pm t_{critical} S_f \) | Range of likely Y values |
Key Takeaways
Regression quantifies and predicts relationships between variables.
Assumptions (linearity, independence, constant variance, normality) must hold for valid inference.
R² shows explanatory power; F-tests show model relevance.
t-tests validate whether slope(s) matter.
Always check residuals — they tell you if the model is honest.
Nonlinear relationships can often be made linear through log transformations.
