Linear Regression from First Principles
This lesson explains how linear regression works without hiding the core logic inside a library. It also explains what prediction means, how prediction is used in regression, how the coefficient of correlation relates to linear relationships, and how the same idea appears in classification.
Lesson Roadmap
- Prediction in machine learning
- Linear regression from first principles
- Error and loss
- Closed-form slope and intercept
- Coefficient of correlation
- Manual Python implementation
- Prediction in classification
- How classification uses prediction differently
1. What is Prediction?
In machine learning, prediction means using a learned pattern to estimate an output for a new input.
Example: if a model has learned the relationship between study hours and marks, then for a new student who studies 6 hours, it can predict the marks that student may score.
A prediction is the model's best guess based on the pattern it learned from past data.
Regression prediction
The output is a number. Example: marks, price, temperature, salary.
Classification prediction
The output is a category or class. Example: spam or not spam, pass or fail.
Common idea
Both use input data and learned patterns. The difference is the type of output.
2. Linear Regression from First Principles
Linear regression assumes that the relationship between input and output can be represented by a straight line: y = mx + c.
Here, x is the input, y is the predicted output, m is the slope, and c is the intercept.
The central question is: how do we find the best values of m and c from data?
3. Meaning of Slope
The slope tells us how much the output changes when the input increases by 1 unit.
If the slope is positive, the output increases as the input increases. If the slope is negative, the output decreases as the input increases.
4. Meaning of Intercept
The intercept tells us the predicted output when the input is 0.
It is the point where the line cuts the vertical axis.
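Both meanings can be checked with a minimal sketch. The values m = 10 and c = 10 are assumed here purely for illustration, and the helper name `predict` is hypothetical:

```python
# Illustrative line y = m*x + c; m and c are assumed values for this demo.
m = 10
c = 10

def predict(x):
    """Return the value of the line y = m*x + c at x."""
    return m * x + c

# Increasing the input by 1 unit changes the output by exactly the slope.
change_per_unit = predict(4) - predict(3)

# At x = 0 the prediction equals the intercept.
value_at_zero = predict(0)

print(change_per_unit)  # 10
print(value_at_zero)    # 10
```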
5. Training Data Example
| Study Hours (x) | Marks (y) |
|---|---|
| 1 | 20 |
| 2 | 30 |
| 3 | 40 |
| 4 | 50 |
| 5 | 60 |
The goal is to find a line that comes as close as possible to all these points.
6. What Does Best Line Mean?
The best line is the line whose predictions are closest to the actual values.
For each point, the model predicts a value. The difference between the actual and predicted values is called the error: error = actual - predicted.
We want these errors to be as small as possible for all training points.
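As a small sketch, the errors for one candidate line can be computed point by point. The candidate values m = 9 and c = 12 are arbitrary, picked only so that the errors on the section 5 table are not all zero:

```python
x = [1, 2, 3, 4, 5]        # study hours from the training table
y = [20, 30, 40, 50, 60]   # marks from the training table

m, c = 9, 12  # arbitrary candidate line, not the best fit

# Prediction of the candidate line at each training input.
predicted = [m * xi + c for xi in x]

# Error for each point, taken as actual minus predicted.
errors = [yi - pi for yi, pi in zip(y, predicted)]

print(predicted)  # [21, 30, 39, 48, 57]
print(errors)     # [-1, 0, 1, 2, 3]
```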
7. Why Square the Error?
Positive and negative errors can cancel each other. Squaring avoids that.
Squaring also gives more penalty to large mistakes.
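The cancellation problem can be seen in a short sketch. It assumes a deliberately poor candidate, a flat line at y = 40, which is an illustrative choice and not part of the lesson:

```python
x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

# A deliberately poor candidate: a flat line at y = 40 (slope 0).
predicted = [40 for _ in x]
errors = [yi - pi for yi, pi in zip(y, predicted)]

plain_sum = sum(errors)                    # positive and negative errors cancel
squared_sum = sum(e ** 2 for e in errors)  # every term is non-negative

print(errors)       # [-20, -10, 0, 10, 20]
print(plain_sum)    # 0
print(squared_sum)  # 1000
```

The plain sum reports zero total error even though the flat line misses four of the five points, while the squared sum exposes the misfit.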
8. Loss Function
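In linear regression, we often minimize the sum of squared errors (SSE) or the mean squared error (MSE). The MSE is the SSE divided by the number of points, so minimizing either one gives the same line.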
The lower the loss, the better the line fits the data.
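A minimal sketch of both loss functions follows. The function names are hypothetical, and the two candidate lines compared at the end are illustrative choices:

```python
def sum_squared_error(x, y, m, c):
    """Sum of squared differences between actual y and the line m*x + c."""
    return sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))

def mean_squared_error(x, y, m, c):
    """Average squared error over all points."""
    return sum_squared_error(x, y, m, c) / len(x)

x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

# Compare an arbitrary candidate line with the line y = 10x + 10.
print(mean_squared_error(x, y, 9, 12))   # 3.0
print(mean_squared_error(x, y, 10, 10))  # 0.0
```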
9. First-Principles Formula for Slope and Intercept
For simple linear regression with one input variable, the slope and intercept can be calculated directly.
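The formulas themselves are not reproduced in this text. The standard least-squares expressions, written with \bar{x} and \bar{y} for the means of the inputs and outputs over the n training points, are:

```latex
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
c = \bar{y} - m\,\bar{x}
```

For the study-hours table, the means are 3 and 40, the numerator is 40 + 10 + 0 + 10 + 40 = 100, and the denominator is 4 + 1 + 0 + 1 + 4 = 10, so m = 10 and c = 40 - 10 × 3 = 10.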
These formulas give the best-fit line for one-variable linear regression.
10. Coefficient of Correlation
The coefficient of correlation, written r, measures the strength and direction of the linear relationship between two variables.
The value of r always lies between -1 and +1, inclusive.
- r = 1 means perfect positive linear relationship
- r = -1 means perfect negative linear relationship
- r = 0 means no linear relationship
Correlation does not directly give the prediction line, but it tells us how strongly the variables are linearly related.
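The formula for r is not shown in this text. A standard form (the Pearson correlation coefficient), where \bar{x} and \bar{y} are the means of the two variables over n points, is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

For the study-hours table, the numerator is 100 and the two sums under the square roots are 10 and 1000, so r = 100 / √(10 × 1000) = 100 / 100 = 1, a perfect positive linear relationship.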
11. Interpreting Correlation
| Value of r | Interpretation |
|---|---|
| Near +1 | Strong positive relationship |
| Near -1 | Strong negative relationship |
| Near 0 | Weak or no linear relationship |
12. Correlation and Regression
Regression gives us a line for prediction.
Correlation tells us how strong the linear relationship is.
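In simple linear regression the slope and r always have the same sign, because the least-squares slope equals r multiplied by the ratio of the standard deviation of y to the standard deviation of x, and that ratio is positive. A positive correlation therefore gives a positive slope, and a negative correlation gives a negative slope.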
13. Manual Correlation Calculation in Python
This computes correlation directly from first principles.
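The original listing is not present in this text. The following is a sketch of one way to do it, assuming the Pearson formula based on sums of deviations from the mean; the function name `correlation` is hypothetical:

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient computed from sums of deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of products of deviations (numerator).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations (denominator terms).
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)

    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

r = correlation(x, y)
print(r)  # 1.0
```

This sketch divides by zero if all x values or all y values are identical, since correlation is undefined in that case.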
14. Manual Python Implementation from First Principles
This code calculates the line without using scikit-learn.
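The original listing is not present in this text. A minimal sketch using only built-in Python, assuming the standard least-squares formulas for one input variable (the function name `fit_line` is hypothetical):

```python
def fit_line(x, y):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Slope: sum of products of deviations over sum of squared x deviations.
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = sum((xi - mean_x) ** 2 for xi in x)

    m = numerator / denominator
    # Intercept: the line must pass through the point of means.
    c = mean_y - m * mean_x
    return m, c

x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

m, c = fit_line(x, y)
print(m, c)  # 10.0 10.0
```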
15. Manual Prediction Using the Learned Line
Once we know the slope and intercept, prediction is easy. We simply put the new input value into the line equation.
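As a short sketch, assume the line fitted to the section 5 table, which works out to m = 10 and c = 10. The helper name `predict` is hypothetical:

```python
def predict(x_new, m, c):
    """Evaluate the learned line y = m*x + c at a new input."""
    return m * x_new + c

# Slope and intercept that the least-squares formulas give for the
# study-hours table (1..5 hours, 20..60 marks).
m, c = 10.0, 10.0

marks_for_6_hours = predict(6, m, c)
print(marks_for_6_hours)  # 70.0
```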
16. Complete First-Principles Program
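The original program is not present in this text. The following sketch combines the earlier steps (averages, slope, intercept, loss, correlation, prediction) into one script; all function and variable names are hypothetical:

```python
import math

def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)

def fit_line(x, y):
    """Least-squares slope and intercept for y = m*x + c."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

def predict(x_new, m, c):
    """Evaluate the line at a new input."""
    return m * x_new + c

def mean_squared_error(x, y, m, c):
    """Average squared difference between actual and predicted values."""
    return mean([(yi - predict(xi, m, c)) ** 2 for xi, yi in zip(x, y)])

def correlation(x, y):
    """Pearson correlation coefficient from sums of deviations."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]
marks = [20, 30, 40, 50, 60]

m, c = fit_line(hours, marks)
mse = mean_squared_error(hours, marks, m, c)
r = correlation(hours, marks)
marks_for_6_hours = predict(6, m, c)

print("slope:", m)                                # slope: 10.0
print("intercept:", c)                            # intercept: 10.0
print("mean squared error:", mse)                 # mean squared error: 0.0
print("correlation:", r)                          # correlation: 1.0
print("prediction for 6 hours:", marks_for_6_hours)  # prediction for 6 hours: 70.0
```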
17. What This Program Teaches
- How averages are used
- How slope is computed from data
- How intercept is calculated
- How prediction is produced
- How coefficient of correlation is computed
- How a line summarizes the trend in the dataset
18. Same Idea with NumPy
NumPy makes the implementation shorter and cleaner.
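The original listing is not present in this text. A sketch of the same calculation with NumPy arrays follows; it assumes NumPy is installed, and cross-checks the result against `np.polyfit` and `np.corrcoef`:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([20, 30, 40, 50, 60], dtype=float)

# Deviations from the mean, computed for the whole array at once.
dx = x - x.mean()
dy = y - y.mean()

m = (dx * dy).sum() / (dx ** 2).sum()
c = y.mean() - m * x.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Cross-check with NumPy's built-in helpers.
m_check, c_check = np.polyfit(x, y, 1)   # degree-1 fit returns [slope, intercept]
r_check = np.corrcoef(x, y)[0, 1]        # off-diagonal entry is r

print(m, c, r)
```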
19. Comparison with scikit-learn
The first-principles version teaches the logic. The scikit-learn version is practical for real projects.
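The original listing is not present in this text. A sketch of the equivalent fit with scikit-learn's `LinearRegression` follows, assuming scikit-learn and NumPy are installed. Note that scikit-learn expects the inputs as a 2D array with one column per feature:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One feature (study hours), reshaped to a column: shape (5, 1).
X = np.array([1, 2, 3, 4, 5], dtype=float).reshape(-1, 1)
y = np.array([20, 30, 40, 50, 60], dtype=float)

model = LinearRegression()
model.fit(X, y)

m = model.coef_[0]        # slope for the single feature
c = model.intercept_      # intercept
prediction = model.predict(np.array([[6.0]]))[0]  # marks for 6 hours

print(m, c, prediction)
```

Up to floating-point rounding, the slope, intercept, and prediction should match the values from the manual calculation (10, 10, and 70).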
20. What Prediction Means in Classification
In classification, prediction still means the model gives an output for a new input, but the output is not usually a continuous number like marks or price.
Instead, the output is a class, category, or label such as:
- Spam or not spam
- Pass or fail
- Cat or dog
- Positive or negative review
21. Prediction in Regression
Regression predicts a real number.
22. Prediction in Classification
Classification predicts a label or class.
23. How Classification Uses Prediction
Classification models often first produce a score or probability and then convert it into a class label.
Example:
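The lesson's own example is not present in this text. As a hedged illustration, a score can be turned into a label with a threshold; the probability values and the 0.5 threshold below are assumptions chosen for this sketch, and the function name `to_label` is hypothetical:

```python
def to_label(probability_of_spam, threshold=0.5):
    """Convert a probability into a class label using a cutoff."""
    return "spam" if probability_of_spam >= threshold else "not spam"

print(to_label(0.91))  # spam
print(to_label(0.20))  # not spam
```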
So classification also uses prediction, but the predicted quantity is often interpreted differently.
Regression
Predicts a continuous value like 72.3 or 145000.
Classification
Predicts a class label like pass, fail, spam, or not spam.
Shared idea
Both take input features and generate a prediction from learned patterns.
24. Simple Classification Example
Suppose we want to classify whether a student will pass or fail based on study hours.
| Study Hours | Class |
|---|---|
| 1 | Fail |
| 2 | Fail |
| 4 | Pass |
| 5 | Pass |
A classification model would learn from this data and then predict a class for a new student.
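One very simple way to learn from this table, sketched here as an assumption rather than as the lesson's method, is to place a cutoff halfway between the largest study time that failed and the smallest that passed. The names `cutoff` and `classify` are hypothetical:

```python
hours = [1, 2, 4, 5]
labels = ["Fail", "Fail", "Pass", "Pass"]

# Largest study time among failing students and smallest among passing ones.
max_fail = max(h for h, lab in zip(hours, labels) if lab == "Fail")
min_pass = min(h for h, lab in zip(hours, labels) if lab == "Pass")

# Learned decision boundary: the midpoint between the two groups.
cutoff = (max_fail + min_pass) / 2

def classify(h):
    """Predict a class label for a new student from study hours."""
    return "Pass" if h >= cutoff else "Fail"

print(cutoff)         # 3.0
print(classify(6))    # Pass
print(classify(1.5))  # Fail
```

This only works when the two classes separate cleanly along the input; practical classifiers such as logistic regression handle overlapping data by producing a probability first.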
25. Important Difference
Regression asks: What number should I predict?
Classification asks: Which category should I assign?
26. Using Our Python Editor
Use the embedded Programmer’s Picnic Python editor below to run the first-principles regression code, modify datasets, and experiment with predictions.
Practice Task 1
Change the dataset and compute slope and intercept manually.
Practice Task 2
Calculate coefficient of correlation manually and explain the result.
Practice Task 3
Write a paragraph explaining the difference between regression prediction and classification prediction.
27. Practice Questions
- What does a linear regression model try to learn?
- What is the meaning of slope?
- What is the meaning of intercept?
- Why do we square errors?
- How do we calculate prediction after learning slope and intercept?
- What is the coefficient of correlation?
- What does a correlation near plus one mean?
- What is the difference between regression prediction and classification prediction?
- Why is first-principles implementation useful for learning?
28. Conclusion
Linear regression from first principles teaches the heart of machine learning: learn a pattern from data, measure error, and use the learned pattern to make predictions.
Coefficient of correlation helps us understand how strongly two variables are linearly related.
Classification also uses prediction, but instead of returning a continuous number, it returns a class or label, often after computing a score or probability.