📈 Linear Regression Deep Dive

Linear Regression from First Principles

This lesson explains how linear regression works without hiding the core logic inside a library. It also explains what prediction means, how prediction is used in regression, how the coefficient of correlation relates to linear relationships, and how the same idea appears in classification.

First principles · Slope · Intercept · Prediction · Correlation · Classification · Python editor

Lesson Roadmap

  1. Prediction in machine learning
  2. Linear regression from first principles
  3. Error and loss
  4. Closed-form slope and intercept
  5. Coefficient of correlation
  6. Manual Python implementation
  7. Prediction in classification
  8. How classification uses prediction differently

1. What is Prediction?

In machine learning, prediction means using a learned pattern to estimate an output for a new input.

Example: if a model has learned the relationship between study hours and marks, then for a new student who studies 6 hours, it can predict the marks that student may score.

Simple idea:

A prediction is the model's best guess based on the pattern it learned from past data.

Regression prediction

The output is a number. Example: marks, price, temperature, salary.

Classification prediction

The output is a category or class. Example: spam or not spam, pass or fail.

Common idea

Both use input data and learned patterns. The difference is the type of output.

2. Linear Regression from First Principles

Linear regression assumes that the relationship between input and output can be approximated by a straight line.

y = mx + c

Here, x is the input, y is the predicted output, m is the slope, and c is the intercept.

The central question is: how do we find the best values of m and c from data?

3. Meaning of Slope

The slope tells us how much the output changes when the input increases by 1 unit.

If the slope is positive, the output increases as the input increases. If the slope is negative, the output decreases as the input increases.

4. Meaning of Intercept

The intercept tells us the predicted output when the input is 0.

It is the point where the line cuts the vertical axis.
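As a quick check of both interpretations (slope and intercept), here is a minimal sketch. The values m = 10 and c = 10 and the helper name `predict` are assumptions chosen only for illustration.

```python
# Hypothetical line used only for illustration: y = 10x + 10
m = 10  # slope
c = 10  # intercept


def predict(x):
    """Return the value of the line y = m*x + c at x."""
    return m * x + c


# Intercept: the prediction when the input is 0.
at_zero = predict(0)

# Slope: the change in prediction when the input grows by 1 unit.
change_per_unit = predict(4) - predict(3)

print("Prediction at x = 0:", at_zero)             # 10, equal to c
print("Change per 1 unit of x:", change_per_unit)  # 10, equal to m
```

Any two inputs that differ by 1 would give the same change, because a straight line has the same slope everywhere.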

5. Training Data Example

Study Hours (x)    Marks (y)
1                  20
2                  30
3                  40
4                  50
5                  60

The goal is to find a line that comes as close as possible to all these points.

6. What Does Best Line Mean?

The best line is the line whose predictions are closest to the actual values.

For each point, the model predicts a value. The difference between actual and predicted values is called the error.

error = actual - predicted

We want these errors to be as small as possible for all training points.
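To make the error definition concrete, the sketch below computes the error at every training point for one candidate line. The candidate y = 9x + 12 and the names `m_guess` and `c_guess` are hypothetical. This is a deliberately imperfect guess, not the best-fit line.

```python
# Training data from the study-hours table.
x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

# Hypothetical candidate line (not the best fit): y = 9x + 12
m_guess = 9
c_guess = 12

errors = []
for xi, yi in zip(x, y):
    predicted = m_guess * xi + c_guess
    errors.append(yi - predicted)  # error = actual - predicted

print("Errors:", errors)  # [-1, 0, 1, 2, 3]
```

A negative error means the line predicted too high at that point; a positive error means it predicted too low.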

7. Why Square the Error?

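Positive and negative errors can cancel each other out when they are added together, which can make a poor line look perfect. Squaring makes every error non-negative, so this cannot happen.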

Squaring also gives more penalty to large mistakes.

squared error = (actual - predicted)^2
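The cancellation problem can be seen with a small sketch. The candidate line y = 5x + 25 is a hypothetical choice for illustration. It passes through the middle of the data but has the wrong slope.

```python
# Training data from the study-hours table.
x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

# Hypothetical candidate line with the wrong slope: y = 5x + 25
errors = [yi - (5 * xi + 25) for xi, yi in zip(x, y)]

sum_errors = sum(errors)                   # positives and negatives cancel
sum_squared = sum(e ** 2 for e in errors)  # squaring removes the cancellation

print("Errors:", errors)                    # [-10, -5, 0, 5, 10]
print("Sum of errors:", sum_errors)         # 0
print("Sum of squared errors:", sum_squared)  # 250
```

The plain sum is 0 even though four of the five predictions are wrong. The squared sum exposes the misfit, and an error of 10 contributes 100 while an error of 5 contributes only 25, which is the extra penalty for large mistakes.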

8. Loss Function

In linear regression, we often minimize the sum of squared errors or the mean squared error.

MSE = average of (actual - predicted)^2

The lower the loss, the better the line fits the data.
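A minimal sketch of the mean squared error as a function follows. The function name `mean_squared_error` and the two candidate lines are assumptions made for illustration.

```python
def mean_squared_error(x, y, m, c):
    """Average of (actual - predicted)^2 for the line y = m*x + c."""
    total = 0
    for xi, yi in zip(x, y):
        predicted = m * xi + c
        total += (yi - predicted) ** 2
    return total / len(x)


# Training data from the study-hours table.
x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

mse_poor = mean_squared_error(x, y, 5, 25)   # hypothetical poor line
mse_good = mean_squared_error(x, y, 10, 10)  # line that passes through every point

print("MSE for y = 5x + 25:", mse_poor)   # 50.0
print("MSE for y = 10x + 10:", mse_good)  # 0.0
```

The second line has the lower loss, so by this measure it fits the data better.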

9. First-Principles Formula for Slope and Intercept

For simple linear regression with one input variable, the slope and intercept can be calculated directly.

m = Σ((x - mean_x)(y - mean_y)) / Σ((x - mean_x)^2)

c = mean_y - m * mean_x

These formulas give the best-fit line for one-variable linear regression.
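One way to build confidence in the "best-fit" claim is to compute m and c with the formulas and then check that nearby lines all have a higher mean squared error. The sketch below does this on a hypothetical noisy dataset (the marks are invented for illustration). It is a numerical spot check, not a proof.

```python
x = [1, 2, 3, 4, 5]
y = [22, 28, 41, 52, 57]  # hypothetical noisy marks, for illustration only


def mse(m, c):
    """Mean squared error of the line y = m*x + c on the data above."""
    return sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y)) / len(x)


# Closed-form slope and intercept.
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
m = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
c = mean_y - m * mean_x
best = mse(m, c)

# Nudge the slope and intercept in every direction and compare the loss.
nearby = [(m + dm, c + dc)
          for dm in (-0.5, 0, 0.5)
          for dc in (-0.5, 0, 0.5)
          if (dm, dc) != (0, 0)]
all_worse = all(mse(m2, c2) > best for m2, c2 in nearby)

print("Slope:", m, "Intercept:", c)
print("MSE of closed-form line:", best)
print("Every nearby line has a higher MSE:", all_worse)
```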

10. Coefficient of Correlation

The coefficient of correlation measures the strength and direction of the linear relationship between two variables.

r = Σ((x - mean_x)(y - mean_y)) / √(Σ((x - mean_x)^2) × Σ((y - mean_y)^2))

The value of r always lies between -1 and +1, inclusive.

  • r = 1 means perfect positive linear relationship
  • r = -1 means perfect negative linear relationship
  • r = 0 means no linear relationship

Important idea:

Correlation does not directly give the prediction line, but it tells us how strongly the variables are linearly related.
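The three special values of r can be reproduced with small datasets. In this sketch the helper name `correlation` and the second and third datasets are assumptions made for illustration. The last dataset follows a curve, y = (x - 3)^2, to show that r = 0 rules out a linear relationship only, not every relationship.

```python
def correlation(x, y):
    """Coefficient of correlation, computed from the formula above.

    Note: the result is undefined (division by zero) if x or y is constant.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den_x = sum((xi - mean_x) ** 2 for xi in x)
    den_y = sum((yi - mean_y) ** 2 for yi in y)
    return num / (den_x * den_y) ** 0.5


x = [1, 2, 3, 4, 5]

r_up = correlation(x, [20, 30, 40, 50, 60])    # rises in a straight line
r_down = correlation(x, [60, 50, 40, 30, 20])  # falls in a straight line
r_curve = correlation(x, [4, 1, 0, 1, 4])      # y = (x - 3)^2, a symmetric curve

print("Rising line:", r_up)
print("Falling line:", r_down)
print("Symmetric curve:", r_curve)
```

The curved dataset is perfectly predictable from x, yet its r is 0 because the pattern is not a straight line.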

11. Interpreting Correlation

Value of r    Interpretation
Near +1       Strong positive relationship
Near -1       Strong negative relationship
Near 0        Weak or no linear relationship

12. Correlation and Regression

Regression gives us a line for prediction.

Correlation tells us how strong the linear relationship is.

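The slope m and the correlation r share the same numerator, Σ((x - mean_x)(y - mean_y)), and their denominators are positive, so the two always have the same sign. If the correlation is positive, the regression line has a positive slope. If the correlation is negative, the regression line has a negative slope. The strength of the correlation does not say how steep the line is; it says how closely the points cluster around it.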
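The link between the two quantities can be made exact. Dividing the slope formula by the correlation formula gives m = r × √(Σ((y - mean_y)^2) / Σ((x - mean_x)^2)). The sketch below checks this identity numerically on a hypothetical noisy dataset. The names `s_xy`, `s_xx`, and `s_yy` are shorthand introduced here for the three sums.

```python
x = [1, 2, 3, 4, 5]
y = [22, 28, 41, 52, 57]  # hypothetical noisy marks, for illustration only

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# The three sums that appear in the slope and correlation formulas.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

m = s_xy / s_xx                  # regression slope
r = s_xy / (s_xx * s_yy) ** 0.5  # coefficient of correlation

# Slope rebuilt from the correlation and the spread of y relative to x.
m_from_r = r * (s_yy / s_xx) ** 0.5

print("Slope from regression formula:", m)
print("Slope rebuilt from r:", m_from_r)
print("Correlation:", r)
```

Because the square-root factor is positive, the slope and r must share the same sign.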

13. Manual Correlation Calculation in Python

x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
num = 0    # sum of (x - mean_x) * (y - mean_y)
den_x = 0  # sum of (x - mean_x) squared
den_y = 0  # sum of (y - mean_y) squared
for i in range(len(x)):
    num += (x[i] - mean_x) * (y[i] - mean_y)
    den_x += (x[i] - mean_x) ** 2
    den_y += (y[i] - mean_y) ** 2
r = num / ((den_x * den_y) ** 0.5)
print("Coefficient of correlation:", r)

This computes correlation directly from first principles.

14. Manual Python Implementation from First Principles

x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
numerator = 0    # sum of (x - mean_x) * (y - mean_y)
denominator = 0  # sum of (x - mean_x) squared
for i in range(len(x)):
    numerator += (x[i] - mean_x) * (y[i] - mean_y)
    denominator += (x[i] - mean_x) ** 2
m = numerator / denominator
c = mean_y - m * mean_x
print("Slope:", m)
print("Intercept:", c)

This code calculates the line without using scikit-learn.

15. Manual Prediction Using the Learned Line

# m and c are the slope and intercept computed in the previous section
new_x = 6
predicted_y = m * new_x + c
print("Prediction for x = 6:", predicted_y)

Once we know the slope and intercept, prediction is easy. We simply put the new input value into the line equation.

16. Complete First-Principles Program

x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
# Slope and intercept
numerator = 0
denominator = 0
for i in range(len(x)):
    numerator += (x[i] - mean_x) * (y[i] - mean_y)
    denominator += (x[i] - mean_x) ** 2
m = numerator / denominator
c = mean_y - m * mean_x
# Coefficient of correlation
num = 0
den_x = 0
den_y = 0
for i in range(len(x)):
    num += (x[i] - mean_x) * (y[i] - mean_y)
    den_x += (x[i] - mean_x) ** 2
    den_y += (y[i] - mean_y) ** 2
r = num / ((den_x * den_y) ** 0.5)
print("Slope:", m)
print("Intercept:", c)
print("Correlation:", r)
# Predictions for the training inputs and for a new input
for value in x:
    prediction = m * value + c
    print("x =", value, "prediction =", prediction)
new_x = 6
print("Prediction for x = 6:", m * new_x + c)

17. What This Program Teaches

  • How averages are used
  • How slope is computed from data
  • How intercept is calculated
  • How prediction is produced
  • How coefficient of correlation is computed
  • How a line summarizes the trend in the dataset
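The same steps can also be packaged as small reusable functions, which makes it easier to try other datasets. This is one possible arrangement, not the only one. The function names `fit_line` and `predict` are assumptions introduced here.

```python
def fit_line(x, y):
    """Return (m, c) for the least-squares line through the points."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = sum((xi - mean_x) ** 2 for xi in x)
    m = numerator / denominator
    c = mean_y - m * mean_x
    return m, c


def predict(m, c, x_new):
    """Put a new input value into the line equation y = m*x + c."""
    return m * x_new + c


hours = [1, 2, 3, 4, 5]
marks = [20, 30, 40, 50, 60]

m, c = fit_line(hours, marks)
prediction_6 = predict(m, c, 6)

print("Slope:", m)                           # 10.0
print("Intercept:", c)                       # 10.0
print("Prediction for x = 6:", prediction_6)  # 70.0
```

Calling `fit_line` with a different pair of lists is enough to repeat the whole calculation on new data.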

18. Same Idea with NumPy

import numpy as np
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([20, 30, 40, 50, 60], dtype=float)
mean_x = np.mean(x)
mean_y = np.mean(y)
m = np.sum((x - mean_x) * (y - mean_y)) / np.sum((x - mean_x) ** 2)
c = mean_y - m * mean_x
r = np.sum((x - mean_x) * (y - mean_y)) / np.sqrt(np.sum((x - mean_x) ** 2) * np.sum((y - mean_y) ** 2))
print("Slope:", m)
print("Intercept:", c)
print("Correlation:", r)
new_x = 6
print("Prediction:", m * new_x + c)

NumPy makes the implementation shorter and cleaner.

19. Comparison with scikit-learn

import numpy as np
from sklearn.linear_model import LinearRegression
# scikit-learn expects the inputs as a 2-D array: one row per sample, one column per feature
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([20, 30, 40, 50, 60])
model = LinearRegression()
model.fit(x, y)
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
print("Prediction for 6:", model.predict([[6]])[0])

The first-principles version teaches the logic. The scikit-learn version is practical for real projects.

20. What Prediction Means in Classification

In classification, prediction still means the model gives an output for a new input, but the output is not usually a continuous number like marks or price.

Instead, the output is a class, category, or label such as:

  • Spam or not spam
  • Pass or fail
  • Cat or dog
  • Positive or negative review

21. Prediction in Regression

Regression predicts a real number.

Predicted marks = 68.5

22. Prediction in Classification

Classification predicts a label or class.

Predicted class = "Pass"

23. How Classification Uses Prediction

Classification models often first produce a score or probability and then convert it into a class label.

Example:

Probability of spam = 0.92
Final class = Spam

So classification also uses prediction, but the raw score or probability is not the final answer. It is converted into a class label, typically by comparing it with a cutoff value.
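The conversion from probability to label can be sketched in a few lines. The function name `to_label` and the cutoff of 0.5 are assumptions made for illustration. Real systems may choose a different cutoff depending on how costly each kind of mistake is.

```python
def to_label(probability_of_spam, threshold=0.5):
    """Convert a spam probability into a class label using a cutoff.

    The default threshold of 0.5 is an assumption for this sketch.
    """
    if probability_of_spam >= threshold:
        return "Spam"
    return "Not spam"


label_high = to_label(0.92)
label_low = to_label(0.10)

print("Probability 0.92 ->", label_high)  # Spam
print("Probability 0.10 ->", label_low)   # Not spam
```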

Regression

Predicts a continuous value like 72.3 or 145000.

Classification

Predicts a class label like pass, fail, spam, or not spam.

Shared idea

Both take input features and generate a prediction from learned patterns.

24. Simple Classification Example

Suppose we want to classify whether a student will pass or fail based on study hours.

Study Hours    Class
1              Fail
2              Fail
4              Pass
5              Pass

A classification model would learn from this data and then predict a class for a new student.
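As a rough illustration of "learning from this data", the sketch below picks a cutoff halfway between the highest failing and the lowest passing study hours and uses it to classify new students. This is a deliberately simple rule, chosen here as an assumption. It works only because the two classes in the table do not overlap, and it is not how practical classifiers are usually trained. The name `classify` is hypothetical.

```python
# Training data from the table above.
hours = [1, 2, 4, 5]
labels = ["Fail", "Fail", "Pass", "Pass"]

# "Learn" a cutoff: the midpoint between the two classes.
highest_fail = max(h for h, lab in zip(hours, labels) if lab == "Fail")
lowest_pass = min(h for h, lab in zip(hours, labels) if lab == "Pass")
cutoff = (highest_fail + lowest_pass) / 2


def classify(study_hours):
    """Predict a class label by comparing study hours with the learned cutoff."""
    return "Pass" if study_hours >= cutoff else "Fail"


print("Learned cutoff:", cutoff)        # 3.0
print("3.5 hours ->", classify(3.5))    # Pass
print("1.5 hours ->", classify(1.5))    # Fail
```

The output is a label rather than a number, which is the key contrast with the regression examples earlier in the lesson.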

25. Important Difference

Regression asks:

What number should I predict?

Classification asks:

Which category should I assign?

26. Using Our Python Editor

Use the embedded Programmer’s Picnic Python editor below to run the first-principles regression code, modify datasets, and experiment with predictions.


Practice Task 1

Change the dataset and compute slope and intercept manually.

Practice Task 2

Calculate coefficient of correlation manually and explain the result.

Practice Task 3

Write a paragraph explaining the difference between regression prediction and classification prediction.

27. Practice Questions

  1. What does a linear regression model try to learn?
  2. What is the meaning of slope?
  3. What is the meaning of intercept?
  4. Why do we square errors?
  5. How do we calculate prediction after learning slope and intercept?
  6. What is the coefficient of correlation?
  7. What does a correlation near plus one mean?
  8. What is the difference between regression prediction and classification prediction?
  9. Why is first-principles implementation useful for learning?

28. Conclusion

Linear regression from first principles teaches the heart of machine learning: learn a pattern from data, measure error, and use the learned pattern to make predictions.

The coefficient of correlation helps us understand how strongly two variables are linearly related.

Classification also uses prediction, but instead of returning a continuous number, it returns a class or label, often after computing a score or probability.