Linear Regression from First Principles, Correlation, and Prediction in Classification

1. What is Prediction?

In machine learning, prediction means using a learned pattern to estimate an output for a new input.

Example: if a model has learned the relationship between study hours and marks, then for a new student who studies 6 hours, it can predict the marks that student may score.

Simple idea:

A prediction is the model's best guess based on the pattern it learned from past data.

Regression prediction

The output is a number. Example: marks, price, temperature, salary.

Classification prediction

The output is a category or class. Example: spam or not spam, pass or fail.

Common idea

Both use input data and learned patterns. The difference is the type of output.

2. Linear Regression from First Principles

Linear regression assumes that the relationship between input and output can be represented by a straight line.

y = mx + c

Here, x is the input, y is the predicted output, m is the slope, and c is the intercept.

The central question is: how do we find the best values of m and c from data?

3. Meaning of Slope

The slope tells us how much the output changes when the input increases by 1 unit.

If slope is positive, the output increases as input increases. If slope is negative, the output decreases as input increases.

4. Meaning of Intercept

The intercept tells us the predicted output when the input is 0.

It is the point where the line cuts the vertical axis.
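
A small sketch can make slope and intercept concrete. The values m = 10 and c = 10 below are illustrative assumptions chosen for this example, and predict is a hypothetical helper name.

```python
# Illustrative values (assumed for this sketch, not learned from data)
m = 10  # slope
c = 10  # intercept

def predict(x):
    """Return the point on the line y = m*x + c for input x."""
    return m * x + c

# At x = 0 the line returns the intercept.
print("Output at x = 0:", predict(0))

# Moving x up by 1 unit changes the output by exactly the slope.
print("Change from x = 3 to x = 4:", predict(4) - predict(3))
```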

5. Training Data Example

Study Hours (x)	Marks (y)
1	20
2	30
3	40
4	50
5	60

The goal is to find a line that comes as close as possible to all these points.

6. What Does Best Line Mean?

The best line is the line whose predictions are closest to the actual values.

For each point, the model predicts a value. The difference between actual and predicted values is called the error.

error = actual - predicted

We want these errors to be as small as possible for all training points.
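
As a rough illustration, the errors for one candidate line can be listed point by point. The line m = 9, c = 12 is an arbitrary guess made up for this sketch; it is not the best line.

```python
x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

# An arbitrary candidate line (assumed for illustration only)
m = 9
c = 12

errors = []
for i in range(len(x)):
    predicted = m * x[i] + c
    errors.append(y[i] - predicted)  # error = actual - predicted

print("Errors:", errors)
```

Some errors are negative and some are positive, which is why a single summary number for "how wrong the line is" needs a little care.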

7. Why Square the Error?

Positive and negative errors can cancel each other. Squaring avoids that.

Squaring also gives more penalty to large mistakes.

squared error = (actual - predicted)^2
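
The cancelling effect can be seen with a deliberately poor line. The sketch below assumes a flat line that always predicts 40, chosen only for illustration: its plain errors add up to zero even though most predictions are far off, while the squared errors do not cancel.

```python
x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

# A flat line that always predicts 40 (assumed for this sketch)
predictions = [40 for _ in x]

errors = [actual - predicted for actual, predicted in zip(y, predictions)]

sum_of_errors = sum(errors)                       # positives and negatives cancel
sum_of_squared_errors = sum(e ** 2 for e in errors)  # nothing cancels

print("Errors:", errors)
print("Sum of errors:", sum_of_errors)
print("Sum of squared errors:", sum_of_squared_errors)
```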

8. Loss Function

In linear regression, we often minimize the sum of squared errors or the mean squared error.

MSE = average of (actual - predicted)^2

The lower the loss, the better the line fits the data.
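
A minimal sketch of the mean squared error, used here to compare two candidate lines on the training data. The function name mse and the two candidate lines are assumptions made for this example.

```python
def mse(m, c, xs, ys):
    """Mean squared error of the line y = m*x + c on the points (xs, ys)."""
    total = 0
    for x_value, y_value in zip(xs, ys):
        predicted = m * x_value + c
        total += (y_value - predicted) ** 2
    return total / len(xs)

x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

loss_a = mse(9, 12, x, y)   # a guessed line
loss_b = mse(10, 10, x, y)  # a line that passes through every point

print("MSE of y = 9x + 12:", loss_a)
print("MSE of y = 10x + 10:", loss_b)
```

The second line has the lower loss, so by this measure it fits the data better.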

9. First-Principles Formula for Slope and Intercept

For simple linear regression with one input variable, the slope and intercept can be calculated directly.

m = Σ((x - mean_x)(y - mean_y)) / Σ((x - mean_x)^2)
c = mean_y - m * mean_x

These formulas give the best-fit line for one-variable linear regression, that is, the line with the smallest sum of squared errors.
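
One way to check these formulas, sketched under assumptions: compute m and c for a small noisy dataset, then nudge them slightly in every direction and confirm that the sum of squared errors only gets worse. The dataset and the helper names fit_line and sse are made up for this example.

```python
def fit_line(xs, ys):
    """Slope and intercept from the formulas for one-variable regression."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    m = numerator / denominator
    c = mean_y - m * mean_x
    return m, c

def sse(m, c, xs, ys):
    """Sum of squared errors of the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy data: the points do not lie exactly on a line.
xs = [1, 2, 3, 4, 5]
ys = [22, 29, 41, 48, 60]

m, c = fit_line(xs, ys)
best = sse(m, c, xs, ys)

# Try every small nudge of m and c (except "no nudge at all").
nudged = [
    sse(m + dm, c + dc, xs, ys)
    for dm in (-0.1, 0.0, 0.1)
    for dc in (-0.1, 0.0, 0.1)
    if (dm, dc) != (0.0, 0.0)
]

print("Slope:", m, "Intercept:", c)
print("Best sum of squared errors:", best)
print("Every nudged line is worse:", all(value > best for value in nudged))
```

This is a numerical spot check, not a proof. The formulas themselves come from setting the derivatives of the squared-error loss with respect to m and c to zero.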

10. Coefficient of Correlation

The coefficient of correlation measures the strength and direction of the linear relationship between two variables.

r = Σ((x - mean_x)(y - mean_y)) / √(Σ((x - mean_x)^2) × Σ((y - mean_y)^2))

The value of r always lies between -1 and +1.

r = 1 means perfect positive linear relationship
r = -1 means perfect negative linear relationship
r = 0 means no linear relationship

Important idea:

Correlation does not directly give the prediction line, but it tells us how strongly the variables are linearly related.
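
The word "linear" matters here. In the sketch below, y is completely determined by x (y = x squared), yet r comes out as zero because the relationship is a curve, not a line. The data and the helper name correlation are assumptions made for this illustration.

```python
def correlation(xs, ys):
    """Coefficient of correlation computed from its definition."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den_x = sum((x - mean_x) ** 2 for x in xs)
    den_y = sum((y - mean_y) ** 2 for y in ys)
    return num / ((den_x * den_y) ** 0.5)

# Hypothetical data where y = x squared: a perfect but non-linear pattern.
xs = [-2, -1, 0, 1, 2]
ys = [value ** 2 for value in xs]

r = correlation(xs, ys)
print("Coefficient of correlation:", r)
```

So r near 0 means "no straight-line pattern", not "no pattern at all".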

11. Interpreting Correlation

Value of r	Interpretation
Near +1	Strong positive relationship
Near -1	Strong negative relationship
Near 0	Weak or no linear relationship

12. Correlation and Regression

Regression gives us a line for prediction.

Correlation tells us how strong the linear relationship is.

The slope of the regression line always has the same sign as r, because the two formulas share the same numerator and their denominators are positive. A positive correlation gives a positive slope, and a negative correlation gives a negative slope.
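
Dividing the slope formula by the correlation formula gives m = r × √(Σ((y - mean_y)^2) / Σ((x - mean_x)^2)). The sketch below checks this numerically on two small made-up datasets, one rising and one falling. The helper name slope_and_r is hypothetical.

```python
def slope_and_r(xs, ys):
    """Return slope, correlation, and the spread ratio sqrt(den_y / den_x)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den_x = sum((x - mean_x) ** 2 for x in xs)
    den_y = sum((y - mean_y) ** 2 for y in ys)
    m = num / den_x
    r = num / ((den_x * den_y) ** 0.5)
    spread_ratio = (den_y / den_x) ** 0.5
    return m, r, spread_ratio

# Hypothetical datasets: the same y values in rising and falling order.
xs = [1, 2, 3, 4, 5]
rising = [22, 29, 41, 48, 60]
falling = [60, 48, 41, 29, 22]

m_up, r_up, ratio_up = slope_and_r(xs, rising)
m_down, r_down, ratio_down = slope_and_r(xs, falling)

print("Rising:  slope =", m_up, " r =", r_up, " r * ratio =", r_up * ratio_up)
print("Falling: slope =", m_down, " r =", r_down, " r * ratio =", r_down * ratio_down)
```

In both cases the slope equals r times the spread ratio, and the slope carries the same sign as r.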

13. Manual Correlation Calculation in Python

x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

num = 0
den_x = 0
den_y = 0

for i in range(len(x)):
    num += (x[i] - mean_x) * (y[i] - mean_y)
    den_x += (x[i] - mean_x) ** 2
    den_y += (y[i] - mean_y) ** 2

r = num / ((den_x * den_y) ** 0.5)

print("Coefficient of correlation:", r)

This computes correlation directly from first principles.

14. Manual Python Implementation from First Principles

x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

numerator = 0
denominator = 0

for i in range(len(x)):
    numerator += (x[i] - mean_x) * (y[i] - mean_y)
    denominator += (x[i] - mean_x) ** 2

m = numerator / denominator
c = mean_y - m * mean_x

print("Slope:", m)
print("Intercept:", c)

This code calculates the line without using scikit-learn.

15. Manual Prediction Using the Learned Line

# m and c are the slope and intercept computed in the previous section
new_x = 6
predicted_y = m * new_x + c
print("Prediction for x = 6:", predicted_y)

Once we know the slope and intercept, prediction is easy. We simply put the new input value into the line equation.

16. Complete First-Principles Program

# Training data
x = [1, 2, 3, 4, 5]
y = [20, 30, 40, 50, 60]

# Averages
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Slope and intercept
numerator = 0
denominator = 0

for i in range(len(x)):
    numerator += (x[i] - mean_x) * (y[i] - mean_y)
    denominator += (x[i] - mean_x) ** 2

m = numerator / denominator
c = mean_y - m * mean_x

# Coefficient of correlation
num = 0
den_x = 0
den_y = 0

for i in range(len(x)):
    num += (x[i] - mean_x) * (y[i] - mean_y)
    den_x += (x[i] - mean_x) ** 2
    den_y += (y[i] - mean_y) ** 2

r = num / ((den_x * den_y) ** 0.5)

print("Slope:", m)
print("Intercept:", c)
print("Correlation:", r)

# Predictions for the training inputs
for value in x:
    prediction = m * value + c
    print("x =", value, "prediction =", prediction)

# Prediction for a new input
new_x = 6
print("Prediction for x = 6:", m * new_x + c)

17. What This Program Teaches

How averages are used
How slope is computed from data
How intercept is calculated
How prediction is produced
How coefficient of correlation is computed
How a line summarizes the trend in the dataset

18. Same Idea with NumPy

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([20, 30, 40, 50, 60], dtype=float)

mean_x = np.mean(x)
mean_y = np.mean(y)

m = np.sum((x - mean_x) * (y - mean_y)) / np.sum((x - mean_x) ** 2)
c = mean_y - m * mean_x

r = np.sum((x - mean_x) * (y - mean_y)) / np.sqrt(
    np.sum((x - mean_x) ** 2) * np.sum((y - mean_y) ** 2)
)

print("Slope:", m)
print("Intercept:", c)
print("Correlation:", r)

new_x = 6
print("Prediction:", m * new_x + c)

NumPy makes the implementation shorter and cleaner.
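
As a cross-check, NumPy also has built-in helpers that should agree with the hand-written formulas. The sketch below uses np.polyfit with degree 1 to fit a straight line and np.corrcoef to get the correlation. It assumes NumPy is installed and uses the same training data as before.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([20, 30, 40, 50, 60], dtype=float)

# Degree-1 polynomial fit: coefficients come back highest power first,
# so the result is [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# corrcoef returns a 2 x 2 matrix; the off-diagonal entry is r for x and y.
r = np.corrcoef(x, y)[0, 1]

print("Slope:", slope)
print("Intercept:", intercept)
print("Correlation:", r)
```

The results may differ from the manual version in the last decimal places because of floating-point rounding.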

19. Comparison with scikit-learn

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([20, 30, 40, 50, 60])

model = LinearRegression()
model.fit(x, y)

print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
print("Prediction for 6:", model.predict([[6]])[0])

The first-principles version teaches the logic. The scikit-learn version is practical for real projects.

20. What Prediction Means in Classification

In classification, prediction still means the model gives an output for a new input, but the output is not usually a continuous number like marks or price.

Instead, the output is a class, category, or label such as:

Spam or not spam
Pass or fail
Cat or dog
Positive or negative review

21. Prediction in Regression

Regression predicts a real number.

Predicted marks = 68.5

22. Prediction in Classification

Classification predicts a label or class.

Predicted class = "Pass"

23. How Classification Uses Prediction

Classification models often first produce a score or probability and then convert it into a class label.

Example:

Probability of spam = 0.92
Final class = Spam

So classification also uses prediction, but the predicted quantity is often interpreted differently.
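
A minimal sketch of the score-to-label step. The cutoff of 0.5 is a common default but is an assumption here, and to_label is a hypothetical helper name; real systems may choose a different cutoff.

```python
def to_label(probability_of_spam, threshold=0.5):
    """Convert a spam probability into a class label using a cutoff."""
    if probability_of_spam >= threshold:
        return "Spam"
    return "Not spam"

print("0.92 ->", to_label(0.92))
print("0.10 ->", to_label(0.10))
```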

Regression

Predicts a continuous value like 72.3 or 145000.

Classification

Predicts a class label like pass, fail, spam, or not spam.

Shared idea

Both take input features and generate a prediction from learned patterns.

24. Simple Classification Example

Suppose we want to classify whether a student will pass or fail based on study hours.

Study Hours	Class
1	Fail
2	Fail
4	Pass
5	Pass

A classification model would learn from this data and then predict a class for a new student.
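
One very simple way to learn from this table, sketched as an assumption and not as the standard method: place a cutoff halfway between the highest study hours that failed and the lowest study hours that passed, then classify new students by that cutoff. The names cutoff and predict_class are made up for this example.

```python
hours = [1, 2, 4, 5]
labels = ["Fail", "Fail", "Pass", "Pass"]

# "Learning": find the boundary between the two classes in the data.
highest_fail = max(h for h, label in zip(hours, labels) if label == "Fail")
lowest_pass = min(h for h, label in zip(hours, labels) if label == "Pass")
cutoff = (highest_fail + lowest_pass) / 2

def predict_class(study_hours):
    """Predict Pass or Fail by comparing study hours with the learned cutoff."""
    return "Pass" if study_hours >= cutoff else "Fail"

print("Learned cutoff:", cutoff)
print("Prediction for 6 hours:", predict_class(6))
print("Prediction for 1.5 hours:", predict_class(1.5))
```

Practical classifiers such as logistic regression learn a boundary in a more careful way, but the idea of turning an input into a class is the same.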

25. Important Difference

Regression asks:

What number should I predict?

Classification asks:

Which category should I assign?

26. Using Our Python Editor

Use the embedded Programmer’s Picnic Python editor below to run the first-principles regression code, modify datasets, and experiment with predictions.

Practice Task 1

Change the dataset and compute slope and intercept manually.

Practice Task 2

Calculate coefficient of correlation manually and explain the result.

Practice Task 3

Write a paragraph explaining the difference between regression prediction and classification prediction.

27. Practice Questions

What does a linear regression model try to learn?
What is the meaning of slope?
What is the meaning of intercept?
Why do we square errors?
How do we calculate prediction after learning slope and intercept?
What is the coefficient of correlation?
What does a correlation near plus one mean?
What is the difference between regression prediction and classification prediction?
Why is first-principles implementation useful for learning?

28. Speak Paragraphs

Hidden narration paragraphs with IDs are included below for speech and guided reading systems.

Welcome to this lesson on linear regression from first principles.

In machine learning, prediction means using a learned pattern to estimate an output for a new input.

Regression prediction produces a numeric value, while classification prediction produces a category or class label.

Linear regression assumes that the relationship between input and output can be represented by a straight line.

The line equation is y equals m x plus c.

The slope tells us how much the output changes when the input changes by one unit.

The intercept tells us the predicted output when the input is zero.

The best line is the line whose predictions are closest to the actual values.

The difference between actual and predicted values is called the error.

We square errors so that positive and negative errors do not cancel out and large mistakes get more penalty.

A common loss function in linear regression is mean squared error.

For simple linear regression, the slope can be calculated directly from the data using the closed form formula.

The intercept can then be calculated from the mean of y minus slope times the mean of x.

Once slope and intercept are known, prediction is made by putting a new x value into the line equation.

The coefficient of correlation measures the strength and direction of the linear relationship between two variables.

The value of the coefficient of correlation always lies between minus one and plus one.

A correlation of plus one means a perfect positive linear relationship.

A correlation of minus one means a perfect negative linear relationship.

A correlation near zero means weak or no linear relationship.

Correlation tells us how strong the linear relationship is, while regression gives us a line for prediction.

NumPy can be used to calculate slope, intercept, prediction, and correlation more cleanly.

Scikit-learn is practical for real projects, but first principles help us understand the core logic.

In classification, prediction often starts as a score or probability and then becomes a class label.

Regression asks what number should I predict, while classification asks which category should I assign.

This lesson showed how to calculate linear regression and coefficient of correlation from first principles.

It also explained how prediction works in both regression and classification.

29. Conclusion

Linear regression from first principles teaches the heart of machine learning: learn a pattern from data, measure error, and use the learned pattern to make predictions.

Coefficient of correlation helps us understand how strongly two variables are linearly related.

Classification also uses prediction, but instead of returning a continuous number, it returns a class or label, often after computing a score or probability.
