1. The Simple Idea
In simple linear regression, we use one input to predict one output.
But real life usually depends on many inputs. For example, marks may depend on hours studied, sleep, and practice.
2. Multiple Linear Regression Formula
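$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
Here:
- $\hat{y}$: Predicted output
- $\beta_0$: Intercept or base value
- $x_1, \dots, x_n$: Input features
- $\beta_1, \dots, \beta_n$: Weights or coefficients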
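As a rough illustration, the formula is a weighted sum of the inputs plus the intercept. The sketch below uses made-up values for the intercept, the weights, and the inputs (they are assumptions for illustration, not values fitted in this lesson) and computes the prediction twice, once term by term and once as a dot product.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the formula
beta_0 = 10.0                         # intercept (base value)
weights = np.array([5.0, 2.0, 3.0])   # one weight per input feature
x = np.array([4.0, 7.0, 2.0])         # one row of input features

# Term by term: intercept + weight_1 * x_1 + ... + weight_n * x_n
y_hat_loop = beta_0 + sum(w * xi for w, xi in zip(weights, x))

# The same weighted sum written as a dot product
y_hat_dot = beta_0 + weights @ x

print("Term by term:", y_hat_loop)
print("Dot product: ", y_hat_dot)
```

The dot-product form is the one the NumPy code in the later sections relies on.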
3. Example
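Suppose we want to predict marks using hours studied, sleep, and practice, and the model is:
$$\text{Marks} = \beta_0 + 5 \cdot \text{Hours} + 2 \cdot \text{Sleep} + 3 \cdot \text{Practice}$$
where $\beta_0$ is the intercept.
Meaning, with the other inputs held fixed:
- Each extra study hour adds about 5 marks.
- Each extra sleep hour adds about 2 marks.
- Each practice session adds about 3 marks.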
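The reading of the coefficients above can be checked with a small sketch. The coefficients 5, 2, and 3 come from the example; the intercept value and the helper name `predict_marks` are assumptions made only so the code runs. The differences printed at the end do not depend on the intercept, because it cancels out.

```python
import numpy as np

coefficients = np.array([5.0, 2.0, 3.0])  # hours, sleep, practice (from the example)
intercept = 20.0                          # hypothetical: the example does not state one

def predict_marks(hours, sleep, practice):
    """Return intercept + 5*hours + 2*sleep + 3*practice."""
    features = np.array([hours, sleep, practice], dtype=float)
    return intercept + coefficients @ features

base = predict_marks(4, 7, 2)

# Change one input by one unit while holding the other two fixed
extra_hour = predict_marks(5, 7, 2) - base
extra_sleep = predict_marks(4, 8, 2) - base
extra_practice = predict_marks(4, 7, 3) - base

print("One more study hour:      ", extra_hour)
print("One more sleep hour:      ", extra_sleep)
print("One more practice session:", extra_practice)
```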
4. Dataset
Hours  Sleep  Practice  Marks
2      6      1         50
4      7      2         65
6      8      3         80
8      7      4         88
10     8      5         98
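Before fitting anything, it can help to look at how the columns relate to each other. The sketch below loads the table above and prints the correlation of each input with Marks. It also checks one property of this particular dataset that matters later: Hours is exactly twice Practice in every row. The variable names are illustrative.

```python
import numpy as np

# Columns: Hours, Sleep, Practice, Marks (the dataset table)
data = np.array([
    [2, 6, 1, 50],
    [4, 7, 2, 65],
    [6, 8, 3, 80],
    [8, 7, 4, 88],
    [10, 8, 5, 98],
], dtype=float)
hours, sleep, practice, marks = data.T

# Correlation of each input with the output
for name, column in [("Hours", hours), ("Sleep", sleep), ("Practice", practice)]:
    r = np.corrcoef(column, marks)[0, 1]
    print(f"{name} vs Marks: r = {r:.3f}")

# Hours and Practice carry the same information in this dataset
hours_is_double_practice = np.array_equal(hours, 2 * practice)
hours_practice_r = np.corrcoef(hours, practice)[0, 1]
print("Hours == 2 * Practice in every row:", hours_is_double_practice)
print(f"Hours vs Practice: r = {hours_practice_r:.3f}")
```

Because one input is an exact multiple of another, the data cannot tell the model how to split credit between Hours and Practice.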
5. Python Implementation from Scratch
import numpy as np
# Inputs: Hours, Sleep, Practice
X = np.array([
[2, 6, 1],
[4, 7, 2],
[6, 8, 3],
[8, 7, 4],
[10, 8, 5]
], dtype=float)
# Output: Marks
y = np.array([50, 65, 80, 88, 98], dtype=float)
# Add intercept column of 1s
X_b = np.c_[np.ones((X.shape[0], 1)), X]
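# Normal Equation:
# beta = (X.T X)^-1 X.T y
# In this dataset Hours is exactly 2 * Practice, so X_b.T @ X_b is singular
# and np.linalg.inv would raise an error or return unreliable values.
# The pseudo-inverse gives the minimum-norm least-squares solution instead.
beta = np.linalg.pinv(X_b.T @ X_b, rcond=1e-10) @ X_b.T @ y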
print("Intercept:", beta[0])
print("Coefficients:", beta[1:])
# Predict marks for:
# Hours = 7, Sleep = 8, Practice = 4
new_student = np.array([1, 7, 8, 4])
prediction = new_student @ beta
print("Predicted marks:", prediction)
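One way to sanity-check a from-scratch solution is to test the condition the normal equation expresses: at a least-squares solution, the errors are orthogonal to every column of `X_b`. The sketch below also prints the rank of `X_b`. The explicit tolerances (`tol=1e-8`, `rcond=1e-10`) are assumptions chosen for this small dataset, and the pseudo-inverse is used here so the check works even when `X_b.T @ X_b` cannot be inverted.

```python
import numpy as np

X = np.array([
    [2, 6, 1],
    [4, 7, 2],
    [6, 8, 3],
    [8, 7, 4],
    [10, 8, 5],
], dtype=float)
y = np.array([50, 65, 80, 88, 98], dtype=float)
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# Rank below the number of columns means X_b.T @ X_b has no ordinary inverse
rank = np.linalg.matrix_rank(X_b, tol=1e-8)
print("Columns in X_b:", X_b.shape[1], "| rank:", rank)

# Minimum-norm least-squares solution via the pseudo-inverse
beta = np.linalg.pinv(X_b, rcond=1e-10) @ y

# Normal equation check: X_b.T @ (y - X_b @ beta) should be (numerically) zero
errors = y - X_b @ beta
orthogonality = X_b.T @ errors
print("X_b.T @ errors:", np.round(orthogonality, 8))
print("Normal equation satisfied:", np.allclose(orthogonality, 0.0, atol=1e-8))
```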
6. Using NumPy Least Squares
import numpy as np
X = np.array([
[2, 6, 1],
[4, 7, 2],
[6, 8, 3],
[8, 7, 4],
[10, 8, 5]
], dtype=float)
y = np.array([50, 65, 80, 88, 98], dtype=float)
X_b = np.c_[np.ones((X.shape[0], 1)), X]
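# lstsq copes with the rank-deficient X_b (Hours = 2 * Practice)
# and returns the minimum-norm least-squares solution
beta, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=None)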
print("Intercept:", beta[0])
print("Coefficients:", beta[1:])
prediction = np.array([1, 7, 8, 4]) @ beta
print("Predicted marks:", prediction)
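Because Hours is exactly twice Practice in this dataset, the least-squares coefficients are not unique, and `lstsq` reports this through its `rank` output. The sketch below is one way to see the consequence. The names `null_direction` and `beta_alt`, the shift of 3.0, and the explicit `rcond=1e-10` are illustrative choices, not part of the original code. The shifted coefficients fit the five training rows equally well, yet they predict a different mark for the new student, whose Hours (7) is not twice Practice (4).

```python
import numpy as np

X = np.array([
    [2, 6, 1],
    [4, 7, 2],
    [6, 8, 3],
    [8, 7, 4],
    [10, 8, 5],
], dtype=float)
y = np.array([50, 65, 80, 88, 98], dtype=float)
X_b = np.c_[np.ones((X.shape[0], 1)), X]

beta, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-10)
print("Rank:", rank, "of", X_b.shape[1], "columns")

# +1 on Hours and -2 on Practice cancel in every training row,
# because Hours = 2 * Practice there
null_direction = np.array([0.0, 1.0, 0.0, -2.0])
beta_alt = beta + 3.0 * null_direction  # 3.0 is an arbitrary shift

same_training_fit = np.allclose(X_b @ beta, X_b @ beta_alt)
print("Same fitted values on the training rows:", same_training_fit)

# New student: Hours = 7, Sleep = 8, Practice = 4
new_student = np.array([1.0, 7.0, 8.0, 4.0])
pred_min_norm = new_student @ beta
pred_alt = new_student @ beta_alt
print("Prediction with lstsq coefficients:  ", pred_min_norm)
print("Prediction with shifted coefficients:", pred_alt)
```

The prediction for this student therefore reflects the solver's choice of the minimum-norm solution as much as it reflects the data.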
7. Using sklearn
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([
[2, 6, 1],
[4, 7, 2],
[6, 8, 3],
[8, 7, 4],
[10, 8, 5]
], dtype=float)
y = np.array([50, 65, 80, 88, 98], dtype=float)
model = LinearRegression()
model.fit(X, y)
print("Intercept:", model.intercept_)
# Hours = 2 * Practice in this data, so the split between those two coefficients is not unique
print("Coefficients:", model.coef_)
new_student = np.array([[7, 8, 4]])
prediction = model.predict(new_student)
print("Predicted marks:", prediction[0])
8. Charts with Matplotlib
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
X = np.array([
[2, 6, 1],
[4, 7, 2],
[6, 8, 3],
[8, 7, 4],
[10, 8, 5]
], dtype=float)
y = np.array([50, 65, 80, 88, 98], dtype=float)
model = LinearRegression()
model.fit(X, y)
predicted = model.predict(X)
plt.scatter(y, predicted)
plt.xlabel("Actual Marks")
plt.ylabel("Predicted Marks")
plt.title("Actual vs Predicted Marks")
plt.grid(True)
plt.show()
students = np.arange(1, len(y) + 1)  # number the students from 1 instead of 0
plt.plot(students, y, label="Actual Marks", marker="o")
plt.plot(students, predicted, label="Predicted Marks", marker="o")
plt.xlabel("Student Number")
plt.ylabel("Marks")
plt.title("Actual and Predicted Marks")
plt.legend()
plt.grid(True)
plt.show()
9. Important Concepts
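- Feature: An input column used for prediction.
- Coefficient: The change in the predicted output for a one-unit increase in a feature, with the other features held fixed. Its size depends on the feature's units, so it is not by itself a measure of importance.
- Intercept: The base prediction when all inputs are zero.
- Error: The difference between the actual and the predicted output.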
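To make the idea of error concrete, the sketch below fits the lesson's dataset with NumPy least squares and prints the error for each student, together with the mean squared error. The explicit `rcond=1e-10` is an assumption made so the solver ignores the duplicated information in Hours and Practice. The variable names are illustrative.

```python
import numpy as np

X = np.array([
    [2, 6, 1],
    [4, 7, 2],
    [6, 8, 3],
    [8, 7, 4],
    [10, 8, 5],
], dtype=float)
y = np.array([50, 65, 80, 88, 98], dtype=float)
X_b = np.c_[np.ones((X.shape[0], 1)), X]

beta, *_ = np.linalg.lstsq(X_b, y, rcond=1e-10)

predicted = X_b @ beta
errors = y - predicted        # actual minus predicted, one value per student
mse = np.mean(errors ** 2)    # mean squared error over the training rows

for actual, pred, err in zip(y, predicted, errors):
    print(f"actual = {actual:5.1f}  predicted = {pred:6.2f}  error = {err:+.2f}")
print("Mean squared error:", round(float(mse), 4))
```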
10. Warning: More Inputs Are Not Always Better
- Overfitting: The model memorizes training data instead of learning general patterns.
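- Multicollinearity: Two or more inputs carry almost the same information, so their individual coefficients become unstable. In the dataset above, Hours is exactly 2 × Practice.
- Noise: Irrelevant or noisy features add variation that the model may fit by mistake.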
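The overfitting and noise points can be seen on this small dataset. The sketch below adds two random columns that have nothing to do with marks, then compares the training error before and after. The random columns, the seed, the helper name `training_sse`, and `rcond=1e-10` are all assumptions for illustration. With only five students, the extra columns give the model enough freedom to reproduce the training marks almost exactly, which says nothing about how it would do on new students.

```python
import numpy as np

X = np.array([
    [2, 6, 1],
    [4, 7, 2],
    [6, 8, 3],
    [8, 7, 4],
    [10, 8, 5],
], dtype=float)
y = np.array([50, 65, 80, 88, 98], dtype=float)

rng = np.random.default_rng(0)
noise = rng.normal(size=(X.shape[0], 2))  # two made-up, meaningless features

def training_sse(features):
    """Fit least squares with an intercept; return the sum of squared training errors."""
    X_b = np.c_[np.ones((features.shape[0], 1)), features]
    beta, *_ = np.linalg.lstsq(X_b, y, rcond=1e-10)
    return float(np.sum((y - X_b @ beta) ** 2))

sse_original = training_sse(X)
sse_with_noise = training_sse(np.c_[X, noise])

print("Training SSE, original inputs:    ", round(sse_original, 6))
print("Training SSE, plus 2 noise inputs:", round(sse_with_noise, 6))
```

A lower training error here is a symptom of memorization, not evidence of a better model.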
11. YouTube Closing Summary
Multiple Linear Regression is the natural upgrade of simple linear regression. Instead of using one input, we use many inputs to predict one output.
One input gives us a line. Two inputs give us a plane. More inputs create a hyperplane. But the goal remains the same: find the model that fits best, meaning the one with the smallest squared error.