0. What Are We Trying to Do?
In real life, data rarely comes as a perfect formula. We may have values like this:
x: 1, 2, 3, 4, 5
y: 2, 4, 5, 4, 5
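We want to ask: is there a simple mathematical function of x that describes how y changes?
This is called curve fitting. Depending on how flexible the function should be, we can try:
- A constant line.
- A straight line.
- A parabola.
- A higher-degree polynomial curve.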
The goal is not always to pass exactly through every point. The goal is usually to find the best approximate curve.
1. Data Points
Suppose we have data points:
(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)
Example:
(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)
| Symbol | Meaning |
|---|---|
| x | Input |
| y | Actual output |
| ŷ | Predicted output, read as y hat |
2. Power 0 Model: Constant Fitting
The simplest possible model ignores x completely:
ŷ = a0
This is called a degree 0 polynomial or power 0 model. Here, a0 is the constant prediction.
Why Is It Called Power 0?
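Because any nonzero number raised to the power 0 equals 1 (and in polynomial notation x⁰ is also taken to be 1 when x = 0):
x⁰ = 1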
So we can write:
a0 = a0 × 1
a0 = a0 × x⁰
Therefore:
ŷ = a0x⁰
Since x⁰ is always 1, this model is constant.
Best Constant Model
Suppose actual values are:
y = 2, 4, 5, 4, 5
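The best constant, under squared error, is the mean of y. Setting the derivative of Σ(yi - a0)² with respect to a0 to zero gives -2Σ(yi - a0) = 0, so:
a0 = Σyi / n = ȳ
For our data:
sum = 2 + 4 + 5 + 4 + 5 = 20
n = 5
mean = 20 / 5 = 4
So the best constant model is:
ŷ = 4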
Error in Power 0 Model
| x | Actual y | Predicted ŷ | Error (y - ŷ) |
|---|---|---|---|
| 1 | 2 | 4 | -2 |
| 2 | 4 | 4 | 0 |
| 3 | 5 | 4 | 1 |
| 4 | 4 | 4 | 0 |
| 5 | 5 | 4 | 1 |
Squared errors = 4, 0, 1, 0, 1
MSE = (4 + 0 + 1 + 0 + 1) / 5
MSE = 6 / 5
MSE = 1.2
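The sketch below (not part of the original lesson) checks the claim that the mean is the best constant. It tries every constant on a coarse grid and keeps the one with the smallest sum of squared errors. The grid of candidates is an arbitrary choice made for illustration.

Algebraically, Σ(yi - c)² = Σ(yi - ȳ)² + n(ȳ - c)². For this data that means SSE(c) = 6 + 5(c - 4)², which is smallest at c = 4.

```python
# Brute-force check (illustrative sketch): try many candidate constants and
# keep the one with the smallest sum of squared errors.
y = [2, 4, 5, 4, 5]


def sse_constant(c, values):
    """Sum of squared errors when every prediction equals the constant c."""
    return sum((v - c) ** 2 for v in values)


# Candidates 0.0, 0.1, ..., 8.0 (a hypothetical search grid chosen for illustration)
candidates = [i / 10 for i in range(81)]
best = min(candidates, key=lambda c: sse_constant(c, y))

print("Best constant on the grid:", best)  # 4.0, the mean of y
print("Its SSE:", sse_constant(best, y))   # 6.0, so MSE = 6 / 5 = 1.2
```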
3. Linear Model: Degree 1 Fitting
Now we allow the prediction to depend on x:
ŷ = a0 + a1x
This is the same as:
ŷ = intercept + slope × x
| Coefficient | Meaning |
|---|---|
| a0 | Intercept |
| a1 | Slope |
Meaning of Slope
Suppose:
ŷ = 2 + 3x
Then:
a0 = 2
a1 = 3
| x | Prediction |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 8 |
| 3 | 11 |
Every time x increases by 1, the prediction increases by 3.
Error in Linear Regression
For each point:
actual y = yi
predicted y = ŷi
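The error (also called the residual) is:
ei = yi - ŷi
Since:
ŷi = a0 + a1xi
the error can be written as ei = yi - a0 - a1xi, and the squared error is:
ei² = (yi - a0 - a1xi)²
The total squared error over all n points is:
SSE = Σ(yi - a0 - a1xi)²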
SSE means Sum of Squared Errors. The best line is the line that minimizes this quantity.
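This sketch makes SSE concrete by scoring a few hand-picked lines on the lesson's dataset. The helper name `sse_line` and the candidate lines are illustrative choices, not part of the original lesson.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sse_line(a0, a1, xs, ys):
    """Sum of squared errors for the line y_hat = a0 + a1 * x."""
    return sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(xs, ys))


# A few hand-picked candidate lines (chosen only for illustration)
for a0, a1 in [(4, 0), (2, 1), (3, 0.5)]:
    print(f"y_hat = {a0} + {a1}x  ->  SSE = {sse_line(a0, a1, x, y)}")
```

Among these three, ŷ = 3 + 0.5x has the smallest SSE: 3.75, versus 6 and 9 for the other two. Least squares makes the same comparison over every possible (a0, a1) at once.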
4. Derivation of Best Linear Fit
We want to minimize:
SSE = Σ(yi - a0 - a1xi)²
The unknowns are a0 and a1.
Derivative with Respect to a0
SSE = Σ(yi - a0 - a1xi)²
∂SSE/∂a0 = -2Σ(yi - a0 - a1xi)
Set it equal to zero:
-2Σ(yi - a0 - a1xi) = 0
Σ(yi - a0 - a1xi) = 0
Σyi - na0 - a1Σxi = 0
na0 + a1Σxi = Σyi
This is the first normal equation.
Derivative with Respect to a1
SSE = Σ(yi - a0 - a1xi)²
∂SSE/∂a1 = -2Σxi(yi - a0 - a1xi)
Set it equal to zero:
-2Σxi(yi - a0 - a1xi) = 0
Σxi(yi - a0 - a1xi) = 0
Σxiyi - a0Σxi - a1Σxi² = 0
a0Σxi + a1Σxi² = Σxiyi
This is the second normal equation.
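The Standard Formulas
Solving the two normal equations for a1 and a0 gives:
a1 = (nΣxiyi - ΣxiΣyi) / (nΣxi² - (Σxi)²)
a0 = (Σyi - a1Σxi) / n = ȳ - a1x̄
Here x̄ and ȳ are the means of x and y. Equivalently, the slope can be written in centered form:
a1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
This centered form is the one used in the from-scratch Python code later in this lesson.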
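The sketch below solves the two normal equations in closed form for the lesson's dataset. It uses Python's `fractions.Fraction` so the arithmetic stays exact. It is an illustrative check, not code from the original lesson.

```python
from fractions import Fraction

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# The sums that appear in the two normal equations
sum_x = sum(x)                                   # Σxi   = 15
sum_y = sum(y)                                   # Σyi   = 20
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # Σxiyi = 66
sum_x2 = sum(xi ** 2 for xi in x)                # Σxi²  = 55

# Closed-form solution, kept exact with Fraction to avoid rounding noise
a1 = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
a0 = Fraction(sum_y, n) - a1 * Fraction(sum_x, n)

print("a1 =", a1, "=", float(a1))  # 3/5 = 0.6
print("a0 =", a0, "=", float(a0))  # 11/5 = 2.2
```

So the best line for this data is ŷ = 2.2 + 0.6x.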
5. Higher-Degree Curve Fitting
Linear regression is degree 1:
ŷ = a0 + a1x
But sometimes data is curved.
x: 1, 2, 3, 4, 5
y: 1, 4, 9, 16, 25
This is not linear. It follows:
y = x²
Degree 2 Polynomial
ŷ = a0 + a1x + a2x²
This is a parabola. It can bend once.
Degree 3 Polynomial
ŷ = a0 + a1x + a2x² + a3x³
A cubic curve can bend more than a quadratic curve: it can have up to two turning points.
Degree d Polynomial
ŷ = a0 + a1x + a2x² + ... + adxᵈ
It has d + 1 coefficients, a0 through ad.
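Evaluating such a polynomial only needs its coefficient list. One common way to do it is Horner's rule, which rewrites the sum as nested multiplications so that no power of x is computed separately.

This is a swapped-in technique shown as an aside. The helper name `horner_eval` is hypothetical. The lesson's own `polynomial_predict`, later on, computes the powers directly.

```python
def horner_eval(coefficients, xs):
    """Evaluate a0 + a1*x + ... + ad*x^d at each x using Horner's rule.

    `coefficients` is ordered [a0, a1, ..., ad], the same order used in this lesson.
    """
    results = []
    for x in xs:
        value = 0
        # Work from the highest power down: ((ad*x + a(d-1))*x + ...)*x + a0
        for c in reversed(coefficients):
            value = value * x + c
        results.append(value)
    return results


print(horner_eval([2, 3], [0, 1, 2, 3]))        # [2, 5, 8, 11]
print(horner_eval([0, 0, 1], [1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

The first call reproduces the slope table for ŷ = 2 + 3x. The second reproduces the y = x² data.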
6. Polynomial Regression as a Matrix Problem
Suppose the degree is 2:
ŷ = a0 + a1x + a2x²
For data points x1, x2, x3, ..., xn, we create a matrix:
X = [
[1, x1, x1²],
[1, x2, x2²],
[1, x3, x3²],
...
[1, xn, xn²]
]
The coefficient vector is:
a = [
a0,
a1,
a2
]
The prediction for all n points at once is the matrix product:
ŷ = Xa
Row i of Xa equals a0 + a1xi + a2xi².
For degree d:
X = [
[1, x1, x1², ..., x1ᵈ],
[1, x2, x2², ..., x2ᵈ],
[1, x3, x3², ..., x3ᵈ],
...
[1, xn, xn², ..., xnᵈ]
]
This matrix is called the design matrix.
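NumPy can also build this matrix in one call with `np.vander`. Passing `increasing=True` puts the constant column first, matching the [1, x, x², ...] layout above; by default `np.vander` orders the columns from the highest power down. This is an alternative sketch. The lesson's later code builds the matrix by hand with `create_design_matrix`.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
degree = 2

# Column j holds x**j; increasing=True puts the constant column first
X = np.vander(x, N=degree + 1, increasing=True)
print(X)
```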
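Normal Equation
Minimizing SSE = Σ(yi - ŷi)² in matrix form leads to:
XT X a = XT y
So, when XT X is invertible:
a = (XT X)^(-1) XT y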
| Symbol | Meaning |
|---|---|
| X | Design matrix |
| XT | Transpose of X |
| y | Actual output vector |
| a | Coefficient vector |
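For degree 1 the design matrix has columns [1, x]. That gives XT X = [[n, Σxi], [Σxi, Σxi²]] and XT y = [Σyi, Σxiyi], so the matrix equation is exactly the pair of normal equations derived in Section 4.

The sketch below is an illustration rather than the lesson's own code. It solves that system with `np.linalg.solve` instead of forming the inverse.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Degree 1 design matrix: columns [1, x]
X = np.column_stack([np.ones_like(x), x])

# Solve (XT X) a = XT y as a linear system instead of computing an inverse
a = np.linalg.solve(X.T @ X, X.T @ y)
print("a0, a1 =", a)  # approximately [2.2 0.6]
```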
7. Python Implementation
Dataset
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
Power 0 Fitting in Python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a0 = sum(y) / len(y)
print("Best constant a0:", a0)
predictions = [a0 for value in x]
print("Predictions:", predictions)
errors = [actual - predicted for actual, predicted in zip(y, predictions)]
squared_errors = [e ** 2 for e in errors]
mse = sum(squared_errors) / len(squared_errors)
print("Errors:", errors)
print("Mean Squared Error:", mse)
Expected output:
Best constant a0: 4.0
Predictions: [4.0, 4.0, 4.0, 4.0, 4.0]
Errors: [-2.0, 0.0, 1.0, 0.0, 1.0]
Mean Squared Error: 1.2
Linear Fitting from Scratch
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
numerator = 0
denominator = 0
for xi, yi in zip(x, y):
    numerator += (xi - x_mean) * (yi - y_mean)
    denominator += (xi - x_mean) ** 2
a1 = numerator / denominator
a0 = y_mean - a1 * x_mean
print("Intercept a0:", a0)
print("Slope a1:", a1)
predictions = []
for xi in x:
    y_hat = a0 + a1 * xi
    predictions.append(y_hat)
print("Predictions:", predictions)
errors = [actual - predicted for actual, predicted in zip(y, predictions)]
mse = sum(e ** 2 for e in errors) / n
print("Errors:", errors)
print("Mean Squared Error:", mse)
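Expected output (values shown rounded; Python may print tiny floating-point noise, such as 2.2000000000000002 instead of 2.2):
Intercept a0: 2.2
Slope a1: 0.6
Predictions: [2.8, 3.4, 4.0, 4.6, 5.2]
Errors: [-0.8, 0.6, 1.0, -0.6, -0.2]
Mean Squared Error: 0.48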
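As a cross-check, NumPy's `np.polyfit` fits the same least-squares line. It returns coefficients from the highest power down, so for degree 1 the result is [slope, intercept]. That is the reverse of the [a0, a1] order used in this lesson.

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# np.polyfit returns coefficients from the HIGHEST power down: [a1, a0] for degree 1
slope, intercept = np.polyfit(x, y, 1)
print("Slope a1:", round(slope, 4))          # 0.6
print("Intercept a0:", round(intercept, 4))  # 2.2
```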
Install NumPy and Matplotlib
pip install numpy matplotlib
Create Design Matrix
def create_design_matrix(x, degree):
    X = []
    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)
    return X
x = [1, 2, 3, 4, 5]
X = create_design_matrix(x, 2)
for row in X:
    print(row)
Output:
[1, 1, 1]
[1, 2, 4]
[1, 3, 9]
[1, 4, 16]
[1, 5, 25]
Polynomial Fit Using Normal Equation
import numpy as np
def create_design_matrix(x, degree):
    X = []
    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)
    return np.array(X, dtype=float)
def polynomial_fit(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float).reshape(-1, 1)
    XT = X.T
    coefficients = np.linalg.inv(XT @ X) @ XT @ y
    return coefficients.flatten()
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
degree = 2
coefficients = polynomial_fit(x, y, degree)
print("Coefficients:", coefficients)
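Possible output (the last digits may vary slightly because of floating-point rounding):
Coefficients: [ 0.2         2.31428571 -0.28571429]
So the fitted quadratic is approximately:
ŷ ≈ 0.2 + 2.3143x - 0.2857x²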
Prediction Function
def polynomial_predict(x, coefficients):
    predictions = []
    for xi in x:
        y_hat = 0
        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)
        predictions.append(y_hat)
    return predictions
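Predictions can also be computed with NumPy's polynomial module. `numpy.polynomial.polynomial.polyval` takes coefficients in increasing-power order [a0, a1, ..., ad], the same order `polynomial_fit` returns. The older `np.polyval` expects the highest power first. This is an alternative sketch, not something the lesson prescribes.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Coefficients in increasing-power order [a0, a1, ...], as in this lesson
coefficients = [2, 3]          # y_hat = 2 + 3x
x = np.array([0, 1, 2, 3])

# P.polyval expects the same increasing order, so no reversal is needed
print(P.polyval(x, coefficients))  # values 2, 5, 8, 11
```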
Full Example for Degree 0, 1, 2 and 3
import numpy as np
def create_design_matrix(x, degree):
    X = []
    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)
    return np.array(X, dtype=float)
def polynomial_fit(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float).reshape(-1, 1)
    XT = X.T
    coefficients = np.linalg.inv(XT @ X) @ XT @ y
    return coefficients.flatten()
def polynomial_predict(x, coefficients):
    predictions = []
    for xi in x:
        y_hat = 0
        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)
        predictions.append(y_hat)
    return predictions
def mean_squared_error(y_actual, y_predicted):
    n = len(y_actual)
    total = 0
    for actual, predicted in zip(y_actual, y_predicted):
        total += (actual - predicted) ** 2
    return total / n
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    predictions = polynomial_predict(x, coefficients)
    mse = mean_squared_error(y, predictions)
    print("Degree:", degree)
    print("Coefficients:", coefficients)
    print("Predictions:", predictions)
    print("MSE:", mse)
    print()
Safer NumPy Version
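Computing the inverse directly can be numerically unstable:
np.linalg.inv(XT @ X) @ XT @ y
Forming XT X squares the condition number of X. Rounding errors are therefore amplified when columns such as x² and x³ are nearly collinear. A safer method is np.linalg.lstsq(). It solves the least-squares problem from X directly, using an SVD-based solver, without forming or inverting XT X.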
import numpy as np
def create_design_matrix(x, degree):
    X = []
    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)
    return np.array(X, dtype=float)
def polynomial_fit_lstsq(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float)
    coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
    return coefficients
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit_lstsq(x, y, degree)
    print("Degree:", degree)
    print("Coefficients:", coefficients)
    print()
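This sketch shows why forming XT X is risky by comparing the condition numbers of X and XT X for the lesson's x values. In the 2-norm, cond(XT X) = cond(X)². Roughly speaking, the number of digits lost to rounding therefore doubles when working with XT X instead of X. The degrees chosen here are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)

for degree in [1, 2, 3]:
    # Design matrix with columns 1, x, ..., x**degree
    X = np.vander(x, N=degree + 1, increasing=True)
    cond_X = np.linalg.cond(X)          # 2-norm condition number of X
    cond_XtX = np.linalg.cond(X.T @ X)  # condition number of XT X
    print(f"degree {degree}: cond(X) = {cond_X:.3g}, cond(XT X) = {cond_XtX:.3g}")
```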
9. Visualization with Matplotlib
import numpy as np
import matplotlib.pyplot as plt
def create_design_matrix(x, degree):
    X = []
    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)
    return np.array(X, dtype=float)
def polynomial_fit(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float)
    coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
    return coefficients
def polynomial_predict(x, coefficients):
    predictions = []
    for xi in x:
        y_hat = 0
        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)
        predictions.append(y_hat)
    return np.array(predictions)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
plt.scatter(x, y, label="Actual data")
x_smooth = np.linspace(min(x), max(x), 100)
for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    y_smooth = polynomial_predict(x_smooth, coefficients)
    plt.plot(x_smooth, y_smooth, label=f"Degree {degree}")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Power 0, Linear, and Higher-Degree Curve Fitting")
plt.legend()
plt.grid(True)
plt.show()
| Degree | Shape |
|---|---|
| 0 | Horizontal line |
| 1 | Straight line |
| 2 | Curved parabola |
| 3 | More flexible curve |
10. Important Warning: Higher Degree Is Not Always Better
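A higher-degree polynomial always fits the training data at least as well as a lower-degree one, because it can reproduce the lower-degree fit by setting its extra coefficients to zero. But a lower training error does not mean it will predict new data better.
| Degree | Typical behavior |
|---|---|
| 0 | Usually too simple |
| 1 | Captures a simple trend |
| 2 | Captures a curved trend |
| Very high | May twist wildly and memorize the data |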
A good model should perform well on new data, not only old data.
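The sketch below is illustrative and not from the original lesson. It fits degrees 0 through 4 to the five data points and reports the training MSE. It also reports each curve's prediction at x = 6, a point just outside the data. Degree 4 has five coefficients for five points, so it can pass through every point exactly.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

train_mse = {}
for degree in range(5):  # degrees 0..4; degree 4 has as many coefficients as points
    X = np.vander(x, N=degree + 1, increasing=True)
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    train_mse[degree] = np.mean((y - X @ coefficients) ** 2)

    # Prediction at x = 6, just outside the data (coefficients are in increasing order)
    y_at_6 = sum(c * 6.0 ** p for p, c in enumerate(coefficients))
    print(f"degree {degree}: training MSE = {train_mse[degree]:.4f}, prediction at x=6: {y_at_6:.2f}")
```

Training MSE shrinks with every extra degree and reaches essentially zero at degree 4. Yet the predictions at x = 6 swing from about 3.8 (degree 2) to about 17 (degree 4). A perfect fit on the training points says little about how the curve behaves elsewhere.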
11. Full Combined Program
import numpy as np
import matplotlib.pyplot as plt
def create_design_matrix(x, degree):
    """
    Create polynomial design matrix.
    For degree 2:
    x = 3 becomes [1, 3, 9]
    """
    X = []
    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)
    return np.array(X, dtype=float)
def polynomial_fit(x, y, degree):
    """
    Fit polynomial regression using least squares.
    """
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float)
    coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
    return coefficients
def polynomial_predict(x, coefficients):
    """
    Predict y values using polynomial coefficients.
    """
    predictions = []
    for xi in x:
        y_hat = 0
        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)
        predictions.append(y_hat)
    return np.array(predictions)
def mean_squared_error(y_actual, y_predicted):
    """
    Calculate Mean Squared Error.
    """
    y_actual = np.array(y_actual, dtype=float)
    y_predicted = np.array(y_predicted, dtype=float)
    return np.mean((y_actual - y_predicted) ** 2)
def print_polynomial(coefficients):
    """
    Return the polynomial as a readable string (it does not print anything itself).
    Negative coefficients are shown with a minus sign, e.g. "0.2000 + 2.3143x - 0.2857x^2".
    """
    terms = []
    for power, coefficient in enumerate(coefficients):
        if power == 0:
            term = f"{abs(coefficient):.4f}"
        elif power == 1:
            term = f"{abs(coefficient):.4f}x"
        else:
            term = f"{abs(coefficient):.4f}x^{power}"
        if not terms:
            # First term: keep a leading minus only if the coefficient is negative
            terms.append(f"-{term}" if coefficient < 0 else term)
        else:
            # Later terms: join with "+" or "-" depending on the sign
            terms.append(f"- {term}" if coefficient < 0 else f"+ {term}")
    return " ".join(terms)
# Dataset
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
# Show results
for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    predictions = polynomial_predict(x, coefficients)
    mse = mean_squared_error(y, predictions)
    print("=" * 50)
    print("Degree:", degree)
    print("Polynomial:")
    print("ŷ =", print_polynomial(coefficients))
    print("Predictions:", predictions)
    print("Mean Squared Error:", mse)
# Plot
plt.scatter(x, y, label="Actual data")
x_smooth = np.linspace(min(x), max(x), 200)
for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    y_smooth = polynomial_predict(x_smooth, coefficients)
    plt.plot(x_smooth, y_smooth, label=f"Degree {degree}")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Polynomial Curve Fitting")
plt.legend()
plt.grid(True)
plt.show()
12. What We Learned in This 0–10% Lesson
- Curve fitting means finding a mathematical pattern in data.
- Power 0 fitting gives a constant model: ŷ = a0.
- The best constant under squared error is the mean of y.
- Linear fitting gives: ŷ = a0 + a1x.
- The best line minimizes the sum of squared errors.
- The slope and intercept can be derived using calculus.
- Higher-degree polynomial fitting gives: ŷ = a0 + a1x + a2x² + ... + adxᵈ.
- Polynomial regression is still linear in the coefficients.
- The design matrix stores powers of x.
- The normal equation is: a = (XT X)^(-1) XT y.
- In real Python work, np.linalg.lstsq() is safer than directly using the inverse.
- Higher-degree models can overfit.
13. Exercises
Exercise 1
Given:
y = [10, 20, 30, 40]
Find the best power 0 model.
Exercise 2
Given:
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
Find the best linear model manually.
Exercise 3
Create the design matrix for:
x = [2, 3, 4]
degree = 3
Expected row form:
[1, x, x², x³]
Exercise 4
Modify the full program and try:
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]
Fit degrees 0, 1, 2 and 3. Which degree fits best?
Exercise 5
Try a high degree such as:
degree = 10
Observe the curve. Is it learning the pattern, or is it memorizing the data?
14. Bridge to the Next Lesson
In the next 10–20% lesson, we should go deeper into: