Linear Regression Book · 0–10%

Power 0, Linear and Higher-Degree Curve Fitting

A beginner-friendly 0 to infinity lesson on constant fitting, straight-line fitting, polynomial curve fitting, least squares derivation, Python implementation, and live practice.

0. What Are We Trying to Do?

In real life, data rarely comes as a perfect formula. We may have values like this:

x: 1, 2, 3, 4, 5
y: 2, 4, 5, 4, 5

We want to ask:

Can we find a mathematical pattern that approximately explains y using x?

This is called curve fitting.

Possible model 1: A constant (horizontal) line.
Possible model 2: A straight line.
Possible model 3: A parabola.
Possible model 4: A higher-degree polynomial curve.

The goal is not always to pass exactly through every point. The goal is usually to find the best approximate curve.

1. Data Points

Suppose we have data points:

(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)

Example:

(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)

Symbol   Meaning
x        Input
y        Actual output
ŷ        Predicted output, read as "y hat"

2. Power 0 Model: Constant Fitting

The simplest possible model ignores x completely.

No matter what x is, I will always predict the same value.

This is called a degree 0 polynomial or power 0 model.

ŷ = a0

Here, a0 is the constant prediction.

Why Is It Called Power 0?

Because:

x⁰ = 1

So we can write:

a0 = a0 × 1
a0 = a0 × x⁰

Therefore:

ŷ = a0 × x⁰

Since x⁰ is always 1, this model is constant.

Best Constant Model

Suppose actual values are:

y = 2, 4, 5, 4, 5

The best constant, under squared error, is the mean of y.

a0 = ȳ = (1 / n) Σyi

For our data:

y = 2, 4, 5, 4, 5
sum = 20
n = 5

mean = 20 / 5 = 4

So the best constant model is:

ŷ = 4
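
The reason the mean wins is the same calculus used later for the line: setting d/da0 Σ(yi - a0)² = -2Σ(yi - a0) to zero gives a0 = ȳ. As a quick numerical sanity check, the sketch below scans a grid of candidate constants and picks the one with the smallest sum of squared errors. The helper name `sse_constant` and the grid spacing of 0.1 are illustrative assumptions, not part of the lesson's own code.

```python
# Brute-force check: which constant prediction gives the smallest squared error?
y = [2, 4, 5, 4, 5]


def sse_constant(c, values):
    """Sum of squared errors when every prediction equals c (hypothetical helper)."""
    return sum((v - c) ** 2 for v in values)


# Candidate constants 0.0, 0.1, ..., 6.0 (the grid spacing is an arbitrary choice)
candidates = [i / 10 for i in range(61)]
best = min(candidates, key=lambda c: sse_constant(c, y))
mean_y = sum(y) / len(y)

print("Best candidate on the grid:", best)
print("Mean of y:", mean_y)
print("SSE at the mean:", sse_constant(mean_y, y))
```

On this data the grid search lands on the same value as the mean, because SSE(c) = 6 + 5(c - 4)² grows as c moves away from 4 in either direction.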

Error in Power 0 Model

x   Actual y   Predicted y   Error (y - ŷ)
1   2          4             -2
2   4          4              0
3   5          4              1
4   4          4              0
5   5          4              1

Squared errors = 4, 0, 1, 0, 1

MSE = (4 + 0 + 1 + 0 + 1) / 5
MSE = 6 / 5
MSE = 1.2

3. Linear Model: Degree 1 Fitting

Now we allow the prediction to depend on x.

ŷ = a0 + a1x

This is the same as:

ŷ = intercept + slope × x

Coefficient   Meaning
a0            Intercept
a1            Slope

Meaning of Slope

Suppose:

ŷ = 2 + 3x

Then:

a0 = 2
a1 = 3

x   Prediction
0   2
1   5
2   8
3   11

Every time x increases by 1, the prediction increases by 3.

slope = change in y / change in x

Error in Linear Regression

For each point:

actual y = yi
predicted y = ŷi

The error is:

ei = yi - ŷi

Since:

ŷi = a0 + a1xi

The squared error is:

ei² = [yi - (a0 + a1xi)]²

The total squared error is:

SSE = Σ [yi - (a0 + a1xi)]²

SSE means Sum of Squared Errors. The best line is the line that minimizes this quantity.
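
To make the SSE formula concrete, here is a minimal sketch that evaluates it for two hand-picked candidate lines on the lesson's dataset. The function name `sse` and the two candidate lines are illustrative assumptions; neither line is claimed to be the best fit.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sse(a0, a1, xs, ys):
    """Sum of squared errors for the line y_hat = a0 + a1 * x."""
    return sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(xs, ys))


# Two hand-picked candidate lines (chosen only for illustration)
sse_flat = sse(4.0, 0.0, x, y)    # horizontal line y_hat = 4
sse_tilted = sse(2.0, 0.5, x, y)  # line y_hat = 2 + 0.5x

print("SSE for y_hat = 4:", sse_flat)           # 6.0
print("SSE for y_hat = 2 + 0.5x:", sse_tilted)  # 3.75
```

The tilted line has the smaller SSE, so by the least squares criterion it is the better of these two candidates. Finding the best of all possible lines is what the derivation in the next section does.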

4. Derivation of Best Linear Fit

We want to minimize:

SSE = Σ(yi - a0 - a1xi)²

The unknowns are a0 and a1.

Derivative with Respect to a0

SSE = Σ(yi - a0 - a1xi)²

∂SSE/∂a0 = -2Σ(yi - a0 - a1xi)

Set it equal to zero:

-2Σ(yi - a0 - a1xi) = 0

Σ(yi - a0 - a1xi) = 0

Σyi - na0 - a1Σxi = 0

na0 + a1Σxi = Σyi

This is the first normal equation.

Derivative with Respect to a1

SSE = Σ(yi - a0 - a1xi)²

∂SSE/∂a1 = -2Σxi(yi - a0 - a1xi)

Set it equal to zero:

-2Σxi(yi - a0 - a1xi) = 0

Σxi(yi - a0 - a1xi) = 0

Σxiyi - a0Σxi - a1Σxi² = 0

a0Σxi + a1Σxi² = Σxiyi

This is the second normal equation.
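
The two normal equations form a 2 × 2 linear system in a0 and a1. As a sketch of how they can be solved directly, the code below builds the four sums for the lesson's dataset and applies Cramer's rule. The variable names and the use of `Fraction` (to keep the arithmetic exact) are choices made for this illustration.

```python
from fractions import Fraction

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
sum_x = sum(x)                                 # Σxi
sum_y = sum(y)                                 # Σyi
sum_xx = sum(xi * xi for xi in x)              # Σxi²
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # Σxiyi

# Normal equations:
#   n*a0     + sum_x*a1  = sum_y
#   sum_x*a0 + sum_xx*a1 = sum_xy
# Solved with Cramer's rule. The determinant is nonzero as long as
# the x values are not all identical.
det = n * sum_xx - sum_x * sum_x
a0 = Fraction(sum_y * sum_xx - sum_x * sum_xy, det)
a1 = Fraction(n * sum_xy - sum_x * sum_y, det)

print("a0 =", a0, "a1 =", a1)  # a0 = 11/5 a1 = 3/5
```

For this dataset the system gives a0 = 11/5 = 2.2 and a1 = 3/5 = 0.6.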

The Standard Formulas

Solving the two normal equations gives:

a1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
a0 = ȳ - a1x̄
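
One way to check these formulas is to confirm that the fitted line satisfies both normal equations. The first normal equation says the residuals sum to zero, and the second says the residuals weighted by x sum to zero. The sketch below computes the slope from the mean-centered formula and the intercept as ȳ - a1·x̄, then tests both conditions. The names `check_1` and `check_2` are illustrative.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n

# Slope from the mean-centered formula, intercept from the means
a1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
a0 = y_mean - a1 * x_mean

residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]

# First normal equation  <=> residuals sum to zero
# Second normal equation <=> residuals weighted by x sum to zero
check_1 = sum(residuals)
check_2 = sum(xi * ri for xi, ri in zip(x, residuals))

# Both values should be zero up to floating-point rounding
print(check_1, check_2)
```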

5. Higher-Degree Curve Fitting

Linear regression is degree 1:

ŷ = a0 + a1x

But sometimes data is curved.

x: 1, 2, 3, 4, 5
y: 1, 4, 9, 16, 25

This is not linear. It follows:

y = x²

Degree 2 Polynomial

ŷ = a0 + a1x + a2x²

This is a parabola. It can bend once.

Degree 3 Polynomial

ŷ = a0 + a1x + a2x² + a3x³

A cubic curve can bend more than a quadratic curve.

Degree d Polynomial

ŷ = a0 + a1x + a2x² + ... + adxᵈ

Polynomial regression is still linear in the coefficients a0, a1, a2, ..., ad. That is why it can be solved using linear regression methods.
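
"Linear in the coefficients" can be seen directly in code: once the powers of x are computed, the prediction is just a weighted sum of those fixed features. The sketch below uses two arbitrary coefficient sets to show that predicting with their sum equals adding their separate predictions. The helper names `features` and `predict` and the example numbers are assumptions made for this illustration.

```python
def features(x, degree):
    """Powers of x from 0 to degree, e.g. degree 2 -> [1, x, x**2]."""
    return [x ** p for p in range(degree + 1)]


def predict(x, coefficients):
    """Prediction as a plain weighted sum of the features."""
    feats = features(x, len(coefficients) - 1)
    return sum(c * f for c, f in zip(coefficients, feats))


a = [1.0, 2.0, 3.0]   # arbitrary example coefficients
b = [0.5, -1.0, 4.0]  # a second arbitrary set
x0 = 2.0

combined = [ai + bi for ai, bi in zip(a, b)]

# Linearity in the coefficients: predicting with a + b equals
# predicting with a and with b separately and adding the results.
left = predict(x0, combined)
right = predict(x0, a) + predict(x0, b)
print(left, right)  # 31.5 31.5
```

The model is nonlinear in x (the features include x²), but the unknowns a0, a1, a2 only ever appear multiplied by known numbers, which is what least squares needs.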

6. Polynomial Regression as a Matrix Problem

Suppose degree is 2:

ŷ = a0 + a1x + a2x²

For data points x1, x2, x3, ..., xn, we create a matrix:

X = [
  [1, x1, x1²],
  [1, x2, x2²],
  [1, x3, x3²],
  ...
  [1, xn, xn²]
]

The coefficient vector is:

a = [
  a0,
  a1,
  a2
]

The prediction is:

ŷ = Xa

For degree d:

X = [
  [1, x1, x1², ..., x1ᵈ],
  [1, x2, x2², ..., x2ᵈ],
  [1, x3, x3², ..., x3ᵈ],
  ...
  [1, xn, xn², ..., xnᵈ]
]

This matrix is called the design matrix.

Normal Equation

a = (XᵀX)⁻¹Xᵀy

Symbol   Meaning
X        Design matrix
Xᵀ       Transpose of X
y        Actual output vector
a        Coefficient vector
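
As a small sketch of the normal equation in action, the code below applies it to the lesson's dataset with a degree 1 design matrix. Instead of computing the inverse explicitly, it solves the equivalent system (XᵀX)a = Xᵀy with `np.linalg.solve`; that substitution is a choice made for this example, and it gives the same coefficients whenever XᵀX is invertible.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Degree 1 design matrix: a column of ones and a column of x
X = np.column_stack([np.ones_like(x), x])

# Solve (X^T X) a = X^T y as a linear system rather than inverting X^T X
a = np.linalg.solve(X.T @ X, X.T @ y)

print("Coefficients:", a)  # approximately [2.2, 0.6]
```

The result matches the intercept and slope obtained from the two scalar normal equations in section 4.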

7. Python Implementation

Dataset

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

Power 0 Fitting in Python

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a0 = sum(y) / len(y)

print("Best constant a0:", a0)

predictions = [a0 for _ in x]

print("Predictions:", predictions)

errors = [actual - predicted for actual, predicted in zip(y, predictions)]
squared_errors = [e ** 2 for e in errors]
mse = sum(squared_errors) / len(squared_errors)

print("Errors:", errors)
print("Mean Squared Error:", mse)

Expected output:

Best constant a0: 4.0
Predictions: [4.0, 4.0, 4.0, 4.0, 4.0]
Errors: [-2.0, 0.0, 1.0, 0.0, 1.0]
Mean Squared Error: 1.2

Linear Fitting from Scratch

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n

numerator = 0
denominator = 0

for xi, yi in zip(x, y):
    numerator += (xi - x_mean) * (yi - y_mean)
    denominator += (xi - x_mean) ** 2

a1 = numerator / denominator
a0 = y_mean - a1 * x_mean

print("Intercept a0:", a0)
print("Slope a1:", a1)

predictions = []

for xi in x:
    y_hat = a0 + a1 * xi
    predictions.append(y_hat)

print("Predictions:", predictions)

errors = [actual - predicted for actual, predicted in zip(y, predictions)]
mse = sum(e ** 2 for e in errors) / n

print("Errors:", errors)
print("Mean Squared Error:", mse)

Expected output (values rounded; Python may print tiny floating-point tails in the last digits):

Intercept a0: 2.2
Slope a1: 0.6
Predictions: [2.8, 3.4, 4.0, 4.6, 5.2]
Errors: [-0.8, 0.6, 1.0, -0.6, -0.2]
Mean Squared Error: 0.48

The power 0 model had MSE = 1.2. The linear model has MSE = 0.48. So the line fits this data better than the constant model.

Install NumPy and Matplotlib

pip install numpy matplotlib

Create Design Matrix

def create_design_matrix(x, degree):
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return X


x = [1, 2, 3, 4, 5]

X = create_design_matrix(x, 2)

for row in X:
    print(row)

Output:

[1, 1, 1]
[1, 2, 4]
[1, 3, 9]
[1, 4, 16]
[1, 5, 25]
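
NumPy also has a built-in helper that produces the same matrix. The sketch below uses `np.vander` with `increasing=True`, which orders the columns from x⁰ upward to match the hand-written loop above. This is offered as an alternative, not as the lesson's own method.

```python
import numpy as np

x = [1, 2, 3, 4, 5]
degree = 2

# increasing=True orders the columns as x^0, x^1, ..., x^degree
X = np.vander(x, N=degree + 1, increasing=True)

print(X)
```

Without `increasing=True` the columns come out in the opposite order (highest power first), which would reverse the meaning of the coefficient vector.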

Polynomial Fit Using Normal Equation

import numpy as np

def create_design_matrix(x, degree):
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return np.array(X, dtype=float)


def polynomial_fit(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float).reshape(-1, 1)

    XT = X.T
    coefficients = np.linalg.inv(XT @ X) @ XT @ y

    return coefficients.flatten()


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

degree = 2

coefficients = polynomial_fit(x, y, degree)

print("Coefficients:", coefficients)

Possible output (up to floating-point rounding and spacing):

Coefficients: [ 0.2         2.31428571 -0.28571429]

So the fitted quadratic is approximately:

ŷ = 0.2 + 2.31428571x - 0.28571429x²

Prediction Function

def polynomial_predict(x, coefficients):
    predictions = []

    for xi in x:
        y_hat = 0
        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)
        predictions.append(y_hat)

    return predictions
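
The same polynomial can also be evaluated without computing each power separately. The sketch below uses Horner's method, which rewrites a0 + a1x + a2x² as a0 + x(a1 + x·a2) and works from the highest coefficient down. The function name `horner_predict` and the example coefficients are assumptions made for this illustration.

```python
def horner_predict(x, coefficients):
    """Evaluate a polynomial (coefficients ordered a0, a1, ...) at each x."""
    predictions = []

    for xi in x:
        y_hat = 0.0
        # Start from the highest power: ((a2) * x + a1) * x + a0
        for coefficient in reversed(coefficients):
            y_hat = y_hat * xi + coefficient
        predictions.append(y_hat)

    return predictions


# Arbitrary example polynomial: 1 + 2x + 3x²
print(horner_predict([0, 1, 2], [1.0, 2.0, 3.0]))  # [1.0, 6.0, 17.0]
```

Both versions return the same values; Horner's form simply uses one multiplication and one addition per coefficient.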

Full Example for Degree 0, 1, 2 and 3

import numpy as np

def create_design_matrix(x, degree):
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return np.array(X, dtype=float)


def polynomial_fit(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float).reshape(-1, 1)

    XT = X.T
    coefficients = np.linalg.inv(XT @ X) @ XT @ y

    return coefficients.flatten()


def polynomial_predict(x, coefficients):
    predictions = []

    for xi in x:
        y_hat = 0
        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)
        predictions.append(y_hat)

    return predictions


def mean_squared_error(y_actual, y_predicted):
    n = len(y_actual)
    total = 0

    for actual, predicted in zip(y_actual, y_predicted):
        total += (actual - predicted) ** 2

    return total / n


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    predictions = polynomial_predict(x, coefficients)
    mse = mean_squared_error(y, predictions)

    print("Degree:", degree)
    print("Coefficients:", coefficients)
    print("Predictions:", predictions)
    print("MSE:", mse)
    print()

Safer NumPy Version

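Forming XᵀX and inverting it can be numerically unstable, because the condition number of XᵀX is the square of the condition number of X, and high powers of x make X poorly conditioned:

np.linalg.inv(XT @ X) @ XT @ y

A safer method is np.linalg.lstsq(), which solves the least squares problem from X directly instead of inverting XᵀX.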

import numpy as np

def create_design_matrix(x, degree):
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return np.array(X, dtype=float)


def polynomial_fit_lstsq(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float)

    coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

    return coefficients


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit_lstsq(x, y, degree)
    print("Degree:", degree)
    print("Coefficients:", coefficients)
    print()
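
As a cross-check, NumPy's polynomial module offers its own fitting routine. The sketch below compares `np.linalg.lstsq` on the design matrix against `numpy.polynomial.polynomial.polyfit`, which also returns coefficients ordered from the constant term upward. It builds the design matrix with `np.vander` so the block is self-contained; that is a substitution for the lesson's `create_design_matrix`, not a replacement for it.

```python
import numpy as np
from numpy.polynomial import polynomial as P

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

for degree in [0, 1, 2, 3]:
    # Columns x^0 ... x^degree, same layout as the hand-built design matrix
    X = np.vander(x, N=degree + 1, increasing=True)
    via_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
    via_polyfit = P.polyfit(x, y, degree)  # coefficients, lowest power first

    print("Degree:", degree, "match:", np.allclose(via_lstsq, via_polyfit))
```

Note that the older `np.polyfit` function returns coefficients in the opposite order (highest power first), so the two should not be mixed without reversing.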

8. Live Python Editor

Practice the code from this lesson directly in the embedded Python editor. Copy any example from above and run it here.

Suggested practice: first run the power 0 model, then the linear model, then the polynomial regression program.

9. Visualization with Matplotlib

import numpy as np
import matplotlib.pyplot as plt

def create_design_matrix(x, degree):
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return np.array(X, dtype=float)


def polynomial_fit(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float)

    coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

    return coefficients


def polynomial_predict(x, coefficients):
    predictions = []

    for xi in x:
        y_hat = 0
        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)
        predictions.append(y_hat)

    return np.array(predictions)


x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

plt.scatter(x, y, label="Actual data")

x_smooth = np.linspace(min(x), max(x), 100)

for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    y_smooth = polynomial_predict(x_smooth, coefficients)

    plt.plot(x_smooth, y_smooth, label=f"Degree {degree}")

plt.xlabel("x")
plt.ylabel("y")
plt.title("Power 0, Linear, and Higher-Degree Curve Fitting")
plt.legend()
plt.grid(True)
plt.show()

Degree   Shape
0        Horizontal line
1        Straight line
2        Curved parabola
3        More flexible curve

10. Important Warning: Higher Degree Is Not Always Better

A higher-degree polynomial can fit training data better. But that does not always mean it predicts future data better.

Degree 0: Usually too simple.
Degree 1: Simple trend.
Degree 2: Curved trend.
Degree 10: May twist wildly and memorize data.

This problem is called overfitting. Overfitting means the model memorizes the training data instead of learning the real pattern.

A good model should perform well on new data, not only old data.
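
The effect can be seen even on the lesson's five points. The sketch below fits degrees 0 to 4, records the training MSE, and also evaluates each fitted curve at x = 6, a point just outside the training range. The choice of x = 6 and the dictionary names are assumptions made for this illustration; there is no real observation at x = 6 to compare against.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
x_new = 6.0  # a point just outside the training range (illustrative choice)

train_mse = {}
prediction_at_new = {}

for degree in [0, 1, 2, 3, 4]:
    # Fit on the training points
    X = np.vander(x, N=degree + 1, increasing=True)
    coefficients = np.linalg.lstsq(X, y, rcond=None)[0]

    # Error on the same points the model was fitted to
    train_mse[degree] = float(np.mean((y - X @ coefficients) ** 2))

    # Prediction at the unseen point: [1, x_new, x_new^2, ...] dot coefficients
    powers = x_new ** np.arange(degree + 1)
    prediction_at_new[degree] = float(powers @ coefficients)

    print(degree, round(train_mse[degree], 4), round(prediction_at_new[degree], 2))
```

The training MSE shrinks with every added degree, and the degree 4 curve passes through all five points exactly. Yet at x = 6 that same curve predicts about 17, far outside the 2 to 5 range of the observed y values, while the straight line predicts about 5.8. A perfect score on the training data says little about behavior on new inputs.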

11. Full Combined Program

import numpy as np
import matplotlib.pyplot as plt


def create_design_matrix(x, degree):
    """
    Create polynomial design matrix.

    For degree 2:
    x = 3 becomes [1, 3, 9]
    """
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return np.array(X, dtype=float)


def polynomial_fit(x, y, degree):
    """
    Fit polynomial regression using least squares.
    """
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float)

    coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

    return coefficients


def polynomial_predict(x, coefficients):
    """
    Predict y values using polynomial coefficients.
    """
    predictions = []

    for xi in x:
        y_hat = 0

        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)

        predictions.append(y_hat)

    return np.array(predictions)


def mean_squared_error(y_actual, y_predicted):
    """
    Calculate Mean Squared Error.
    """
    y_actual = np.array(y_actual, dtype=float)
    y_predicted = np.array(y_predicted, dtype=float)

    return np.mean((y_actual - y_predicted) ** 2)


def print_polynomial(coefficients):
    """
    Return the polynomial as a readable string (the caller prints it).
    """
    terms = []

    for power, coefficient in enumerate(coefficients):
        if power == 0:
            terms.append(f"{coefficient:.4f}")
        elif power == 1:
            terms.append(f"{coefficient:.4f}x")
        else:
            terms.append(f"{coefficient:.4f}x^{power}")

    return " + ".join(terms)


# Dataset
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Show results
for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    predictions = polynomial_predict(x, coefficients)
    mse = mean_squared_error(y, predictions)

    print("=" * 50)
    print("Degree:", degree)
    print("Polynomial:")
    print("ŷ =", print_polynomial(coefficients))
    print("Predictions:", predictions)
    print("Mean Squared Error:", mse)

# Plot
plt.scatter(x, y, label="Actual data")

x_smooth = np.linspace(min(x), max(x), 200)

for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    y_smooth = polynomial_predict(x_smooth, coefficients)

    plt.plot(x_smooth, y_smooth, label=f"Degree {degree}")

plt.xlabel("x")
plt.ylabel("y")
plt.title("Polynomial Curve Fitting")
plt.legend()
plt.grid(True)
plt.show()

12. What We Learned in This 0–10% Lesson

  1. Curve fitting means finding a mathematical pattern in data.
  2. Power 0 fitting gives a constant model: ŷ = a0.
  3. The best constant under squared error is the mean of y.
  4. Linear fitting gives: ŷ = a0 + a1x.
  5. The best line minimizes the sum of squared errors.
  6. The slope and intercept can be derived using calculus.
  7. Higher-degree polynomial fitting gives: ŷ = a0 + a1x + a2x² + ... + adxᵈ.
  8. Polynomial regression is still linear in coefficients.
  9. The design matrix stores powers of x.
  10. The normal equation is: a = (XᵀX)⁻¹Xᵀy.
  11. In real Python work, np.linalg.lstsq() is safer than directly using inverse.
  12. Higher-degree models can overfit.
  13. The embedded Python editor lets learners practice directly inside the lesson page.

13. Exercises

Exercise 1

Given:

y = [10, 20, 30, 40]

Find the best power 0 model.

Exercise 2

Given:

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

Find the best linear model manually.

Exercise 3

Create the design matrix for:

x = [2, 3, 4]
degree = 3

Expected row form:

[1, x, x², x³]

Exercise 4

Modify the full program and try:

x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]

Fit degrees 0, 1, 2 and 3. Which degree fits best?

Exercise 5

Try a high degree such as:

degree = 10

Observe the curve. Is it learning the pattern, or is it memorizing the data?

14. Bridge to the Next Lesson

In the next 10–20% lesson, we will go deeper into:

Mean
Variance
Covariance
Correlation
Residuals
Least squares

This lesson introduced curve fitting as a practical idea. The next lesson will strengthen its statistical foundation.