Power 0, Linear and Higher-Degree Curve Fitting

0. What Are We Trying to Do?

In real life, data rarely comes as a perfect formula. We may have values like this:

x: 1, 2, 3, 4, 5
y: 2, 4, 5, 4, 5

We want to ask:

Can we find a mathematical pattern that approximately explains y using x?

This is called curve fitting.

Possible model 1:

A constant (a horizontal line).

Possible model 2:

A straight line.

Possible model 3:

A parabola.

Possible model 4:

A higher-degree polynomial curve.

The goal is not always to pass exactly through every point. The goal is usually to find the best approximate curve.

1. Data Points

Suppose we have data points:

(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)

Example:

(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)

Symbol	Meaning
x	Input
y	Actual output
ŷ	Predicted output, read as y hat

2. Power 0 Model: Constant Fitting

The simplest possible model ignores x completely.

No matter what x is, I will always predict the same value.

This is called a degree 0 polynomial or power 0 model.

ŷ = a₀

Here, a0 is the constant prediction.

Why Is It Called Power 0?

Because:

x⁰ = 1

So we can write:

a0 = a0 × 1
a0 = a0 × x⁰

Therefore:

ŷ = a₀x⁰

Since x⁰ is always 1, this model is constant.

Best Constant Model

Suppose actual values are:

y = 2, 4, 5, 4, 5

The best constant, under squared error, is the mean of y.

a₀ = ȳ = (1 / n) Σy_i

y = 2, 4, 5, 4, 5
sum = 20
n = 5

mean = 20 / 5 = 4

So the best constant model is:

ŷ = 4
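Why the mean? A short derivation may help here. It is only a sketch: S(a₀) is a name introduced for this note, meaning the total squared error of predicting the constant a₀ for every point.

```latex
\begin{aligned}
S(a_0) &= \sum_{i=1}^{n} (y_i - a_0)^2 \\
\frac{dS}{da_0} &= -2 \sum_{i=1}^{n} (y_i - a_0) = 0 \\
\sum_{i=1}^{n} y_i &= n\,a_0 \\
a_0 &= \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}
\end{aligned}
```

The second derivative is 2n, which is positive, so this stationary point is a minimum. For the example data, a₀ = 20 / 5 = 4.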

Error in Power 0 Model

x	Actual y	Predicted y	Error
1	2	4	-2
2	4	4	0
3	5	4	1
4	4	4	0
5	5	4	1

Squared errors = 4, 0, 1, 0, 1

MSE = (4 + 0 + 1 + 0 + 1) / 5
MSE = 6 / 5
MSE = 1.2

3. Linear Model: Degree 1 Fitting

Now we allow the prediction to depend on x.

ŷ = a₀ + a₁x

This is the same as:

ŷ = intercept + slope × x

Coefficient	Meaning
a₀	Intercept
a₁	Slope

Meaning of Slope

Suppose:

ŷ = 2 + 3x

Then:

a0 = 2
a1 = 3

x	Prediction
0	2
1	5
2	8
3	11

Every time x increases by 1, prediction increases by 3.

slope = change in y / change in x

Error in Linear Regression

For each point:

actual y = yi
predicted y = ŷi

The error is:

e_i = y_i - ŷ_i

Since:

ŷ_i = a₀ + a₁x_i

The squared error is:

e_i² = [y_i - (a₀ + a₁x_i)]²

The total squared error is:

SSE = Σ [y_i - (a₀ + a₁x_i)]²

SSE means Sum of Squared Errors. The best line is the line that minimizes this quantity.
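To make the definition concrete, here is a minimal sketch that computes the SSE of two candidate lines on the example data. The helper name sse and the two candidate lines are illustrative choices, not part of the lesson's later programs.

```python
# Sketch: compare candidate lines by their sum of squared errors (SSE).
# The helper name `sse` and both candidate lines are illustrative.

def sse(x, y, a0, a1):
    """Return the sum of squared errors of the line y_hat = a0 + a1 * x."""
    return sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

flat_line = sse(x, y, a0=4, a1=0)    # the constant model y_hat = 4
guess_line = sse(x, y, a0=1, a1=1)   # a guess: y_hat = 1 + x

print("SSE of y_hat = 4:    ", flat_line)    # 6
print("SSE of y_hat = 1 + x:", guess_line)   # 4
```

The guessed line has a smaller SSE than the constant model, so it is the closer fit of the two. Neither is necessarily the best possible line; finding the minimizer needs the calculus that follows.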

4. Derivation of Best Linear Fit

We want to minimize:

SSE = Σ(y_i - a₀ - a₁x_i)²

The unknowns are a0 and a1.

Derivative with Respect to a₀

SSE = Σ(yi - a0 - a1xi)²

∂SSE/∂a0 = -2Σ(yi - a0 - a1xi)

Set it equal to zero:

-2Σ(yi - a0 - a1xi) = 0

Σ(yi - a0 - a1xi) = 0

Σyi - na0 - a1Σxi = 0

na₀ + a₁Σx_i = Σy_i

This is the first normal equation.

Derivative with Respect to a₁

SSE = Σ(yi - a0 - a1xi)²

∂SSE/∂a1 = -2Σxi(yi - a0 - a1xi)

Set it equal to zero:

-2Σxi(yi - a0 - a1xi) = 0

Σxi(yi - a0 - a1xi) = 0

Σxiyi - a0Σxi - a1Σxi² = 0

a₀Σx_i + a₁Σx_i² = Σx_iy_i

This is the second normal equation.

The Standard Formulas

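Solving the two normal equations gives the standard formulas. Dividing the first equation by n gives a₀ = ȳ - a₁x̄; substituting this into the second equation and solving for a₁ gives:

a₁ = [Σ(x_i - x̄)(y_i - ȳ)] / [Σ(x_i - x̄)²]

a₀ = ȳ - a₁x̄

The denominator is zero only when every x_i is the same, in which case no unique line exists.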
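As a worked check, the two normal equations can be solved directly for the example data. This is a sketch: it uses Cramer's rule for the 2×2 system and Python's fractions module so the arithmetic stays exact. Both are choices made for this illustration.

```python
from fractions import Fraction

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x = sum(x)                                  # 15
sum_y = sum(y)                                  # 20
sum_xx = sum(xi * xi for xi in x)               # 55
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 66

# Normal equations:
#   n * a0     + sum_x * a1  = sum_y
#   sum_x * a0 + sum_xx * a1 = sum_xy
# Solve the 2x2 system with Cramer's rule.
det = n * sum_xx - sum_x * sum_x                # 5 * 55 - 15 * 15 = 50
a0 = Fraction(sum_y * sum_xx - sum_x * sum_xy, det)
a1 = Fraction(n * sum_xy - sum_x * sum_y, det)

print("a0 =", a0, "=", float(a0))   # a0 = 11/5 = 2.2
print("a1 =", a1, "=", float(a1))   # a1 = 3/5 = 0.6
```

So for this dataset the least-squares line is ŷ = 2.2 + 0.6x.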

5. Higher-Degree Curve Fitting

Linear regression is degree 1:

ŷ = a₀ + a₁x

But sometimes data is curved.

x: 1, 2, 3, 4, 5
y: 1, 4, 9, 16, 25

This is not linear. It follows:

y = x²
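One way to see that a straight line is the wrong shape for this data is to fit the best line anyway and look at what is left over. The sketch below uses the slope and intercept formulas from section 4. The variable name residuals is an illustrative choice.

```python
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n

# Least-squares slope and intercept (formulas from section 4).
a1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
a0 = y_mean - a1 * x_mean

# Residual = actual y minus the line's prediction.
residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]

print("Line: y_hat =", a0, "+", a1, "* x")   # Line: y_hat = -7.0 + 6.0 * x
print("Residuals:", residuals)               # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. That U-shaped pattern is a typical sign that the data curve and a higher-degree term is needed.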

Degree 2 Polynomial

ŷ = a₀ + a₁x + a₂x²

This is a parabola. It has a single turning point, so it can bend once.

Degree 3 Polynomial

ŷ = a₀ + a₁x + a₂x² + a₃x³

A cubic curve can have up to two turning points, so it can bend more than a quadratic curve.

Degree d Polynomial

ŷ = a₀ + a₁x + a₂x² + ... + a_dx^d

Polynomial regression is still linear in the coefficients a₀, a₁, a₂, ..., a_d. That is why it can be solved using linear regression methods.
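The phrase "linear in the coefficients" can be illustrated with a small sketch. The prediction is a dot product between the powers of x and the coefficients, so adding two coefficient lists adds their predictions. The function names features and predict are hypothetical helpers for this note.

```python
def features(x, degree):
    """Return [1, x, x**2, ..., x**degree] for a single input x."""
    return [x ** power for power in range(degree + 1)]


def predict(x, coefficients):
    """Prediction = dot product of the feature list and the coefficients."""
    row = features(x, len(coefficients) - 1)
    return sum(c * f for c, f in zip(coefficients, row))


a = [1, 2, 3]     # y_hat = 1 + 2x + 3x^2
b = [4, 0, -1]    # y_hat = 4 - x^2
a_plus_b = [ai + bi for ai, bi in zip(a, b)]

x0 = 2
print(predict(x0, a), predict(x0, b), predict(x0, a_plus_b))   # 17 0 17
```

The model is curved as a function of x, but for fixed x it is a plain weighted sum of the unknowns a₀, a₁, ..., a_d. That is the property least squares relies on.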

6. Polynomial Regression as a Matrix Problem

Suppose degree is 2:

ŷ = a₀ + a₁x + a₂x²

For data points x1, x2, x3, ..., xn, we create a matrix:

X = [
  [1, x1, x1²],
  [1, x2, x2²],
  [1, x3, x3²],
  ...
  [1, xn, xn²]
]

The coefficient vector is:

a = [
  a0,
  a1,
  a2
]

The prediction is:

ŷ = Xa

For degree d:

X = [
  [1, x1, x1², ..., x1ᵈ],
  [1, x2, x2², ..., x2ᵈ],
  [1, x3, x3², ..., x3ᵈ],
  ...
  [1, xn, xn², ..., xnᵈ]
]

This matrix is called the design matrix.

Normal Equation

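a = (XᵀX)⁻¹Xᵀy

This formula requires XᵀX to be invertible, which holds when the data contain at least d + 1 distinct x values.

Symbol	Meaning
X	Design matrix
Xᵀ	Transpose of X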
y	Actual output vector
a	Coefficient vector
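To see the normal equation at work without any library, the sketch below builds XᵀX and Xᵀy for the example data and solves (XᵀX)a = Xᵀy by Gauss-Jordan elimination instead of forming an inverse. The function names are illustrative, exact fractions are used to avoid rounding, and the code assumes XᵀX is invertible.

```python
from fractions import Fraction


def design_matrix(x, degree):
    """Rows of the form [1, x, x^2, ..., x^degree], as exact fractions."""
    return [[Fraction(xi) ** p for p in range(degree + 1)] for xi in x]


def solve_normal_equations(x, y, degree):
    """Solve (X^T X) a = X^T y by Gauss-Jordan elimination."""
    X = design_matrix(x, degree)
    k = degree + 1

    # Build the augmented matrix [X^T X | X^T y].
    aug = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row.append(sum(r[i] * yi for r, yi in zip(X, y)))
        aug.append(row)

    for col in range(k):
        # Pick a row with a non-zero pivot (assumes X^T X is invertible).
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]

        # Scale the pivot row, then clear this column in every other row.
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]

    return [aug[i][k] for i in range(k)]


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

coefficients = solve_normal_equations(x, y, 2)
print([str(c) for c in coefficients])              # ['1/5', '81/35', '-2/7']
print([round(float(c), 8) for c in coefficients])  # [0.2, 2.31428571, -0.28571429]
```

For degree 2 this gives ŷ = 1/5 + (81/35)x - (2/7)x², roughly 0.2 + 2.3143x - 0.2857x².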

7. Python Implementation

Dataset

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

Power 0 Fitting in Python

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a0 = sum(y) / len(y)

print("Best constant a0:", a0)

predictions = [a0 for value in x]

print("Predictions:", predictions)

errors = [actual - predicted for actual, predicted in zip(y, predictions)]
squared_errors = [e ** 2 for e in errors]
mse = sum(squared_errors) / len(squared_errors)

print("Errors:", errors)
print("Mean Squared Error:", mse)

Expected output:

Best constant a0: 4.0
Predictions: [4.0, 4.0, 4.0, 4.0, 4.0]
Errors: [-2.0, 0.0, 1.0, 0.0, 1.0]
Mean Squared Error: 1.2

Linear Fitting from Scratch

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n

numerator = 0
denominator = 0

for xi, yi in zip(x, y):
    numerator += (xi - x_mean) * (yi - y_mean)
    denominator += (xi - x_mean) ** 2

a1 = numerator / denominator
a0 = y_mean - a1 * x_mean

print("Intercept a0:", a0)
print("Slope a1:", a1)

predictions = []

for xi in x:
    y_hat = a0 + a1 * xi
    predictions.append(y_hat)

print("Predictions:", predictions)

errors = [actual - predicted for actual, predicted in zip(y, predictions)]
mse = sum(e ** 2 for e in errors) / n

print("Errors:", errors)
print("Mean Squared Error:", mse)

Expected output (values rounded; floating-point arithmetic may print extra digits, for example 2.2000000000000002):

Intercept a0: 2.2
Slope a1: 0.6
Predictions: [2.8, 3.4, 4.0, 4.6, 5.2]
Errors: [-0.8, 0.6, 1.0, -0.6, -0.2]
Mean Squared Error: 0.48

The power 0 model had MSE = 1.2. The linear model has MSE = 0.48. So the line fits this data better than the constant model.

Install NumPy and Matplotlib

pip install numpy matplotlib

Create Design Matrix

def create_design_matrix(x, degree):
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return X


x = [1, 2, 3, 4, 5]

X = create_design_matrix(x, 2)

for row in X:
    print(row)

Output:

[1, 1, 1]
[1, 2, 4]
[1, 3, 9]
[1, 4, 16]
[1, 5, 25]
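Once NumPy is installed, the same matrix can also be built with numpy.vander. This is offered as an alternative sketch, not as the lesson's method. By default np.vander puts the highest power first, so increasing=True is needed to match the column order used here.

```python
import numpy as np

x = [1, 2, 3, 4, 5]

# N is the number of columns (degree + 1).
# increasing=True orders the columns as x^0, x^1, x^2, matching the lesson.
X = np.vander(x, N=3, increasing=True)

print(X)
```

The rows match the output of create_design_matrix(x, 2) above.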

Polynomial Fit Using Normal Equation

import numpy as np

def create_design_matrix(x, degree):
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return np.array(X, dtype=float)


def polynomial_fit(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float).reshape(-1, 1)

    XT = X.T
    coefficients = np.linalg.inv(XT @ X) @ XT @ y

    return coefficients.flatten()


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

degree = 2

coefficients = polynomial_fit(x, y, degree)

print("Coefficients:", coefficients)

Possible output (the last digits may differ slightly):

Coefficients: [ 0.2         2.31428571 -0.28571429]

So the fitted quadratic is approximately:

ŷ = 0.2 + 2.31428571x - 0.28571429x²

Prediction Function

def polynomial_predict(x, coefficients):
    predictions = []

    for xi in x:
        y_hat = 0
        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)
        predictions.append(y_hat)

    return predictions
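The function above recomputes xi ** power for every term. An optional alternative, shown here only as a sketch, is Horner's rule. It evaluates the same polynomial from the highest power down using one multiplication and one addition per coefficient. The name polynomial_predict_horner is hypothetical and is not used elsewhere in this lesson.

```python
def polynomial_predict_horner(x, coefficients):
    """Evaluate a0 + a1*x + ... + ad*x^d for each x using Horner's rule."""
    predictions = []

    for xi in x:
        y_hat = 0
        # Work from the highest power down:
        # (...((ad) * x + a(d-1)) * x + ...) * x + a0
        for coefficient in reversed(coefficients):
            y_hat = y_hat * xi + coefficient
        predictions.append(y_hat)

    return predictions


# y_hat = 1 + 2x + 3x^2 evaluated at x = 0, 1, 2
print(polynomial_predict_horner([0, 1, 2], [1, 2, 3]))   # [1, 6, 17]
```

Both versions take coefficients in the same order (a₀ first), so they are interchangeable.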

Full Example for Degree 0, 1, 2 and 3

import numpy as np

def create_design_matrix(x, degree):
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return np.array(X, dtype=float)


def polynomial_fit(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float).reshape(-1, 1)

    XT = X.T
    coefficients = np.linalg.inv(XT @ X) @ XT @ y

    return coefficients.flatten()


def polynomial_predict(x, coefficients):
    predictions = []

    for xi in x:
        y_hat = 0
        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)
        predictions.append(y_hat)

    return predictions


def mean_squared_error(y_actual, y_predicted):
    n = len(y_actual)
    total = 0

    for actual, predicted in zip(y_actual, y_predicted):
        total += (actual - predicted) ** 2

    return total / n


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    predictions = polynomial_predict(x, coefficients)
    mse = mean_squared_error(y, predictions)

    print("Degree:", degree)
    print("Coefficients:", coefficients)
    print("Predictions:", predictions)
    print("MSE:", mse)
    print()

Safer NumPy Version

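Computing the inverse directly can be numerically unstable, because forming XᵀX squares the condition number of X, and the problem gets worse as the degree grows:

np.linalg.inv(XT @ X) @ XT @ y

A safer method is np.linalg.lstsq(), which solves the least-squares problem from X and y directly, without forming XᵀX or its inverse.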

import numpy as np

def create_design_matrix(x, degree):
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return np.array(X, dtype=float)


def polynomial_fit_lstsq(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float)

    coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

    return coefficients


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit_lstsq(x, y, degree)
    print("Degree:", degree)
    print("Coefficients:", coefficients)
    print()

8. Live Python Editor

Practice the code from this lesson directly in the embedded Python editor. Copy any example from above and run it here.

Suggested practice: first run the power 0 model, then the linear model, then the polynomial regression program.

9. Visualization with Matplotlib

import numpy as np
import matplotlib.pyplot as plt

def create_design_matrix(x, degree):
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return np.array(X, dtype=float)


def polynomial_fit(x, y, degree):
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float)

    coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

    return coefficients


def polynomial_predict(x, coefficients):
    predictions = []

    for xi in x:
        y_hat = 0
        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)
        predictions.append(y_hat)

    return np.array(predictions)


x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

plt.scatter(x, y, label="Actual data")

x_smooth = np.linspace(min(x), max(x), 100)

for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    y_smooth = polynomial_predict(x_smooth, coefficients)

    plt.plot(x_smooth, y_smooth, label=f"Degree {degree}")

plt.xlabel("x")
plt.ylabel("y")
plt.title("Power 0, Linear, and Higher-Degree Curve Fitting")
plt.legend()
plt.grid(True)
plt.show()

Degree	Shape
0	Horizontal line
1	Straight line
2	Curved parabola
3	More flexible curve

10. Important Warning: Higher Degree Is Not Always Better

A higher-degree polynomial can fit training data better. But that does not always mean it predicts future data better.

Degree 0

Usually too simple.

Degree 1

Simple trend.

Degree 2

Curved trend.

Degree 10

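May twist wildly and memorize data. With n distinct x values, a polynomial of degree n - 1 already passes through every point exactly, so its training error is zero even when the underlying pattern is much simpler.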

This problem is called overfitting. Overfitting means the model memorizes the training data instead of learning the real pattern.

A good model should perform well on new data, not only old data.
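The difference between fitting old data and predicting new data can be seen on the lesson's own five points. The sketch below uses Lagrange interpolation to evaluate the degree 4 polynomial through all five points. For five distinct x values this is the same polynomial a degree 4 least-squares fit would give. It then compares that polynomial with the least-squares line one step outside the data. The function name interpolate and the choice of x = 6 are illustrative.

```python
from fractions import Fraction


def interpolate(x_points, y_points, x_new):
    """Value at x_new of the unique degree n-1 polynomial through n points
    (Lagrange form)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(x_points, y_points)):
        term = Fraction(yi)
        for j, xj in enumerate(x_points):
            if j != i:
                term *= Fraction(x_new - xj, xi - xj)
        total += term
    return total


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Degree 4 through 5 points: every training point is reproduced exactly.
training_fit = [interpolate(x, y, xi) for xi in x]
print("Degree 4 at training x:", [int(v) for v in training_fit])   # [2, 4, 5, 4, 5]

# Least-squares line for the same data (formulas from section 4).
x_mean = Fraction(sum(x), n)
y_mean = Fraction(sum(y), n)
a1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
a0 = y_mean - a1 * x_mean

# One step outside the data, the two models disagree sharply.
x_new = 6
degree4_at_6 = interpolate(x, y, x_new)
line_at_6 = a0 + a1 * x_new
print("Degree 4 at x = 6:", degree4_at_6)       # 17
print("Line at x = 6:    ", float(line_at_6))   # 5.8
```

The degree 4 curve has zero training error, yet it predicts 17 at x = 6 for data whose y values never exceed 5. The line has a larger training error but gives a far more plausible 5.8.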

11. Full Combined Program

import numpy as np
import matplotlib.pyplot as plt


def create_design_matrix(x, degree):
    """
    Create polynomial design matrix.

    For degree 2:
    x = 3 becomes [1, 3, 9]
    """
    X = []

    for xi in x:
        row = []
        for power in range(degree + 1):
            row.append(xi ** power)
        X.append(row)

    return np.array(X, dtype=float)


def polynomial_fit(x, y, degree):
    """
    Fit polynomial regression using least squares.
    """
    X = create_design_matrix(x, degree)
    y = np.array(y, dtype=float)

    coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

    return coefficients


def polynomial_predict(x, coefficients):
    """
    Predict y values using polynomial coefficients.
    """
    predictions = []

    for xi in x:
        y_hat = 0

        for power, coefficient in enumerate(coefficients):
            y_hat += coefficient * (xi ** power)

        predictions.append(y_hat)

    return np.array(predictions)


def mean_squared_error(y_actual, y_predicted):
    """
    Calculate Mean Squared Error.
    """
    y_actual = np.array(y_actual, dtype=float)
    y_predicted = np.array(y_predicted, dtype=float)

    return np.mean((y_actual - y_predicted) ** 2)


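def print_polynomial(coefficients):
    """
    Return the polynomial as a readable string,
    for example "0.2000 + 2.3143x - 0.2857x^2".
    """
    text = ""

    for power, coefficient in enumerate(coefficients):
        # Format the magnitude; the sign is added separately below.
        if power == 0:
            term = f"{abs(coefficient):.4f}"
        elif power == 1:
            term = f"{abs(coefficient):.4f}x"
        else:
            term = f"{abs(coefficient):.4f}x^{power}"

        if power == 0:
            text = f"-{term}" if coefficient < 0 else term
        else:
            text += f" - {term}" if coefficient < 0 else f" + {term}"

    return text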


# Dataset
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Show results
for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    predictions = polynomial_predict(x, coefficients)
    mse = mean_squared_error(y, predictions)

    print("=" * 50)
    print("Degree:", degree)
    print("Polynomial:")
    print("ŷ =", print_polynomial(coefficients))
    print("Predictions:", predictions)
    print("Mean Squared Error:", mse)

# Plot
plt.scatter(x, y, label="Actual data")

x_smooth = np.linspace(min(x), max(x), 200)

for degree in [0, 1, 2, 3]:
    coefficients = polynomial_fit(x, y, degree)
    y_smooth = polynomial_predict(x_smooth, coefficients)

    plt.plot(x_smooth, y_smooth, label=f"Degree {degree}")

plt.xlabel("x")
plt.ylabel("y")
plt.title("Polynomial Curve Fitting")
plt.legend()
plt.grid(True)
plt.show()

12. What We Learned in This 0–10% Lesson

Curve fitting means finding a mathematical pattern in data.
Power 0 fitting gives a constant model: ŷ = a0.
The best constant under squared error is the mean of y.
Linear fitting gives: ŷ = a0 + a1x.
The best line minimizes the sum of squared errors.
The slope and intercept can be derived using calculus.
Higher-degree polynomial fitting gives: ŷ = a0 + a1x + a2x² + ... + adxᵈ.
Polynomial regression is still linear in coefficients.
The design matrix stores powers of x.
The normal equation is: a = (XᵀX)⁻¹Xᵀy.
In real Python work, np.linalg.lstsq() is safer than directly using inverse.
Higher-degree models can overfit.
The embedded Python editor lets learners practice directly inside the lesson page.

13. Exercises

Exercise 1

Given:

y = [10, 20, 30, 40]

Find the best power 0 model.

Exercise 2

Given:

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

Find the best linear model manually.

Exercise 3

Create the design matrix for:

x = [2, 3, 4]
degree = 3

Expected row form:

[1, x, x², x³]

Exercise 4

Modify the full program and try:

x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]

Fit degrees 0, 1, 2 and 3. Which degree fits best?

Exercise 5

Try a high degree such as:

degree = 10

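Observe the curve. Is it learning the pattern, or is it memorizing the data? Use the np.linalg.lstsq() version for this experiment: with fewer than 11 data points, XᵀX is singular for degree 10, so the explicit-inverse version is not usable.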

14. Bridge to the Next Lesson

In the next 10–20% lesson, we will go deeper into:

Mean

Variance

Covariance

Correlation

Residuals

Least squares

This lesson introduced curve fitting as a practical idea. The next lesson will make the statistical foundation stronger.
