polyfit() + poly1d()
This lesson connects three ideas: how strongly two variables are related, how to fit a best-fit line or curve, and how to turn the fitted coefficients into a usable polynomial object.
Let us first understand how these topics connect to each other in one clean flow.
Before fitting a line, it is useful to understand whether x and y move together. That is where the coefficient of correlation comes in.
Once we see a relationship, we can use np.polyfit() to find the best-fit line or curve for the given data.
We can then convert the coefficients into a polynomial object using np.poly1d() and use it for prediction.
The coefficient of correlation is usually Pearson’s correlation coefficient, written as r.
r = +1 means perfect positive linear correlation.
r = -1 means perfect negative linear correlation.
r = 0 means no linear correlation.
The closer the value is to +1 or -1, the stronger the linear relationship.
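As a quick check of these three cases, the sketch below builds small illustrative arrays (the data and variable names are made up for this example, not taken from the lesson) and asks np.corrcoef() for r.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# y rises exactly in step with x: perfect positive linear correlation
r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]

# y falls exactly in step with x: perfect negative linear correlation
r_neg = np.corrcoef(x, -3 * x + 10)[0, 1]

# y depends on x, but symmetrically around the mean of x,
# so the linear correlation is zero even though a relationship exists
r_zero = np.corrcoef(x, (x - 3) ** 2)[0, 1]

print(r_pos, r_neg, r_zero)
```

The last case is a reminder that r = 0 rules out only a linear relationship, not every kind of relationship.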
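The formula is:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}$$
Here:
n = number of observations
Σxy = sum of products of x and y
Σx = sum of x values
Σy = sum of y values
Σx² = sum of squares of x
Σy² = sum of squares of y
Take:
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]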
| x | y | x² | y² | xy |
|---|---|---|---|---|
| 1 | 2 | 1 | 4 | 2 |
| 2 | 4 | 4 | 16 | 8 |
| 3 | 5 | 9 | 25 | 15 |
| 4 | 4 | 16 | 16 | 16 |
| 5 | 5 | 25 | 25 | 25 |
| Σx = 15 | Σy = 20 | Σx² = 55 | Σy² = 86 | Σxy = 66 |
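So n = 5, Σx = 15, Σy = 20, Σx² = 55, Σy² = 86, and Σxy = 66.
Substituting these values into the formula for r:
$$r = \frac{5(66) - (15)(20)}{\sqrt{\left(5(55) - 15^2\right)\left(5(86) - 20^2\right)}} = \frac{30}{\sqrt{50 \times 30}} = \frac{30}{\sqrt{1500}} \approx 0.7746$$
So the coefficient of correlation is about 0.7746. This shows a fairly strong positive linear relationship.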
np.corrcoef()
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
# Full 2 × 2 correlation matrix
r_matrix = np.corrcoef(x, y)
print(r_matrix)
# The off-diagonal entry [0, 1] is the correlation between x and y
r = np.corrcoef(x, y)[0, 1]
print("Correlation coefficient =", r)
np.corrcoef(x, y) returns a 2 × 2 correlation matrix.
[0,0] = correlation of x with x = 1
[1,1] = correlation of y with y = 1
[0,1] = correlation of x with y
[1,0] = correlation of y with x
The second method applies the formula directly:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
n = len(x)
sum_x = np.sum(x)
sum_y = np.sum(y)
sum_xy = np.sum(x * y)
sum_x2 = np.sum(x ** 2)
sum_y2 = np.sum(y ** 2)
r = (n * sum_xy - sum_x * sum_y) / np.sqrt(
    (n * sum_x2 - sum_x ** 2) *
    (n * sum_y2 - sum_y ** 2)
)
print("Correlation coefficient =", r)
This second method shows the full logic very clearly and is excellent for learning.
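To confirm that the two methods agree, the manual formula can be wrapped in a small function and compared with np.corrcoef(). This is only a sketch; the helper name pearson_r is hypothetical and not part of NumPy.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r from the raw-sum formula (hypothetical helper name)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    denominator = np.sqrt(
        (n * np.sum(x ** 2) - np.sum(x) ** 2)
        * (n * np.sum(y ** 2) - np.sum(y) ** 2)
    )
    return numerator / denominator

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)
print("Methods agree:", np.isclose(r_manual, r_numpy))
```

Comparing with np.isclose() rather than == allows for tiny floating-point differences between the two calculations.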
polyfit()
This function finds the coefficients of the best-fit polynomial for the given data.
np.polyfit(x, y, degree)
Here, x contains the input values, y contains the output values, and degree sets the degree of the fitted polynomial.
polyfit() returns the coefficients from highest power to lowest power.
If the polynomial degree is n, then polyfit() returns n + 1 coefficients.
| Degree | Return structure | Equation form |
|---|---|---|
| 1 | [m, b] | y = mx + b |
| 2 | [a, b, c] | y = ax² + bx + c |
| 3 | [a, b, c, d] | y = ax³ + bx² + cx + d |
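The table can be checked directly. The sketch below reuses the small dataset from the correlation example (an assumption made for illustration) and fits degrees 1 to 3, printing how many coefficients come back each time.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

for degree in (1, 2, 3):
    coeff = np.polyfit(x, y, degree)
    # degree + 1 coefficients, ordered from highest power down to the constant
    print(degree, len(coeff), coeff)

# np.polyval expects the same highest-power-first ordering
line = np.polyfit(x, y, 1)
print(np.polyval(line, 6))  # same value as line[0] * 6 + line[1]
```

Because np.polyval() reads coefficients in the same order that polyfit() writes them, the array can be passed along without rearranging it.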
Let us fit a straight line:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
coeff = np.polyfit(x, y, 1)
print(coeff)
A typical output may look like:
[0.6 2.2]
This means:
0.6 is the slope m. For every 1-unit increase in x, y tends to rise by about 0.6.
2.2 is the intercept b. It is the value of y when x = 0 on the fitted line.
The best-fit line is y = 0.6x + 2.2.
m = coeff[0]
b = coeff[1]
y_pred = m * x + b
print(y_pred)
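"Best fit" here means least squares: polyfit() picks the line with the smallest sum of squared vertical gaps between the data and the line. The sketch below computes that sum for the fitted line and for a deliberately perturbed line (the 0.1 change in slope is an arbitrary choice for illustration).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

m, b = np.polyfit(x, y, 1)
y_pred = m * x + b

residuals = y - y_pred            # vertical gaps between data and line
sse = np.sum(residuals ** 2)      # the quantity least squares minimises
print("residuals =", residuals)
print("sum of squared errors =", sse)

# A slightly different slope should give a larger squared error
sse_other = np.sum((y - ((m + 0.1) * x + b)) ** 2)
print("SSE with a perturbed slope =", sse_other)
```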
Now fit a quadratic curve:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])
coeff = np.polyfit(x, y, 2)
print(coeff)
A typical output will be very close to:
[1. 0. 0.]
This means the fitted curve is y = 1x² + 0x + 0, that is, y = x².
So the return value means:
coeff[0] = coefficient of x²
coeff[1] = coefficient of x
coeff[2] = constant term
The biggest learning point is this: polyfit() always returns coefficients from highest degree to constant term.
For a cubic fit:
coeff = np.polyfit(x, y, 3)
print(coeff)
The returned array structure is:
[a, b, c, d]
This corresponds to:
y = ax³ + bx² + cx + d
Again, coefficient order is from highest power to lowest power.
Sometimes you may see:
coeff, residuals, rank, singular_values, rcond = np.polyfit(x, y, 1, full=True)
print("coeff =", coeff)
print("residuals =", residuals)
print("rank =", rank)
print("singular_values =", singular_values)
print("rcond =", rcond)
Now the function returns more than just coefficients:
coeff → fitted coefficients
residuals → total squared fitting error
rank → effective rank of the internal matrix
singular_values → singular values of the scaled Vandermonde matrix
rcond → cutoff used for very small singular values
For most beginner and intermediate uses, the standard form without full=True is enough.
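To see what the residuals value holds, the sketch below recomputes the squared error by hand and compares it with what polyfit() reports. It assumes the same five-point dataset used earlier in the lesson.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

coeff, residuals, rank, singular_values, rcond = np.polyfit(x, y, 1, full=True)

# Recompute the squared error by hand from the fitted coefficients
manual_sse = np.sum((y - np.polyval(coeff, x)) ** 2)

print("residuals from polyfit =", residuals)   # a one-element array
print("manual squared error   =", manual_sse)
print("rank =", rank)                          # 2 unknowns: slope and intercept
```

Note that residuals comes back as an array, so residuals[0] is the number to compare against.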
polyfit() gives the coefficients of a fitted equation.
poly1d()
This function takes coefficients and returns a polynomial object.
This is not just a plain list and not just a plain number. It is an object that represents a polynomial.
In simple words, poly1d turns coefficients into something that behaves like a mathematical function.
import numpy as np
coeff = [2, 3, 4]
p = np.poly1d(coeff)
print(p)
This means the polynomial is:
2x² + 3x + 4
So poly1d() has taken the list [2, 3, 4] and created a polynomial object for it.
import numpy as np
p = np.poly1d([2, 3, 4])
print(p(2))
This computes:
2(2)² + 3(2) + 4 = 8 + 6 + 4 = 18
So p(2) gives 18.
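A poly1d object also accepts a whole array, which is what makes it convenient for prediction and plotting. The sketch below evaluates the same polynomial at several points at once; the input values are arbitrary.

```python
import numpy as np

p = np.poly1d([2, 3, 4])          # 2x² + 3x + 4

xs = np.array([0, 1, 2, 3])
print(p(xs))                       # evaluates the polynomial at every element

# Same values computed term by term, for comparison
print(2 * xs ** 2 + 3 * xs + 4)
```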
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
coeff = np.polyfit(x, y, 1)
p = np.poly1d(coeff)
print("coefficients =", coeff)
print("polynomial object:")
print(p)
print("Prediction at x = 6:", p(6))
If coeff is something like [0.6, 2.2], then p represents y = 0.6x + 2.2, so p(6) returns about 5.8.
Suppose:
import numpy as np
p = np.poly1d([2, -3, 5])
print("coeffs =", p.coeffs)
print("order =", p.order)
print("roots =", p.r)
dp = p.deriv()
print("derivative =", dp)
ip = p.integ()
print("integral =", ip)
This polynomial is 2x² − 3x + 5.
| Expression | Meaning |
|---|---|
| p.coeffs | Returns the coefficients array |
| p.order | Returns the degree of the polynomial |
| p.r | Returns the roots |
| p.deriv() | Returns the derivative polynomial |
| p.integ() | Returns the integral polynomial |
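The entries in this table can be checked against hand calculation. For 2x² − 3x + 5 the derivative is 4x − 3, and the integral (with the constant of integration left at its default of 0) is (2/3)x³ − 1.5x² + 5x. The sketch below prints the corresponding coefficient arrays and plugs the roots back into the polynomial.

```python
import numpy as np

p = np.poly1d([2, -3, 5])          # 2x² - 3x + 5

dp = p.deriv()                     # derivative: 4x - 3
ip = p.integ()                     # integral, constant term 0 by default

print(dp.coeffs)
print(ip.coeffs)

# The discriminant (-3)² - 4·2·5 is negative, so the roots are complex.
# Plugging each root back in should give (numerically) zero.
for root in p.r:
    print(root, abs(p(root)))
```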
polyfit() vs poly1d()
| Function | Return value | Main use |
|---|---|---|
| np.polyfit(x, y, degree) | NumPy array of coefficients | Find the coefficients of the best-fit polynomial |
| np.poly1d(coeff) | Polynomial object | Use the polynomial easily for printing and prediction |
polyfit() tells you what the polynomial is, and poly1d() gives you an object that lets you use that polynomial easily.
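The difference in return types is easy to see in a short session. The sketch below, which assumes the lesson's five-point dataset, prints both types and makes the same prediction in two ways.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

coeff = np.polyfit(x, y, 1)
p = np.poly1d(coeff)

print(type(coeff))   # a plain NumPy array: numbers only
print(type(p))       # a poly1d object: callable and printable

# The array must be combined with x by hand; the object is simply called
by_hand = coeff[0] * 6 + coeff[1]
by_call = p(6)
print(by_hand, by_call)
```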
This is the practical sequence you will use again and again.
Use the coefficient of correlation to see whether x and y have a meaningful linear relationship.
Use np.polyfit() to get the coefficients of the best-fit polynomial.
Use np.poly1d() to turn the coefficients into a callable polynomial object.
Use p(new_x) to estimate new y-values.
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
# 1. coefficient of correlation
r = np.corrcoef(x, y)[0, 1]
print("Correlation coefficient r =", r)
# 2. best-fit line coefficients
coeff = np.polyfit(x, y, 1)
print("polyfit coefficients =", coeff)
# 3. polynomial object
p = np.poly1d(coeff)
print("poly1d object:")
print(p)
# 4. prediction
print("Predicted y when x = 6:", p(6))
What this gives you:
r tells how strongly x and y are linearly related
coeff gives the fitted line coefficients
p turns those coefficients into a usable polynomial object
p(6) predicts a new y-value
r² is often used to describe how much of the variation is explained by the line. This is called the coefficient of determination.
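For a straight-line least-squares fit, r² can be computed in two ways that should agree: by squaring r, or as 1 minus the ratio of unexplained variation to total variation. The sketch below checks this on the lesson's dataset; the names ss_res and ss_tot are conventional labels chosen for this example.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r = np.corrcoef(x, y)[0, 1]
p = np.poly1d(np.polyfit(x, y, 1))

ss_res = np.sum((y - p(x)) ** 2)         # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation of y around its mean
r_squared = 1 - ss_res / ss_tot

print("r squared from r      =", r ** 2)
print("r squared from errors =", r_squared)
```

Here both routes give about 0.6, meaning the line accounts for roughly 60% of the variation in y.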
These are very important for correct understanding.
A strong correlation does not by itself give a prediction formula. It only tells you the strength and direction of linear relation.
Regression produces a fitted equation. Correlation measures how strongly the variables move together.
Too low a degree may miss the real pattern. Too high a degree may overfit the noise.
In polyfit(), coefficients always come from highest power to constant term.
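The point about degree can be illustrated with the lesson's five data points. The sketch below fits degrees 1, 2, and 4 (the choice of degrees is arbitrary) and prints both the squared error on the given points and the prediction at x = 6.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

for degree in (1, 2, 4):
    p = np.poly1d(np.polyfit(x, y, degree))
    sse = np.sum((y - p(x)) ** 2)
    # Lower error on the given points does not imply a better prediction
    print(f"degree {degree}: squared error = {sse:.4f}, p(6) = {p(6):.2f}")
```

With five points, a degree-4 polynomial passes through every one of them, so its squared error is essentially zero. Its prediction at x = 6, however, lands far above any observed y-value, which is the overfitting risk described above.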
This project combines the whole lesson into one practical program.
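We will:
Measure the correlation between study hours and marks
Fit a best-fit line with polyfit()
Build a model with poly1d()
Predict the marks for 7 hours of study
Plot the data and the best-fit line
import numpy as np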
import matplotlib.pyplot as plt
hours = np.array([1, 2, 3, 4, 5, 6])
marks = np.array([38, 45, 57, 63, 74, 82])
# 1. correlation
r = np.corrcoef(hours, marks)[0, 1]
print("Correlation coefficient =", r)
# 2. best-fit line
coeff = np.polyfit(hours, marks, 1)
print("polyfit coefficients =", coeff)
# 3. polynomial object
model = np.poly1d(coeff)
print("Model:")
print(model)
# 4. prediction
predicted_for_7 = model(7)
print("Predicted marks for 7 hours =", predicted_for_7)
# 5. plot
x_new = np.linspace(hours.min(), 7, 100)
y_new = model(x_new)
plt.scatter(hours, marks, label="Actual data")
plt.plot(x_new, y_new, label="Best-fit line")
plt.xlabel("Study hours")
plt.ylabel("Marks")
plt.title("Study Hours vs Marks")
plt.legend()
plt.grid(True)
plt.show()
Let us compress the whole lesson into quick memory points.
Coefficient of correlation
Measures the strength and direction of the linear relation between x and y. It lies between -1 and +1.
polyfit()
Returns the coefficients of the best-fit polynomial. If the degree is n, it returns n + 1 coefficients.
poly1d()
Returns a polynomial object that you can print, evaluate, differentiate, integrate, and inspect.
Correlation tells how strongly the variables are related, polyfit() gives the coefficients of the best-fit equation, and poly1d() turns that equation into an object you can use like a function.