Coefficient of Correlation + polyfit() + poly

1. Big Picture

Let us first understand how these topics connect to each other in one clean flow.

Step 1

Measure relationship

Before fitting a line, it is useful to understand whether x and y move together. That is where the coefficient of correlation comes in.

Step 2

Fit an equation

Once we see a relationship, we can use np.polyfit() to find the best-fit line or curve for the given data.

Step 3

Use the equation

We can then convert the coefficients into a polynomial object using np.poly1d() and use it for prediction.

Important: correlation and regression are related, but they are not the same thing. Correlation measures the strength and direction of a linear relationship. Regression gives an equation that can be used for prediction.

2. Coefficient of Correlation

This is usually Pearson’s correlation coefficient, written as r.

Meaning of r

Range of values

-1 ≤ r ≤ +1

r = +1 means perfect positive linear correlation.
r = -1 means perfect negative linear correlation.
r = 0 means no linear correlation.

Quick interpretation

How to read the sign

positive r → y tends to rise with x
negative r → y tends to fall with x

The closer the value is to +1 or -1, the stronger the linear relationship.
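As a quick illustration of these readings, the sketch below computes r for three small made-up datasets (the arrays are assumptions for this example, not data from the lesson). The third case is a reminder that r = 0 only rules out a linear relationship: y = x² depends entirely on x, yet its r is 0 because the points are symmetric about x = 0.

```python
import numpy as np

# Illustrative data, assumed for this sketch only
x = np.array([-2, -1, 0, 1, 2])

y_up = 3 * x + 1      # rises with x on an exact line
y_down = -2 * x + 4   # falls with x on an exact line
y_curve = x ** 2      # strong relationship, but not a linear one

r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]
r_curve = np.corrcoef(x, y_curve)[0, 1]

print("rising line  : r =", round(r_up, 4))
print("falling line : r =", round(r_down, 4))
print("parabola     : r =", round(r_curve, 4))
```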

Formula

Pearson correlation coefficient formula

r = [ nΣxy - (Σx)(Σy) ] / √( [ nΣx² - (Σx)² ] [ nΣy² - (Σy)² ] )

Here:

n = number of observations
Σxy = sum of products of x and y
Σx = sum of x values
Σy = sum of y values
Σx² = sum of squares of x
Σy² = sum of squares of y

Manual example

Worked table

Take:

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x	y	x²	y²	xy
1	2	1	4	2
2	4	4	16	8
3	5	9	25	15
4	4	16	16	16
5	5	25	25	25
15	20	55	86	66

Here, n = 5, Σx = 15, Σy = 20, Σx² = 55, Σy² = 86, and Σxy = 66.

r = [5(66) - (15)(20)] / √([5(55) - 15²][5(86) - 20²])
r = (330 - 300) / √((275 - 225)(430 - 400))
r = 30 / √(50 × 30)
r = 30 / √1500
r ≈ 0.7746

So the coefficient of correlation is about 0.7746. This shows a fairly strong positive linear relationship.

Python method 1

Using `np.corrcoef()`

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r_matrix = np.corrcoef(x, y)
print(r_matrix)

r = np.corrcoef(x, y)[0, 1]
print("Correlation coefficient =", r)

np.corrcoef(x, y) returns a 2 × 2 correlation matrix.

[0,0] = correlation of x with x = 1
[1,1] = correlation of y with y = 1
[0,1] = correlation of x with y
[1,0] = correlation of y with x
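The matrix layout described above can be checked directly. This is a minimal sketch using the same x and y; the stacked-array call at the end relies on the default behaviour of np.corrcoef(), where each row of a 2-D input is treated as one variable.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r_matrix = np.corrcoef(x, y)

print("shape     :", r_matrix.shape)
print("diagonal  :", np.diag(r_matrix))
print("symmetric :", np.isclose(r_matrix[0, 1], r_matrix[1, 0]))

# Passing one 2-D array (rows = variables) gives the same matrix
r_stacked = np.corrcoef(np.vstack([x, y]))
print("same as stacked input:", np.allclose(r_matrix, r_stacked))
```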

Python method 2

Manual formula in Python

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

n = len(x)

sum_x = np.sum(x)
sum_y = np.sum(y)
sum_xy = np.sum(x * y)
sum_x2 = np.sum(x ** 2)
sum_y2 = np.sum(y ** 2)

r = (n * sum_xy - sum_x * sum_y) / np.sqrt(
    (n * sum_x2 - sum_x ** 2) *
    (n * sum_y2 - sum_y ** 2)
)

print("Correlation coefficient =", r)

This second method spells out every term of the formula, so it is a good way to check your understanding of what np.corrcoef() computes.
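To confirm that both methods agree, the manual formula can be wrapped in a small function and compared with np.corrcoef(). The name pearson_r is a hypothetical helper introduced for this sketch; it also guards against a case the formula cannot handle, where x or y is constant and the denominator becomes zero.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r from the raw-sum formula (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    denominator = np.sqrt(
        (n * np.sum(x ** 2) - np.sum(x) ** 2)
        * (n * np.sum(y ** 2) - np.sum(y) ** 2)
    )

    if denominator == 0:
        # A constant x or y has zero variance, so r is undefined
        return float("nan")
    return numerator / denominator

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

print("manual formula :", r_manual)
print("np.corrcoef    :", r_numpy)
print("agree          :", np.isclose(r_manual, r_numpy))
```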

3. NumPy `polyfit()`

This function finds the coefficients of the best-fit polynomial for the given data.

Basic syntax

Function form

np.polyfit(x, y, degree)

Here, x contains the input values, y contains the output values, and degree is the degree of the polynomial to fit (1 for a straight line, 2 for a quadratic, and so on).

Main return value

What it returns

Returns: NumPy array of coefficients

polyfit() returns the coefficients from highest power to lowest power.

Most important rule

How many coefficients are returned?

If the polynomial degree is n, then polyfit() returns n + 1 coefficients.

Degree	Return structure	Equation form
1	`[m, b]`	`y = mx + b`
2	`[a, b, c]`	`y = ax² + bx + c`
3	`[a, b, c, d]`	`y = ax³ + bx² + cx + d`
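The n + 1 rule in the table can be verified with a short loop. The data below is made up for this sketch; only the length of each returned array matters here.

```python
import numpy as np

# Illustrative data, assumed for this sketch only
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([1, 3, 2, 5, 7, 8])

counts = {}
for degree in (1, 2, 3):
    coeff = np.polyfit(x, y, degree)
    counts[degree] = len(coeff)
    print("degree", degree, "->", len(coeff), "coefficients")
```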

Example 1 — Degree 1 return value in complete detail

Let us fit a straight line:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

coeff = np.polyfit(x, y, 1)
print(coeff)

The output is approximately:

[0.6 2.2]

This means:

y = 0.6x + 2.2

First value

0.6 is the slope m. For every 1 unit increase in x, the fitted value of y rises by 0.6.

Second value

2.2 is the intercept b. It is the value of y when x = 0 on the fitted line.

Final fitted line

The best-fit line is y = 0.6x + 2.2.

m = coeff[0]
b = coeff[1]

y_pred = m * x + b
print(y_pred)
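For a straight line, the values returned by polyfit() can be cross-checked against the standard least-squares formulas for slope and intercept, which use the same sums as the correlation table. The sketch below assumes the same x and y as above; the names m_formula and b_formula are introduced only for this comparison.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
n = len(x)

# Closed-form least-squares slope and intercept
m_formula = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (
    n * np.sum(x ** 2) - np.sum(x) ** 2
)
b_formula = np.mean(y) - m_formula * np.mean(x)

# Same fit through polyfit
m, b = np.polyfit(x, y, 1)

# Residuals: actual y minus fitted y
residuals = y - (m * x + b)

print("formula :", m_formula, b_formula)
print("polyfit :", m, b)
print("residuals:", residuals)
```

The residuals of a least-squares line with an intercept sum to zero (up to rounding), which is a handy sanity check on any fit.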

Example 2 — Degree 2 return value in complete detail

Now fit a quadratic curve:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])

coeff = np.polyfit(x, y, 2)
print(coeff)

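The output will be very close to the values below. Because of floating-point rounding, the last two entries may print as tiny numbers (on the order of 1e-15) rather than exactly 0.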

[1.0 0.0 0.0]

This means:

y = 1.0x² + 0.0x + 0.0

So the return value means:

coeff[0] = coefficient of x²
coeff[1] = coefficient of x
coeff[2] = constant term

The biggest learning point is this: polyfit() always returns coefficients from highest degree to constant term.
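The ordering matters as soon as the coefficients are used by hand. The sketch below evaluates an assumed example polynomial, y = x² - 2x + 3, three ways: with explicit powers in the correct order, with np.polyval() (which expects the same highest-power-first order), and with the coefficients accidentally reversed.

```python
import numpy as np

# Assumed example coefficients, highest power first: y = x² - 2x + 3
coeff = np.array([1.0, -2.0, 3.0])
x_value = 4.0

degree = len(coeff) - 1

# coeff[i] multiplies x ** (degree - i)
manual = sum(c * x_value ** (degree - i) for i, c in enumerate(coeff))

# np.polyval uses the same highest-to-lowest ordering
with_polyval = np.polyval(coeff, x_value)

# Reversing the order silently evaluates a different polynomial
reversed_order = np.polyval(coeff[::-1], x_value)

print("manual         :", manual)
print("np.polyval     :", with_polyval)
print("reversed order :", reversed_order)
```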

Example 3 — Degree 3 return value in complete detail

For a cubic fit:

coeff = np.polyfit(x, y, 3)
print(coeff)

The returned array structure is:

[a, b, c, d]

This corresponds to:

y = ax³ + bx² + cx + d

Again, coefficient order is from highest power to lowest power.
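The cubic snippet above reuses x and y from earlier examples. A self-contained sketch is shown here, using data generated from a known cubic (an assumption made so the expected coefficients are obvious): y = x³ - 2x + 1.

```python
import numpy as np

# Data generated from an assumed cubic: y = x³ - 2x + 1
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x ** 3 - 2 * x + 1

coeff = np.polyfit(x, y, 3)
a, b, c, d = coeff

print("a (x³)       :", round(a, 6))
print("b (x²)       :", round(b, 6))
print("c (x)        :", round(c, 6))
print("d (constant) :", round(d, 6))
```

Since the data lies exactly on a cubic, the fit recovers the generating coefficients up to floating-point rounding.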

Advanced note — What happens with full=True?

Sometimes you may see:

coeff, residuals, rank, singular_values, rcond = np.polyfit(x, y, 1, full=True)

print("coeff =", coeff)
print("residuals =", residuals)
print("rank =", rank)
print("singular_values =", singular_values)
print("rcond =", rcond)

Now the function returns more than just coefficients:

coeff → fitted coefficients
residuals → sum of squared residuals of the fit (the total squared error)
rank → effective rank of the scaled Vandermonde matrix used internally
singular_values → singular values of the scaled Vandermonde matrix
rcond → cutoff used for very small singular values

For most beginner and intermediate uses, the standard form without full=True is enough.
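To make the residuals value concrete, the sketch below compares it with a sum of squared errors computed by hand for the same straight-line fit. Note that residuals comes back as a one-element array in this case, not a plain number.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

coeff, residuals, rank, singular_values, rcond = np.polyfit(x, y, 1, full=True)

# Sum of squared errors computed directly from the fitted line
y_pred = np.polyval(coeff, x)
sse = np.sum((y - y_pred) ** 2)

print("residuals from polyfit :", residuals)
print("manual sum of squares  :", sse)
print("rank                   :", rank)
```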

Important difference from correlation: correlation tells how strongly x and y are linearly related. polyfit() gives the coefficients of a fitted equation.

4. NumPy `poly1d()`

This function takes coefficients and returns a polynomial object.

Main idea

What it returns

Returns: a polynomial object

This is not just a plain list and not just a plain number. It is an object that represents a polynomial.

Why it is useful

What you can do with it

print it • call it • get coeffs • derivative • integral • roots

In simple words, poly1d turns coefficients into something that behaves like a mathematical function.

Example 1 — Building a polynomial object

import numpy as np

coeff = [2, 3, 4]
p = np.poly1d(coeff)

print(p)

This means the polynomial is:

p(x) = 2x² + 3x + 4

So poly1d() has taken the list [2, 3, 4] and created a polynomial object for it.

Example 2 — Evaluating the polynomial object like a function

import numpy as np

p = np.poly1d([2, 3, 4])

print(p(2))

This computes:

2(2²) + 3(2) + 4 = 8 + 6 + 4 = 18

So p(2) gives 18.
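A polynomial object can also be called with a whole array, which evaluates it at every element. This minimal sketch uses the same polynomial and a few assumed input values, and checks the result against np.polyval().

```python
import numpy as np

p = np.poly1d([2, 3, 4])          # p(x) = 2x² + 3x + 4

xs = np.array([0, 1, 2, 3])       # assumed inputs for this sketch
values = p(xs)                    # evaluated element by element

print("inputs :", xs)
print("p(x)   :", values)
print("matches np.polyval:", np.array_equal(values, np.polyval([2, 3, 4], xs)))
```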

Example 3 — Using polyfit with poly1d

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

coeff = np.polyfit(x, y, 1)
p = np.poly1d(coeff)

print("coefficients =", coeff)
print("polynomial object:")
print(p)
print("Prediction at x = 6:", p(6))

If coeff is something like [0.6, 2.2], then p represents:

p(x) = 0.6x + 2.2

Example 4 — Important attributes and methods of poly1d

Suppose:

import numpy as np

p = np.poly1d([2, -3, 5])

print("coeffs =", p.coeffs)
print("order =", p.order)
print("roots =", p.r)

dp = p.deriv()
print("derivative =", dp)

ip = p.integ()
print("integral =", ip)

This polynomial is:

p(x) = 2x² - 3x + 5

Expression	Meaning
`p.coeffs`	Returns the coefficients array
`p.order`	Returns the degree of the polynomial
`p.r`	Returns the roots
`p.deriv()`	Returns the derivative polynomial
`p.integ()`	Returns the integral polynomial
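The entries in the table can be checked numerically. Two details are worth noticing in this sketch: the roots of 2x² - 3x + 5 are complex (its discriminant, 9 - 40, is negative), and p.integ() uses an integration constant of 0 by default, so differentiating the integral gives back the original polynomial.

```python
import numpy as np

p = np.poly1d([2, -3, 5])     # p(x) = 2x² - 3x + 5

dp = p.deriv()                # 4x - 3
ip = p.integ()                # (2/3)x³ - (3/2)x² + 5x, constant term 0

print("order       :", p.order)
print("deriv coeffs:", dp.coeffs)
print("integ coeffs:", ip.coeffs)
print("roots       :", p.r)

# Each root should make the polynomial (numerically) zero
print("p(roots) ~ 0:", np.allclose(p(p.r), 0))

# Differentiating the integral recovers the original coefficients
print("round trip  :", np.allclose(ip.deriv().coeffs, p.coeffs))
```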

Best comparison

`polyfit()` vs `poly1d()`

Function	Return value	Main use
`np.polyfit(x, y, degree)`	NumPy array of coefficients	Find the coefficients of the best-fit polynomial
`np.poly1d(coeff)`	Polynomial object	Use the polynomial easily for printing and prediction

In one line: polyfit() tells you what the polynomial is, and poly1d() gives you an object that lets you use that polynomial easily.

5. How Correlation, polyfit, and poly1d Work Together

This is the practical sequence you will use again and again.

First check the relationship

Use the coefficient of correlation to see whether x and y have a meaningful linear relationship.

Then fit the line or curve

Use np.polyfit() to get the coefficients of the best-fit polynomial.

Then build a usable model

Use np.poly1d() to turn the coefficients into a callable polynomial object.

Then predict

Use p(new_x) to estimate new y-values.

Combined example

All three in one program

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# 1. coefficient of correlation
r = np.corrcoef(x, y)[0, 1]
print("Correlation coefficient r =", r)

# 2. best-fit line coefficients
coeff = np.polyfit(x, y, 1)
print("polyfit coefficients =", coeff)

# 3. polynomial object
p = np.poly1d(coeff)
print("poly1d object:")
print(p)

# 4. prediction
print("Predicted y when x = 6:", p(6))

What this gives you:

r tells how strongly x and y are linearly related
coeff gives the fitted line coefficients
p turns those coefficients into a usable polynomial object
p(6) predicts a new y-value

Extra insight: in simple linear regression, r² is the fraction of the variation in y that is explained by the fitted line. This is called the coefficient of determination.
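The link between r and the coefficient of determination can be checked on the lesson's data. The sketch below computes R² from its usual definition, 1 - SS_res / SS_tot, and compares it with the square of r; the variable names are introduced only for this example.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r = np.corrcoef(x, y)[0, 1]
p = np.poly1d(np.polyfit(x, y, 1))

ss_res = np.sum((y - p(x)) ** 2)          # variation left after the fit
ss_tot = np.sum((y - np.mean(y)) ** 2)    # total variation in y
r_squared = 1 - ss_res / ss_tot

print("r squared        :", r ** 2)
print("1 - SS_res/SS_tot:", r_squared)
```

Both values come out at about 0.6, so the line accounts for roughly 60% of the variation in y. This equality holds for a straight-line fit with an intercept; it is not a general property of higher-degree fits.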

6. Visual and Conceptual Warnings

These are very important for correct understanding.

Warning 1

Correlation is not prediction

A strong correlation does not by itself give a prediction formula. It only tells you the strength and direction of the linear relationship.

Warning 2

Regression is not the same as correlation

Regression produces a fitted equation. Correlation measures how strongly the variables move together.

Warning 3

Degree choice matters

Too low a degree may miss the real pattern. Too high a degree may overfit the noise.
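A small sketch of the overfitting side of this warning, using the lesson's five points. A degree-4 polynomial has five coefficients, so it can pass through all five points exactly; its training error is essentially zero, but its prediction just outside the data is very different from the straight line's.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

line = np.poly1d(np.polyfit(x, y, 1))
quartic = np.poly1d(np.polyfit(x, y, 4))

# Error on the points used for fitting
sse_line = np.sum((y - line(x)) ** 2)
sse_quartic = np.sum((y - quartic(x)) ** 2)

print("training SSE, degree 1:", sse_line)
print("training SSE, degree 4:", sse_quartic)

# Prediction one step beyond the data
print("degree 1 at x = 6:", line(6))
print("degree 4 at x = 6:", quartic(6))
```

The lower training error of the degree-4 fit does not make it the better model here: the y values in the data never exceed 5, yet the quartic predicts about 17 at x = 6, while the line predicts about 5.8.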

Warning 4

Coefficient order matters

In polyfit(), coefficients always come from highest power to constant term.
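This ordering is specific to np.polyfit() and np.poly1d(). NumPy's newer numpy.polynomial package, which is not otherwise covered in this lesson, orders coefficients the other way round, from constant term to highest power. The sketch below fits the same line with both functions to show the difference, so that coefficients from one are not fed to the other by mistake.

```python
import numpy as np
from numpy.polynomial import polynomial as P

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

high_to_low = np.polyfit(x, y, 1)   # [slope, intercept]
low_to_high = P.polyfit(x, y, 1)    # [intercept, slope]

print("np.polyfit               :", high_to_low)
print("numpy.polynomial polyfit :", low_to_high)
print("same values, reversed    :", np.allclose(high_to_low[::-1], low_to_high))
```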

8. Mini Project — Study Hours, Marks, Correlation, and Prediction

This project combines the whole lesson into one practical program.

Project idea

Study hours vs marks

We will:

Measure the coefficient of correlation between study hours and marks
Fit a best-fit line using polyfit()
Create a polynomial object using poly1d()
Predict marks for a new number of study hours

import numpy as np
import matplotlib.pyplot as plt

hours = np.array([1, 2, 3, 4, 5, 6])
marks = np.array([38, 45, 57, 63, 74, 82])

# 1. correlation
r = np.corrcoef(hours, marks)[0, 1]
print("Correlation coefficient =", r)

# 2. best-fit line
coeff = np.polyfit(hours, marks, 1)
print("polyfit coefficients =", coeff)

# 3. polynomial object
model = np.poly1d(coeff)
print("Model:")
print(model)

# 4. prediction
predicted_for_7 = model(7)
print("Predicted marks for 7 hours =", predicted_for_7)

# 5. plot
x_new = np.linspace(hours.min(), 7, 100)
y_new = model(x_new)

plt.scatter(hours, marks, label="Actual data")
plt.plot(x_new, y_new, label="Best-fit line")
plt.xlabel("Study hours")
plt.ylabel("Marks")
plt.title("Study Hours vs Marks")
plt.legend()
plt.grid(True)
plt.show()
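The numbers behind the plot can also be inspected without matplotlib. The sketch below repeats the fit and adds one caution about extrapolation. It assumes, purely for illustration, that marks are out of 100; the source data does not state a maximum.

```python
import numpy as np

hours = np.array([1, 2, 3, 4, 5, 6])
marks = np.array([38, 45, 57, 63, 74, 82])

r = np.corrcoef(hours, marks)[0, 1]
slope, intercept = np.polyfit(hours, marks, 1)
model = np.poly1d([slope, intercept])

print("r               :", round(r, 4))
print("marks per hour  :", round(slope, 3))
print("intercept       :", round(intercept, 3))
print("predicted at 7  :", round(model(7), 2))

# Far outside the observed range (1 to 6 hours) the line keeps rising.
# MAX_MARKS = 100 is an assumption made for this sketch.
MAX_MARKS = 100
far_prediction = model(12)
print("predicted at 12 :", round(far_prediction, 2))
print("exceeds maximum :", far_prediction > MAX_MARKS)
```

A prediction at 7 hours sits just beyond the data and is plausible. A prediction at 12 hours lands above the assumed maximum, which shows that a fitted line is only trustworthy near the range of x values it was fitted on.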

Best summary of this project: first measure the relationship, then fit the equation, then turn it into a usable model, then predict.

9. Final Summary

Let us compress the whole lesson into quick memory points.

Memory point 1