Programmer's Picnic Coefficient of Correlation + NumPy polyfit + poly1d
Statistics • Regression • NumPy • Python

Coefficient of Correlation + polyfit() + poly1d()

This lesson connects three important ideas: how strongly two variables are related, how to fit a best-fit line or curve, and how to turn coefficients into a usable polynomial object.

Coefficient of correlation
Best-fit polynomial
Prediction with poly1d

1. Big Picture

Let us first understand how these topics connect to each other in one clean flow.

Step 1

Measure relationship

Before fitting a line, it is useful to understand whether x and y move together. That is where the coefficient of correlation comes in.

Step 2

Fit an equation

Once we see a relationship, we can use np.polyfit() to find the best-fit line or curve for the given data.

Step 3

Use the equation

We can then convert the coefficients into a polynomial object using np.poly1d() and use it for prediction.

Important: correlation and regression are related, but they are not the same thing. Correlation measures the strength and direction of a linear relationship. Regression gives an equation for prediction.

2. Coefficient of Correlation

This is usually Pearson’s correlation coefficient, written as r.

Meaning of r

Range of values

-1 ≤ r ≤ +1

r = +1 means perfect positive linear correlation.
r = -1 means perfect negative linear correlation.
r = 0 means no linear correlation.
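
A value of r near 0 rules out only a linear pattern, not every kind of relationship. As a small sketch (the data below is invented for illustration and is not part of the lesson's example), y depends on x exactly through y = x², yet Pearson's r comes out as zero because the pattern is symmetric rather than linear:

```python
import numpy as np

# Illustrative data (an assumption, not from the lesson): y is exactly x squared
x = np.array([-2, -1, 0, 1, 2])
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print("r =", r)  # zero, up to floating-point rounding
```

So a scatter plot is still worth a look before concluding that two variables are unrelated.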

Quick interpretation

How to read the sign

positive r → y tends to rise with x
negative r → y tends to fall with x

The closer the value is to +1 or -1, the stronger the linear relationship.

Formula

Pearson correlation coefficient formula

r = [ nΣxy - (Σx)(Σy) ] / √( [ nΣx² - (Σx)² ] [ nΣy² - (Σy)² ] )

Here:

  • n = number of observations
  • Σxy = sum of products of x and y
  • Σx = sum of x values
  • Σy = sum of y values
  • Σx² = sum of squares of x
  • Σy² = sum of squares of y
Manual example

Worked table

Take:

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x     y     x²    y²    xy
1     2     1     4     2
2     4     4     16    8
3     5     9     25    15
4     4     16    16    16
5     5     25    25    25
15    20    55    86    66    (column totals)
Here, n = 5, Σx = 15, Σy = 20, Σx² = 55, Σy² = 86, and Σxy = 66.
r = [5(66) - (15)(20)] / √([5(55) - 15²][5(86) - 20²])
r = (330 - 300) / √((275 - 225)(430 - 400))
r = 30 / √(50 × 30)
r = 30 / √1500
r ≈ 0.7746

So the coefficient of correlation is about 0.7746. This shows a fairly strong positive linear relationship.

Python method 1

Using np.corrcoef()

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r_matrix = np.corrcoef(x, y)
print(r_matrix)

r = np.corrcoef(x, y)[0, 1]
print("Correlation coefficient =", r)

np.corrcoef(x, y) returns a 2 × 2 correlation matrix.

  • [0,0] = correlation of x with x = 1
  • [1,1] = correlation of y with y = 1
  • [0,1] = correlation of x with y
  • [1,0] = correlation of y with x
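
The structure of that matrix can be checked directly. A minimal sketch using the same x and y as above: the diagonal entries are 1, and the two off-diagonal entries hold the same value, which is why either [0, 1] or [1, 0] can be used to read r.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r_matrix = np.corrcoef(x, y)

# Diagonal entries: each variable correlated with itself
print(np.allclose(np.diag(r_matrix), 1.0))

# Off-diagonal entries are equal, so [0, 1] and [1, 0] give the same r
print(np.isclose(r_matrix[0, 1], r_matrix[1, 0]))
```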
Python method 2

Manual formula in Python

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

n = len(x)

sum_x = np.sum(x)
sum_y = np.sum(y)
sum_xy = np.sum(x * y)
sum_x2 = np.sum(x ** 2)
sum_y2 = np.sum(y ** 2)

r = (n * sum_xy - sum_x * sum_y) / np.sqrt(
    (n * sum_x2 - sum_x ** 2) *
    (n * sum_y2 - sum_y ** 2)
)

print("Correlation coefficient =", r)

This second method shows the full logic very clearly and is excellent for learning.
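
The same coefficient can also be written in terms of deviations from the means: r = Σ(x - x̄)(y - ȳ) / √( Σ(x - x̄)² · Σ(y - ȳ)² ). This is algebraically equivalent to the sum formula above. The sketch below is one possible way to code it; the names dx, dy, and r_centered are chosen here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Deviations of each value from its mean
dx = x - np.mean(x)
dy = y - np.mean(y)

r_centered = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print("Mean-centered r =", r_centered)
print("np.corrcoef r   =", np.corrcoef(x, y)[0, 1])
```

Both lines should print the same value, about 0.7746.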

3. NumPy polyfit()

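This function finds the coefficients of the polynomial that best fits the given data in the least-squares sense, that is, the polynomial that minimizes the sum of squared differences between the actual and fitted y-values.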

Basic syntax

Function form

np.polyfit(x, y, degree)

Here, x contains input values, y contains output values, and degree decides the type of polynomial.

Main return value

What it returns

Returns: NumPy array of coefficients

polyfit() returns the coefficients from highest power to lowest power.

Most important rule

How many coefficients are returned?

If the polynomial degree is n, then polyfit() returns n + 1 coefficients.

Degree   Return structure   Equation form
1        [m, b]             y = mx + b
2        [a, b, c]          y = ax² + bx + c
3        [a, b, c, d]       y = ax³ + bx² + cx + d
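
The n + 1 rule is easy to confirm. A short sketch, reusing the lesson's sample data, that fits degrees 1 to 3 and prints how many coefficients come back each time:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

for degree in (1, 2, 3):
    coeff = np.polyfit(x, y, degree)
    # The array length is always degree + 1
    print("degree", degree, "->", len(coeff), "coefficients")
```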
Example 1 — Degree 1 return value in complete detail

Let us fit a straight line:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

coeff = np.polyfit(x, y, 1)
print(coeff)

A typical output may look like:

[0.6 2.2]

This means:

y = 0.6x + 2.2

First value

0.6 is the slope m. For every 1 unit increase in x, y tends to rise by about 0.6.

Second value

2.2 is the intercept b. It is the value of y when x = 0 on the fitted line.

Final fitted line

The best-fit line is y = 0.6x + 2.2.

m = coeff[0]
b = coeff[1]

y_pred = m * x + b
print(y_pred)
Example 2 — Degree 2 return value in complete detail

Now fit a quadratic curve:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])

coeff = np.polyfit(x, y, 2)
print(coeff)

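The output will be very close to the following (the zero terms may print as tiny floating-point values such as 1e-15 rather than exact zeros):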

[1.0 0.0 0.0]

This means:

y = 1.0x² + 0.0x + 0.0

So the return value means:

  • coeff[0] = coefficient of x²
  • coeff[1] = coefficient of x
  • coeff[2] = constant term

The biggest learning point is this: polyfit() always returns coefficients from highest degree to constant term.

Example 3 — Degree 3 return value in complete detail

For a cubic fit:

# x and y are the arrays defined in the previous example
coeff = np.polyfit(x, y, 3)
print(coeff)

The returned array structure is:

[a, b, c, d]

This corresponds to:

y = ax³ + bx² + cx + d

Again, coefficient order is from highest power to lowest power.

Advanced note — What happens with full=True?

Sometimes you may see:

coeff, residuals, rank, singular_values, rcond = np.polyfit(x, y, 1, full=True)

print("coeff =", coeff)
print("residuals =", residuals)
print("rank =", rank)
print("singular_values =", singular_values)
print("rcond =", rcond)

Now the function returns more than just coefficients:

  • coeff → fitted coefficients
  • residuals → sum of squared residuals of the fit (the total squared fitting error), returned as an array
  • rank → effective rank of the internal matrix
  • singular_values → singular values of the scaled Vandermonde matrix
  • rcond → cutoff used for very small singular values

For most beginner and intermediate uses, the standard form without full=True is enough.
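
To see what the residuals value represents, it can be compared with a sum of squared errors computed by hand. A minimal sketch using the lesson's data; the names y_fit and sse are illustrative:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

coeff, residuals, rank, singular_values, rcond = np.polyfit(x, y, 1, full=True)

# Sum of squared differences between actual and fitted y-values
y_fit = np.polyval(coeff, x)
sse = np.sum((y - y_fit) ** 2)

print("residuals from polyfit =", residuals)
print("manual sum of squares  =", sse)
```

For this data both values are about 2.4. Note that residuals comes back as a one-element array, not a plain number.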

Important difference from correlation: correlation tells how strongly x and y are linearly related. polyfit() gives the coefficients of a fitted equation.
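
The two ideas are still linked by a simple identity for a straight-line fit: the slope equals r multiplied by the ratio of the standard deviation of y to that of x. The sketch below checks this on the lesson's data; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r = np.corrcoef(x, y)[0, 1]
m, b = np.polyfit(x, y, 1)

# Slope rebuilt from r and the two standard deviations
slope_from_r = r * np.std(y) / np.std(x)

print("slope from polyfit =", m)
print("r * sd(y) / sd(x)  =", slope_from_r)
```

Both values are about 0.6, so r fixes the sign of the slope while the spreads of x and y fix its size.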

4. NumPy poly1d()

This function takes coefficients and returns a polynomial object.

Main idea

What it returns

Returns: a polynomial object

This is not just a plain list and not just a plain number. It is an object that represents a polynomial.

Why it is useful

What you can do with it

print it • call it • get coeffs • derivative • integral • roots

In simple words, poly1d turns coefficients into something that behaves like a mathematical function.

Example 1 — Building a polynomial object
import numpy as np

coeff = [2, 3, 4]
p = np.poly1d(coeff)

print(p)

This means the polynomial is:

p(x) = 2x² + 3x + 4

So poly1d() has taken the list [2, 3, 4] and created a polynomial object for it.

Example 2 — Evaluating the polynomial object like a function
import numpy as np

p = np.poly1d([2, 3, 4])

print(p(2))

This computes:

2(2²) + 3(2) + 4 = 8 + 6 + 4 = 18

So p(2) gives 18.
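
The object also accepts a whole array of inputs at once, which is what makes it convenient for plotting and batch prediction. A brief sketch (x_values is an illustrative name); np.polyval() gives the same numbers straight from the coefficient list:

```python
import numpy as np

p = np.poly1d([2, 3, 4])

x_values = np.array([0, 1, 2])

# Evaluates 2x² + 3x + 4 at every element of x_values
print(p(x_values))

# Same result without building a polynomial object
print(np.polyval([2, 3, 4], x_values))
```

Each call returns the values 4, 9, and 18.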

Example 3 — Using polyfit with poly1d
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

coeff = np.polyfit(x, y, 1)
p = np.poly1d(coeff)

print("coefficients =", coeff)
print("polynomial object:")
print(p)
print("Prediction at x = 6:", p(6))

If coeff is something like [0.6, 2.2], then p represents:

p(x) = 0.6x + 2.2
Example 4 — Important attributes and methods of poly1d

Suppose:

import numpy as np

p = np.poly1d([2, -3, 5])

print("coeffs =", p.coeffs)
print("order =", p.order)
print("roots =", p.r)

dp = p.deriv()
print("derivative =", dp)

ip = p.integ()
print("integral =", ip)

This polynomial is:

p(x) = 2x² - 3x + 5

Expression   Meaning
p.coeffs     Returns the coefficients array
p.order      Returns the degree of the polynomial
p.r          Returns the roots
p.deriv()    Returns the derivative polynomial
p.integ()    Returns the integral polynomial
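
The results for this particular polynomial can be verified by hand: the derivative of 2x² - 3x + 5 is 4x - 3, and the integral is (2/3)x³ - 1.5x² + 5x, with the constant of integration taken as 0 by default. A hedged sketch of those checks:

```python
import numpy as np

p = np.poly1d([2, -3, 5])

dp = p.deriv()   # expected 4x - 3
ip = p.integ()   # expected (2/3)x³ - 1.5x² + 5x + 0

print(dp.coeffs)
print(ip.coeffs)

# The discriminant (-3)² - 4·2·5 is negative, so both roots are complex
print(p.r)

# Substituting the roots back into p should give values very close to 0
print(np.allclose(p(p.r), 0))
```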
Best comparison

polyfit() vs poly1d()

Function                   Return value                  Main use
np.polyfit(x, y, degree)   NumPy array of coefficients   Find the coefficients of the best-fit polynomial
np.poly1d(coeff)           Polynomial object             Use the polynomial easily for printing and prediction

In one line: polyfit() tells you what the polynomial is, and poly1d() gives you an object that lets you use that polynomial easily.

5. How Correlation, polyfit, and poly1d Work Together

This is the practical sequence you will use again and again.

First check the relationship

Use the coefficient of correlation to see whether x and y have a meaningful linear relationship.

Then fit the line or curve

Use np.polyfit() to get the coefficients of the best-fit polynomial.

Then build a usable model

Use np.poly1d() to turn the coefficients into a callable polynomial object.

Then predict

Use p(new_x) to estimate new y-values.

Combined example

All three in one program

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# 1. coefficient of correlation
r = np.corrcoef(x, y)[0, 1]
print("Correlation coefficient r =", r)

# 2. best-fit line coefficients
coeff = np.polyfit(x, y, 1)
print("polyfit coefficients =", coeff)

# 3. polynomial object
p = np.poly1d(coeff)
print("poly1d object:")
print(p)

# 4. prediction
print("Predicted y when x = 6:", p(6))

What this gives you:

  • r tells how strongly x and y are linearly related
  • coeff gives the fitted line coefficients
  • p turns those coefficients into a usable polynomial object
  • p(6) predicts a new y-value

Extra insight: in simple linear regression, r² (the square of the correlation coefficient) is often used to describe how much of the variation in y is explained by the line. This is called the coefficient of determination.
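
The coefficient of determination can be checked numerically. A minimal sketch, assuming the lesson's data and a straight-line fit (the names ss_res and ss_tot are illustrative): it compares the square of r with 1 minus the ratio of unexplained variation to total variation.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r = np.corrcoef(x, y)[0, 1]
p = np.poly1d(np.polyfit(x, y, 1))

ss_res = np.sum((y - p(x)) ** 2)         # variation left unexplained by the line
ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation in y
r_squared = 1 - ss_res / ss_tot

print("r squared         =", r ** 2)
print("1 - SSres / SStot =", r_squared)
```

Both values are about 0.6 here, meaning the line accounts for roughly 60% of the variation in y. The two agree for a least-squares straight line with an intercept; for higher-degree fits the second expression is the one to use.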

6. Visual and Conceptual Warnings

These are very important for correct understanding.

Warning 1

Correlation is not prediction

A strong correlation does not by itself give a prediction formula. It only tells you the strength and direction of the linear relationship.

Warning 2

Regression is not the same as correlation

Regression produces a fitted equation. Correlation measures how strongly the variables move together.

Warning 3

Degree choice matters

Too low a degree may miss the real pattern. Too high a degree may overfit the noise.
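
A small sketch of the overfitting risk, using the lesson's five data points. With five points, a degree-4 polynomial can pass through every one of them, which looks like a perfect fit but says little about new inputs. The names line and quartic are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

line = np.poly1d(np.polyfit(x, y, 1))
quartic = np.poly1d(np.polyfit(x, y, 4))

# The degree-4 curve reproduces the five data points almost exactly...
print(np.allclose(quartic(x), y))

# ...but just outside the data its prediction is far from the line's
print("line at x = 6:   ", line(6))      # about 5.8
print("quartic at x = 6:", quartic(6))   # about 17
```

The y-values in the data never exceed 5, so a prediction near 17 one step beyond the last point is a sign that the curve is following noise rather than the trend.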

Warning 4

Coefficient order matters

In polyfit(), coefficients always come from highest power to constant term.
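
This ordering is specific to np.polyfit() and np.poly1d(). NumPy's newer numpy.polynomial package uses the opposite convention, so its polyfit() returns the constant term first. The sketch below (not part of the original lesson) shows the two side by side, since mixing them up silently swaps slope and intercept:

```python
import numpy as np
from numpy.polynomial import polynomial as P

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

high_first = np.polyfit(x, y, 1)   # [slope, intercept]
low_first = P.polyfit(x, y, 1)     # [intercept, slope]

print("np.polyfit:                  ", high_first)
print("numpy.polynomial P.polyfit:  ", low_first)
```

Only the highest-power-first array should be passed to np.poly1d().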

7. MCQ Quiz

Check whether the combined lesson is clear.

1. What does the coefficient of correlation mainly measure?

2. What does np.polyfit(x, y, 1) return?

3. What does np.poly1d(coeff) return?

4. For degree 2, what is the returned coefficient structure?

5. Which expression gives the actual correlation coefficient from np.corrcoef(x, y)?

8. Mini Project — Study Hours, Marks, Correlation, and Prediction

This project combines the whole lesson into one practical program.

Project idea

Study hours vs marks

We will:

  • Measure the coefficient of correlation between study hours and marks
  • Fit a best-fit line using polyfit()
  • Create a polynomial object using poly1d()
  • Predict marks for a new number of study hours
import numpy as np
import matplotlib.pyplot as plt

hours = np.array([1, 2, 3, 4, 5, 6])
marks = np.array([38, 45, 57, 63, 74, 82])

# 1. correlation
r = np.corrcoef(hours, marks)[0, 1]
print("Correlation coefficient =", r)

# 2. best-fit line
coeff = np.polyfit(hours, marks, 1)
print("polyfit coefficients =", coeff)

# 3. polynomial object
model = np.poly1d(coeff)
print("Model:")
print(model)

# 4. prediction
predicted_for_7 = model(7)
print("Predicted marks for 7 hours =", predicted_for_7)

# 5. plot
x_new = np.linspace(hours.min(), 7, 100)
y_new = model(x_new)

plt.scatter(hours, marks, label="Actual data")
plt.plot(x_new, y_new, label="Best-fit line")
plt.xlabel("Study hours")
plt.ylabel("Marks")
plt.title("Study Hours vs Marks")
plt.legend()
plt.grid(True)
plt.show()
Best summary of this project: first measure the relationship, then fit the equation, then turn it into a usable model, then predict.
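
A natural follow-up is to look at how far each actual mark lies from the fitted line. The sketch below repeats the project's data and fit so that it runs on its own; residuals is an illustrative name for actual minus predicted marks.

```python
import numpy as np

hours = np.array([1, 2, 3, 4, 5, 6])
marks = np.array([38, 45, 57, 63, 74, 82])

model = np.poly1d(np.polyfit(hours, marks, 1))

# Residual = actual marks minus marks predicted by the line
residuals = marks - model(hours)
print("Residuals:", np.round(residuals, 2))

# For a least-squares line with an intercept, the residuals sum to about zero
print("Sum of residuals:", residuals.sum())
```

Small residuals with no obvious pattern suggest the straight line is a reasonable model for this range of study hours. The prediction for 7 hours lies outside the observed range, so it should be read as an extrapolation.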

9. Final Summary

Let us compress the whole lesson into quick memory points.

Memory point 1

Coefficient of correlation

Measures the strength and direction of the linear relationship between x and y. It lies between -1 and +1.

Memory point 2

polyfit()

Returns the coefficients of the best-fit polynomial. If degree is n, it returns n + 1 coefficients.

Memory point 3

poly1d()

Returns a polynomial object that you can print, evaluate, differentiate, integrate, and inspect.

One-line master summary

Best one-line understanding

Correlation tells how strongly the variables are related, polyfit() gives the coefficients of the best-fit equation, and poly1d() turns that equation into an object you can use like a function.