Overview
Statistics helps us understand data. When we have a list of numbers, we usually want to know the center, the most frequent value, the smallest value, the largest value, and how spread out the values are.
The six concepts
1. Mean
Mean is the average. Add all values and divide by the number of values.
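Here is a minimal sketch of the same rule in plain Python. The list name `values` is only for illustration. Python's `/` always returns a float. An empty list would raise `ZeroDivisionError`, so this assumes at least one value.

```python
# Mean = sum of the values divided by how many values there are
values = [3, 4, 8]
mean = sum(values) / len(values)
print(mean)  # 5.0
```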
2. Median
Median is the middle value after sorting the data. If there are two middle values, take their average.
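A short sketch of that rule follows. The helper name `median` is hypothetical, and it assumes a non-empty list. It sorts a copy first, then either takes the single middle value or averages the two middle values.

```python
def median(values):
    # Sort a copy so the original list is not changed
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```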
3. Mode
Mode is the value that appears most often. A data set can have more than one mode when several values tie for the highest count.
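To see a tie in action, here is a small illustration with made-up data in which two values share the highest count. It uses `collections.Counter` from the standard library to count each value and then keeps every value whose count equals the largest count.

```python
from collections import Counter

values = [1, 1, 2, 2, 3]
counts = Counter(values)          # maps each value to how many times it appears
highest = max(counts.values())    # the largest count
modes = [v for v, c in counts.items() if c == highest]
print(modes)  # [1, 2]
```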
4. Max
Max is the largest value in the data.
5. Min
Min is the smallest value in the data.
6. Range
Range tells us how far the data spreads from minimum to maximum: Range = Max - Min.
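Python's built-in `max` and `min` give the two ends directly, and the range is their difference. The names below are illustrative and deliberately avoid the built-in names `max`, `min` and `range`.

```python
values = [12, 3, 8, 5]
largest = max(values)
smallest = min(values)
data_range = largest - smallest  # Range = Max - Min
print(largest, smallest, data_range)  # 12 3 9
```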
Work it out on our data
Our data is the list 2, 4, 4, 4, 5, 5, 7, 9.
Mean
Add the values: 2 + 4 + 4 + 4 + 5 + 5 + 7 + 9 = 40. There are 8 values. So mean = 40 / 8 = 5.
Median
The data is already sorted: 2, 4, 4, 4, 5, 5, 7, 9. There are 8 values (an even count), so the two middle values are the 4th and 5th: 4 and 5. Median = (4 + 5) / 2 = 4.5.
Mode
The value 4 appears 3 times, the value 5 appears 2 times, and every other value appears once. So mode = 4.
Max, Min and Range
Max = 9. Min = 2. Range = 9 - 2 = 7.
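The hand calculations can be cross-checked with Python's standard `statistics` module. This sketch assumes Python 3.8 or newer, which is when `fmean` and `multimode` were added. The module has no range function, so the range is computed with the built-ins.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.fmean(data)       # 5.0 (fmean always returns a float)
median = statistics.median(data)    # 4.5 (averages the two middle values)
modes = statistics.multimode(data)  # [4] (a list, so ties would be kept)
data_range = max(data) - min(data)  # 7
print(mean, median, modes, data_range)
```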
Direct Python code
This version shows the logic step by step. It is good for learning because we can see how each result is found.
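```python
def findModes(data):
    # Count how often each value appears, tracking the highest count seen so far
    frequencies = {}
    modes = []
    highestfrequency = 0
    for item in data:
        frequency = frequencies.get(item, 0)
        frequency += 1
        frequencies[item] = frequency
        if frequency > highestfrequency:
            highestfrequency = frequency
    # Every value whose count equals the highest count is a mode
    for key, value in frequencies.items():
        if value == highestfrequency:
            modes.append(key)
    return modes, highestfrequency


def findMaxMinandRange(data):
    # Start with the first item as both the largest and the smallest value
    # (largest/smallest avoid shadowing Python's built-in max and min)
    largest, smallest = data[0], data[0]
    for item in data:
        if item > largest:
            largest = item
        elif item < smallest:
            smallest = item
    return largest, smallest, largest - smallest


def findMedian(data):
    # The median is defined on sorted data, so sort a copy first
    data = sorted(data)
    n = len(data)
    if n == 0:
        return None  # the median of an empty list is undefined
    if n % 2 != 0:
        # Odd count: data=[1, 2, 3] -> 2, data=[1, 2, 3, 4, 5] -> 3
        medianpos = (n + 1) // 2 - 1
        median = data[medianpos]
        return median
    else:
        # Even count: data=[1, 2, 3, 4] -> (2 + 3) / 2 = 2.5
        #             data=[1, 2, 3, 4, 5, 6] -> (3 + 4) / 2 = 3.5
        medianpos1 = n // 2 - 1
        medianpos2 = medianpos1 + 1
        median1, median2 = data[medianpos1], data[medianpos2]
        median = (median1 + median2) / 2
        return median


def findMean(a):
    # Add every item and count the items in a single pass
    total, length = 0, 0
    for item in a:
        total += item
        length += 1
    if length == 0:
        return None  # the mean of an empty list is undefined
    return total / length


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(f"Input={data}")
mean = findMean(data)
print(f"Mean={mean}")
median = findMedian(data)
print(f"Median={median}")
largest, smallest, data_range = findMaxMinandRange(data)
print(f"Max={largest}, Min={smallest}, Range={data_range}")
modes, highestfrequency = findModes(data)
print(f"Modes={modes}, Highest Frequency={highestfrequency}")
```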
NumPy code
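This version is shorter. NumPy gives us ready-made tools for working with numerical data. NumPy has no mode function, so the mode is counted with collections.Counter from the standard library.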
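```python
import numpy as np
from collections import Counter

data = [2, 4, 4, 4, 5, 5, 7, 9]
values = np.array(data)  # NumPy array used for the numeric summaries
print(f"Input={data}")

mean = np.mean(values)
print(f"Mean={mean}")
median = np.median(values)
print(f"Median={median}")

# NumPy has no mode function, so count frequencies with collections.Counter.
# Counting the plain list keeps the keys as ordinary Python ints.
counts = Counter(data)
print(f"Counts={dict(counts)}")
highestfrequency = max(counts.values())
# most_common(1) would return only one value, so collect every tied value
modes = [value for value, count in counts.items() if count == highestfrequency]
print(f"Modes={modes}, Highest Frequency={highestfrequency}")

# smallest/largest/data_range avoid shadowing the built-ins min, max and range
smallest = np.min(values)
largest = np.max(values)
data_range = largest - smallest
print(f"Max={largest}, Min={smallest}, Range={data_range}")
```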
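If you want to stay inside NumPy for the mode as well, one option is `np.unique` with `return_counts=True`. This is a sketch of that alternative, not the approach the code above uses. A boolean mask then keeps every value that reaches the highest count, so ties are handled.

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])
# np.unique returns the sorted distinct values and how often each appears
unique_values, counts = np.unique(values, return_counts=True)
highest = counts.max()
modes = unique_values[counts == highest]  # boolean mask keeps every tied value
print(modes.tolist(), int(highest))  # [4] 3
```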
What each function or line is doing
findMean
This function loops through the list, adds every item to total, counts how many items are present, and returns total divided by length.
findMedian
This function checks whether the number of items is odd or even. If odd, it returns the middle item. If even, it returns the average of the two middle items.
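The middle position gives the median only when the data is sorted. The example data happens to be sorted already. This small illustration with made-up values shows what goes wrong when it is not.

```python
values = [9, 2, 5]                      # unsorted on purpose
middle_unsorted = values[len(values) // 2]
ordered = sorted(values)                # [2, 5, 9]
middle_sorted = ordered[len(ordered) // 2]
print(middle_unsorted, middle_sorted)   # 2 5 (only 5 is the true median)
```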
findModes
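This function builds a frequency dictionary that maps each value to its count, keeping track of the highest count while it counts. Then it collects every value whose count equals that highest frequency, so a tie produces more than one mode.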
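Why collect every tied value instead of taking just one? `collections.Counter.most_common(1)` returns a single (value, count) pair even when another value has the same count. Counter orders equal counts by first appearance. The list below is made up to show the tie.

```python
from collections import Counter

values = [1, 1, 2, 2, 3]
counts = Counter(values)
print(counts.most_common(1))  # [(1, 2)]: 2 also appears twice but is left out
```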
findMaxMinandRange
This function starts with the first item as both max and min, then updates them while checking every other item. Finally it returns max, min and max minus min.
NumPy functions
NumPy gives us np.mean, np.median, np.min and np.max. These are convenient and widely used in scientific Python.
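Two further conveniences may be useful. `np.ptp` ("peak to peak") returns max minus min in one call, which is the range. The same functions also accept an `axis` argument for tables of numbers. The `scores` array is made up for illustration.

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print(np.ptp(values))  # 7, the range (max - min)

# axis=0 works down each column, axis=1 across each row
scores = np.array([[1, 5],
                   [3, 9]])
print(np.mean(scores, axis=0))  # [2. 7.]  mean of each column
print(np.max(scores, axis=1))   # [5 9]    max of each row
```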
Direct Python vs NumPy
| Point | Direct Python | NumPy |
|---|---|---|
| Learning value | Excellent for understanding the logic step by step. | Excellent for working faster with numerical data. |
| Code size | Longer | Shorter |
| Control | Very high, because we write the full logic ourselves. | High, but many operations are built in. |
| Use in AI-ML | Good for learning and interviews. | Very common in data science and ML workflows. |
Expected output
```
Input=[2, 4, 4, 4, 5, 5, 7, 9]
Mean=5.0
Median=4.5
Max=9, Min=2, Range=7
Modes=[4], Highest Frequency=3
```
Final takeaway
If you want to become strong in AI, ML, statistics and data programming, do not rely only on ready-made functions; also understand the direct logic. That way, you will know both how the result is produced and how to work faster with tools like NumPy.
First understand it directly. Then use NumPy with confidence.