0) Quick Setup
You can practice this lesson using SQLite (no server needed). For MySQL/PostgreSQL, you’ll typically use SQLAlchemy + a DB driver.
pip install pandas sqlalchemy
- PostgreSQL: pip install psycopg2-binary (or psycopg)
- MySQL: pip install pymysql (or mysqlclient)
- MS SQL: pip install pyodbc
1) What “SQL in Pandas” means
Pandas can talk to SQL databases so you can:
- Read SQL tables or query results into a DataFrame.
- Write a DataFrame back into a SQL table.
- Combine SQL (best for big joins/filters) + Pandas (best for flexible transformations).
Write df.to_sql(name, con) — writes a table
2) SQLite + pd.read_sql (no server needed)
SQLite is perfect for learning because it’s a single file DB (e.g., shop.db).
2.1 Create a sample database + tables
import sqlite3
import pandas as pd
# Create (or open) a database file
con = sqlite3.connect("shop.db")
# Create tables + sample data
con.executescript("""
DROP TABLE IF EXISTS customers;
DROP TABLE IF EXISTS orders;
CREATE TABLE customers(
customer_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
city TEXT NOT NULL
);
CREATE TABLE orders(
order_id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL,
amount REAL NOT NULL,
created_at TEXT NOT NULL,
FOREIGN KEY(customer_id) REFERENCES customers(customer_id)
);
INSERT INTO customers(name, city) VALUES
('Asha', 'Varanasi'),
('Ravi', 'Lucknow'),
('Neha', 'Delhi');
INSERT INTO orders(customer_id, amount, created_at) VALUES
(1, 399.0, '2026-01-10'),
(1, 1200.5,'2026-01-18'),
(2, 250.0, '2026-01-12'),
(3, 999.0, '2026-01-20');
""")
con.commit()
con.close()
2.2 Read a full table
import sqlite3
import pandas as pd
con = sqlite3.connect("shop.db")
df_customers = pd.read_sql("SELECT * FROM customers;", con)
df_orders = pd.read_sql("SELECT * FROM orders;", con)
con.close()
print(df_customers)
print(df_orders)
2.3 Read a query result (filter + sort)
import sqlite3
import pandas as pd
con = sqlite3.connect("shop.db")
sql = """
SELECT order_id, customer_id, amount, created_at
FROM orders
WHERE amount >= 500
ORDER BY amount DESC;
"""
df = pd.read_sql(sql, con)
con.close()
print(df)
3) SQLAlchemy engines (recommended approach)
SQLAlchemy standardizes connections across databases. With it, the same Pandas code can work for SQLite, PostgreSQL, MySQL, etc.
3.1 SQLite engine example
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("sqlite:///shop.db") # file-based sqlite database
df = pd.read_sql("SELECT * FROM customers;", engine)
print(df)
3.2 Connection strings (patterns)
# PostgreSQL (example)
# engine = create_engine("postgresql+psycopg2://USER:PASSWORD@HOST:5432/DBNAME")
# MySQL (example)
# engine = create_engine("mysql+pymysql://USER:PASSWORD@HOST:3306/DBNAME")
# SQL Server (example via pyodbc)
# engine = create_engine("mssql+pyodbc://USER:PASSWORD@HOST/DBNAME?driver=ODBC+Driver+17+for+SQL+Server")
4) Parameterized queries (safe SQL)
The biggest mistake beginners make is building SQL using string concatenation. That can lead to SQL Injection.
4.1 Safe parameters with SQLite (DB-API style)
import sqlite3
import pandas as pd
min_amount = 500
con = sqlite3.connect("shop.db")
sql = "SELECT * FROM orders WHERE amount >= ? ORDER BY amount DESC;"
df = pd.read_sql(sql, con, params=(min_amount,))
con.close()
print(df)
4.2 Safe parameters with SQLAlchemy (recommended)
import pandas as pd
from sqlalchemy import create_engine, text
engine = create_engine("sqlite:///shop.db")
sql = text("""
SELECT * FROM orders
WHERE amount >= :min_amount
ORDER BY amount DESC
""")
df = pd.read_sql(sql, engine, params={"min_amount": 500})
print(df)
5) Joins + GROUP BY + window functions
SQL shines here: joining big tables and aggregating quickly (especially when indexes exist).
5.1 Join orders with customer names
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("sqlite:///shop.db")
sql = """
SELECT
o.order_id,
c.name,
c.city,
o.amount,
o.created_at
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
ORDER BY o.created_at;
"""
df = pd.read_sql(sql, engine)
print(df)
5.2 GROUP BY (total spend per customer)
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("sqlite:///shop.db")
sql = """
SELECT
c.customer_id,
c.name,
SUM(o.amount) AS total_spend,
COUNT(*) AS orders_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_spend DESC;
"""
df = pd.read_sql(sql, engine)
print(df)
5.3 Window function (running total per customer)
Window functions are powerful for rankings, running totals, and comparisons without losing row-level data.
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("sqlite:///shop.db")
sql = """
SELECT
o.order_id,
o.customer_id,
o.amount,
o.created_at,
SUM(o.amount) OVER (
PARTITION BY o.customer_id
ORDER BY o.created_at
) AS running_total
FROM orders o
ORDER BY o.customer_id, o.created_at;
"""
df = pd.read_sql(sql, engine)
print(df)
6) Writing DataFrames back to SQL (df.to_sql)
Use to_sql for loading transformed data into a database table.
6.1 Create a DataFrame and write it
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("sqlite:///shop.db")
df_new = pd.DataFrame({
"name": ["Irfan", "Meera"],
"city": ["Pune", "Varanasi"]
})
# if_exists:
# 'fail' (default) = error if table exists
# 'replace' = drop + recreate
# 'append' = add rows
df_new.to_sql("new_customers", engine, if_exists="replace", index=False)
# Verify
df_check = pd.read_sql("SELECT * FROM new_customers;", engine)
print(df_check)
6.2 Append data (common in pipelines)
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("sqlite:///shop.db")
df_more = pd.DataFrame({
"name": ["Sonia"],
"city": ["Delhi"]
})
df_more.to_sql("new_customers", engine, if_exists="append", index=False)
print(pd.read_sql("SELECT * FROM new_customers;", engine))
7) Chunking & reading large results safely
Loading millions of rows at once can crash your RAM. Use chunks to stream results and process gradually.
7.1 Read in chunks
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("sqlite:///shop.db")
sql = "SELECT * FROM orders;"
chunks = pd.read_sql(sql, engine, chunksize=2)
total = 0
rows = 0
for chunk in chunks:
rows += len(chunk)
total += chunk["amount"].sum()
print("rows:", rows)
print("total amount:", total)
7.2 Chunk + transform + write back (mini ETL)
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("sqlite:///shop.db")
# Example: add a derived column and store to a new table
chunks = pd.read_sql("SELECT * FROM orders;", engine, chunksize=2)
first = True
for chunk in chunks:
chunk["amount_with_gst"] = chunk["amount"] * 1.18
chunk.to_sql("orders_enriched", engine,
if_exists=("replace" if first else "append"),
index=False)
first = False
print(pd.read_sql("SELECT * FROM orders_enriched;", engine))
8) Performance tips (real-world)
- Use WHERE to reduce rows
- Use SELECT to reduce columns
- Use GROUP BY to aggregate early
-- example (run in SQL):
CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders(customer_id);
CREATE INDEX IF NOT EXISTS idx_orders_created ON orders(created_at);
9) Common errors & quick fixes
“no such table: …”
“database is locked” (SQLite)
“ObjectNotExecutableError” (SQLAlchemy)
Dates come as strings
Writing to SQL creates an unwanted index column
10) Mini practice tasks (with answers)
Use the shop.db tables from earlier sections.
Task 1: Show total spend per city
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("sqlite:///shop.db")
sql = """
SELECT c.city, SUM(o.amount) AS total_spend, COUNT(*) AS orders_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.city
ORDER BY total_spend DESC;
"""
df = pd.read_sql(sql, engine)
print(df)
Task 2: Find customers who have no orders
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("sqlite:///shop.db")
sql = """
SELECT c.customer_id, c.name, c.city
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;
"""
df = pd.read_sql(sql, engine)
print(df)
Task 3: Read orders from a date range using safe parameters
import pandas as pd
from sqlalchemy import create_engine, text
engine = create_engine("sqlite:///shop.db")
sql = text("""
SELECT *
FROM orders
WHERE created_at BETWEEN :start_date AND :end_date
ORDER BY created_at
""")
df = pd.read_sql(sql, engine, params={"start_date": "2026-01-12", "end_date": "2026-01-20"})
print(df)
11) MCQs (with answers)
1) Which function loads SQL query results into a DataFrame?
Answer: B) pd.read_sql
2) What is the safest way to include user input in SQL queries?
Answer: C) params (bind variables)
3) What does df.to_sql(..., if_exists="append") do?
Answer: B) Adds rows to existing table
4) Why use chunksize in pd.read_sql?
Answer: B) Stream data in parts to manage memory
5) Best place for large joins and aggregations?
Answer: C) Database engines are optimized for large relational work
12) Optional: Embedded Python Editor
If you want, you can embed your Python editor here and run the examples interactively.