Using SQL with Pandas (Beginner → Advanced)

0) Quick Setup

You can practice this lesson using SQLite (no server needed). For MySQL/PostgreSQL, you’ll typically use SQLAlchemy + a DB driver.

Install (recommended)

For most databases, SQLAlchemy is the cleanest way.

pip install pandas sqlalchemy

SQLite uses Python built-in driver (sqlite3), so no extra DB driver needed.

Optional drivers

Depending on your database:

PostgreSQL: pip install psycopg2-binary (or psycopg)
MySQL: pip install pymysql (or mysqlclient)
MS SQL: pip install pyodbc

Important:

Pandas is not a database. It’s a fast in-memory tool. Use SQL for filtering/joins/aggregations on large datasets, and use Pandas for final shaping, visualization, and business logic.

1) What “SQL in Pandas” means

Pandas can talk to SQL databases so you can:

Read SQL tables or query results into a DataFrame.
Write a DataFrame back into a SQL table.
Combine SQL (best for big joins/filters) + Pandas (best for flexible transformations).

Two main functions

Read pd.read_sql(sql, con) — returns a DataFrame
Write df.to_sql(name, con) — writes a table

2) SQLite + pd.read_sql (no server needed)

SQLite is perfect for learning because it’s a single file DB (e.g., shop.db).

2.1 Create a sample database + tables

import sqlite3
import pandas as pd

# Create (or open) a database file
con = sqlite3.connect("shop.db")

# Create tables + sample data
con.executescript("""
DROP TABLE IF EXISTS customers;
DROP TABLE IF EXISTS orders;

CREATE TABLE customers(
  customer_id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  city TEXT NOT NULL
);

CREATE TABLE orders(
  order_id INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL,
  amount REAL NOT NULL,
  created_at TEXT NOT NULL,
  FOREIGN KEY(customer_id) REFERENCES customers(customer_id)
);

INSERT INTO customers(name, city) VALUES
('Asha', 'Varanasi'),
('Ravi', 'Lucknow'),
('Neha', 'Delhi');

INSERT INTO orders(customer_id, amount, created_at) VALUES
(1, 399.0, '2026-01-10'),
(1, 1200.5,'2026-01-18'),
(2, 250.0, '2026-01-12'),
(3, 999.0, '2026-01-20');
""")

con.commit()
con.close()

2.2 Read a full table

import sqlite3
import pandas as pd

con = sqlite3.connect("shop.db")
df_customers = pd.read_sql("SELECT * FROM customers;", con)
df_orders = pd.read_sql("SELECT * FROM orders;", con)
con.close()

print(df_customers)
print(df_orders)

2.3 Read a query result (filter + sort)

import sqlite3
import pandas as pd

con = sqlite3.connect("shop.db")

sql = """
SELECT order_id, customer_id, amount, created_at
FROM orders
WHERE amount >= 500
ORDER BY amount DESC;
"""
df = pd.read_sql(sql, con)

con.close()
print(df)

Tip:

Use SQL for filtering large tables. Then use Pandas for formatting, charts, exporting to CSV/Excel, etc.

3) SQLAlchemy engines (recommended approach)

SQLAlchemy standardizes connections across databases. With it, the same Pandas code can work for SQLite, PostgreSQL, MySQL, etc.

3.1 SQLite engine example

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///shop.db")  # file-based sqlite database
df = pd.read_sql("SELECT * FROM customers;", engine)
print(df)

3.2 Connection strings (patterns)

# PostgreSQL (example)
# engine = create_engine("postgresql+psycopg2://USER:PASSWORD@HOST:5432/DBNAME")

# MySQL (example)
# engine = create_engine("mysql+pymysql://USER:PASSWORD@HOST:3306/DBNAME")

# SQL Server (example via pyodbc)
# engine = create_engine("mssql+pyodbc://USER:PASSWORD@HOST/DBNAME?driver=ODBC+Driver+17+for+SQL+Server")

Security note:

Don’t hardcode passwords in files you share. Use environment variables or a secrets manager.

4) Parameterized queries (safe SQL)

The biggest mistake beginners make is building SQL using string concatenation. That can lead to SQL Injection.

4.1 Safe parameters with SQLite (DB-API style)

import sqlite3
import pandas as pd

min_amount = 500

con = sqlite3.connect("shop.db")
sql = "SELECT * FROM orders WHERE amount >= ? ORDER BY amount DESC;"
df = pd.read_sql(sql, con, params=(min_amount,))
con.close()

print(df)

4.2 Safe parameters with SQLAlchemy (recommended)

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///shop.db")

sql = text("""
SELECT * FROM orders
WHERE amount >= :min_amount
ORDER BY amount DESC
""")

df = pd.read_sql(sql, engine, params={"min_amount": 500})
print(df)

Rule:

Always pass user inputs through params (never manually paste them into SQL strings).

5) Joins + GROUP BY + window functions

SQL shines here: joining big tables and aggregating quickly (especially when indexes exist).

5.1 Join orders with customer names

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///shop.db")

sql = """
SELECT
  o.order_id,
  c.name,
  c.city,
  o.amount,
  o.created_at
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
ORDER BY o.created_at;
"""

df = pd.read_sql(sql, engine)
print(df)

5.2 GROUP BY (total spend per customer)

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///shop.db")

sql = """
SELECT
  c.customer_id,
  c.name,
  SUM(o.amount) AS total_spend,
  COUNT(*) AS orders_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_spend DESC;
"""

df = pd.read_sql(sql, engine)
print(df)

5.3 Window function (running total per customer)

Window functions are powerful for rankings, running totals, and comparisons without losing row-level data.

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///shop.db")

sql = """
SELECT
  o.order_id,
  o.customer_id,
  o.amount,
  o.created_at,
  SUM(o.amount) OVER (
    PARTITION BY o.customer_id
    ORDER BY o.created_at
  ) AS running_total
FROM orders o
ORDER BY o.customer_id, o.created_at;
"""

df = pd.read_sql(sql, engine)
print(df)

6) Writing DataFrames back to SQL (df.to_sql)

Use to_sql for loading transformed data into a database table.

6.1 Create a DataFrame and write it

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///shop.db")

df_new = pd.DataFrame({
  "name": ["Irfan", "Meera"],
  "city": ["Pune", "Varanasi"]
})

# if_exists:
# 'fail' (default) = error if table exists
# 'replace' = drop + recreate
# 'append' = add rows
df_new.to_sql("new_customers", engine, if_exists="replace", index=False)

# Verify
df_check = pd.read_sql("SELECT * FROM new_customers;", engine)
print(df_check)

6.2 Append data (common in pipelines)

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///shop.db")

df_more = pd.DataFrame({
  "name": ["Sonia"],
  "city": ["Delhi"]
})

df_more.to_sql("new_customers", engine, if_exists="append", index=False)

print(pd.read_sql("SELECT * FROM new_customers;", engine))

Data types:

Pandas will infer types, but for strict schemas you can set column types via SQLAlchemy (advanced), or create the table in SQL first, then append.

7) Chunking & reading large results safely

Loading millions of rows at once can crash your RAM. Use chunks to stream results and process gradually.

7.1 Read in chunks

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///shop.db")

sql = "SELECT * FROM orders;"
chunks = pd.read_sql(sql, engine, chunksize=2)

total = 0
rows = 0

for chunk in chunks:
  rows += len(chunk)
  total += chunk["amount"].sum()

print("rows:", rows)
print("total amount:", total)

7.2 Chunk + transform + write back (mini ETL)

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///shop.db")

# Example: add a derived column and store to a new table
chunks = pd.read_sql("SELECT * FROM orders;", engine, chunksize=2)

first = True
for chunk in chunks:
  chunk["amount_with_gst"] = chunk["amount"] * 1.18
  chunk.to_sql("orders_enriched", engine,
               if_exists=("replace" if first else "append"),
               index=False)
  first = False

print(pd.read_sql("SELECT * FROM orders_enriched;", engine))

8) Performance tips (real-world)

Push work to SQL

If the table is huge, do filtering/joins/aggregation in SQL first:

Use WHERE to reduce rows
Use SELECT to reduce columns
Use GROUP BY to aggregate early

Add indexes (DB-side)

Indexes speed up joins and filters:

-- example (run in SQL):
CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders(customer_id);
CREATE INDEX IF NOT EXISTS idx_orders_created  ON orders(created_at);

Golden workflow

SQL for heavy lifting → Pandas for final touches → store results back to SQL.

Avoid:

Reading entire tables unnecessarily (no WHERE), then doing all joins in Pandas for very large data. Pandas joins are great, but DB engines are optimized for big relational operations.

9) Common errors & quick fixes

“no such table: …”

You’re connected to the wrong database file / schema, or the table name is wrong. For SQLite, confirm the DB file path. For other DBs, confirm schema and connection string.

“database is locked” (SQLite)

Another process is writing to the DB. Use short transactions, close connections, and avoid many writers at once.

“ObjectNotExecutableError” (SQLAlchemy)

Wrap SQL strings using text(...) when needed: from sqlalchemy import text.

Dates come as strings

Convert in Pandas: df["created_at"] = pd.to_datetime(df["created_at"]) or do DB casting if supported.

Writing to SQL creates an unwanted index column

Use index=False in to_sql.

10) Mini practice tasks (with answers)

Use the shop.db tables from earlier sections.

Task 1: Show total spend per city

Expected: join customers + orders, group by city.

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///shop.db")

sql = """
SELECT c.city, SUM(o.amount) AS total_spend, COUNT(*) AS orders_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.city
ORDER BY total_spend DESC;
"""

df = pd.read_sql(sql, engine)
print(df)

Task 2: Find customers who have no orders

Hint: LEFT JOIN + WHERE ... IS NULL

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///shop.db")

sql = """
SELECT c.customer_id, c.name, c.city
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;
"""

df = pd.read_sql(sql, engine)
print(df)

Task 3: Read orders from a date range using safe parameters

Use SQLAlchemy + named params.

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///shop.db")

sql = text("""
SELECT *
FROM orders
WHERE created_at BETWEEN :start_date AND :end_date
ORDER BY created_at
""")

df = pd.read_sql(sql, engine, params={"start_date": "2026-01-12", "end_date": "2026-01-20"})
print(df)

11) MCQs (with answers)

1) Which function loads SQL query results into a DataFrame?

A) df.to_sql B) pd.read_sql C) df.read_sql D) pd.to_sql

Answer: B) pd.read_sql

2) What is the safest way to include user input in SQL queries?

A) String concatenation B) f-strings C) params / bind variables D) copy-paste

Answer: C) params (bind variables)

3) What does df.to_sql(..., if_exists="append") do?

A) Deletes table B) Adds rows to existing table C) Creates DB D) Drops database

Answer: B) Adds rows to existing table

4) Why use chunksize in pd.read_sql?

A) For better formatting B) To avoid loading huge data into RAM at once C) For printing D) For sorting

Answer: B) Stream data in parts to manage memory

5) Best place for large joins and aggregations?

A) Always Pandas B) Always Excel C) Database (SQL) first D) Not possible

Answer: C) Database engines are optimized for large relational work

12) Optional: Embedded Python Editor

If you want, you can embed your Python editor here and run the examples interactively.

Teacher tip (7 PM batch):

Give students shop.db as a downloadable file, then ask them to complete Tasks 1–3 and paste screenshots of output.