Beginning NLP
Champak Roy and Programmers Picnic
Understand how text becomes numbers.
1. What is NLP?
NLP means Natural Language Processing.
It is a field where computers learn to understand human language.
Suppose a student writes:
I want to learn AI from Champak Roy
A computer does not understand this sentence the way a human does. NLP helps the computer identify the important words in it, such as AI and Champak Roy.
2. Tokenization
Tokenization means breaking text into smaller parts.
Usually, we break a sentence into words.
Sentence:
Programmers Picnic teaches AI and ML
Tokens:
["Programmers", "Picnic", "teaches", "AI", "and", "ML"]
Each word is called a token.
| Text | Tokens |
|---|---|
| Champak Roy teaches NLP | Champak, Roy, teaches, NLP |
| AI ML class | AI, ML, class |
| Programmers Picnic lesson | Programmers, Picnic, lesson |
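Real sentences usually contain punctuation, and splitting only on spaces leaves it attached to the words (for example "ML!" instead of "ML"). The following is a minimal sketch of a tokenizer that drops punctuation using Python's standard `re` module. The function name `tokenize` and the sample sentence are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Return the lowercase word tokens in text, without punctuation."""
    # \w+ matches runs of letters, digits, and underscores
    return re.findall(r"\w+", text.lower())

tokens = tokenize("Champak Roy teaches NLP, AI and ML!")
print(tokens)
# ['champak', 'roy', 'teaches', 'nlp', 'ai', 'and', 'ml']
```

Lowercasing is a design choice here: it makes "AI" and "ai" count as the same token.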
3. Common Words
Some words appear very often in sentences.
These are called common words or stop words.
Sentence:
Champak Roy is teaching AI in the class
Important words:
Champak, Roy, teaching, AI, class
Common words:
is, in, the
| Common Word | Why It Is Common |
|---|---|
| is | Used in many sentences |
| the | Very frequent in English |
| and | Used to join words |
| in | Used for location or position |
4. Synonyms
Synonyms are words with similar meaning.
| Word | Synonym |
|---|---|
| lesson | tutorial |
| class | course |
| begin | start |
| student | learner |
| quick | fast |
Sentence:
Start the AI lesson
Similar sentence:
Begin the AI tutorial
Here:
- start and begin are similar.
- lesson and tutorial are similar.
5. Antonyms
Antonyms are words with opposite meaning.
| Word | Antonym |
|---|---|
| beginner | advanced |
| easy | difficult |
| online | offline |
| open | close |
| fast | slow |
Sentence:
This is a beginner NLP lesson
Opposite idea:
This is an advanced NLP lesson
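As a small sketch of how a program could notice that two sentences express opposite ideas, the code below looks up each word of the first sentence in an antonym dictionary built from the table above. The function name `find_antonym_pairs` is an assumption for illustration, and the lookup only works in one direction (from the "Word" column to the "Antonym" column).

```python
# Antonym pairs copied from the table in this section
antonyms = {
    "beginner": "advanced",
    "easy": "difficult",
    "online": "offline",
    "open": "close",
    "fast": "slow",
}

def find_antonym_pairs(text1, text2):
    """Return (word1, word2) pairs where word2 is the listed antonym of word1."""
    words2 = set(text2.lower().split())
    pairs = []
    for word in text1.lower().split():
        opposite = antonyms.get(word)  # None if the word is not in the table
        if opposite in words2:
            pairs.append((word, opposite))
    return pairs

pairs = find_antonym_pairs("This is a beginner NLP lesson",
                           "This is an advanced NLP lesson")
print(pairs)
# [('beginner', 'advanced')]
```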
6. Text Similarity
Text similarity means checking how close two texts are in meaning.
Text 1
Programmers Picnic AI ML Classes
Text 2
AI ML course by Champak Roy
These two texts are not exactly the same, but they are related.
NLP can help us measure this relationship.
7. Vectors
Computers do not directly understand words.
They understand numbers.
So, in NLP, we convert text into numbers. These numbers are called vectors.
Vocabulary:
["Programmers", "Picnic", "AI", "ML", "Champak", "Roy"]
Sentence:
Programmers Picnic AI
Vector:
[1, 1, 1, 0, 0, 0]
Meaning:
| Word | Present? | Number |
|---|---|---|
| Programmers | Yes | 1 |
| Picnic | Yes | 1 |
| AI | Yes | 1 |
| ML | No | 0 |
| Champak | No | 0 |
| Roy | No | 0 |
8. Cosine Similarity
Cosine similarity is a mathematical method to compare two vectors.
In NLP, we use it to compare text.
Simple Idea
Imagine two arrows.
- If both arrows point in the same direction, they are similar.
- If both arrows point in different directions, they are less similar.
Text 1:
Programmers Picnic AI ML
Text 2:
AI ML class by Champak Roy
These texts share the words AI and ML.
So their similarity will be greater than zero.
| Cosine Similarity Value | Meaning |
|---|---|
| 1 | Very similar |
| 0.5 | Somewhat similar |
| 0 | Not similar |
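To make the idea concrete, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand for the two texts in this section, using only the standard library. The vocabulary list and the helper names `to_vector` and `cosine` are assumptions chosen for this illustration.

```python
import math

# All unique words from the two texts, in a fixed order
vocabulary = ["programmers", "picnic", "ai", "ml", "class", "by", "champak", "roy"]

def to_vector(text):
    """Count how often each vocabulary word appears in text."""
    words = text.lower().split()
    return [words.count(word) for word in vocabulary]

def cosine(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    if length_a == 0 or length_b == 0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (length_a * length_b)

v1 = to_vector("Programmers Picnic AI ML")
v2 = to_vector("AI ML class by Champak Roy")
print(v1)                         # [1, 1, 1, 1, 0, 0, 0, 0]
print(v2)                         # [0, 0, 1, 1, 1, 1, 1, 1]
print(round(cosine(v1, v2), 2))   # 0.41
```

The two vectors overlap only in the AI and ML positions, so the dot product is 2, the lengths are √4 and √6, and the result is 2 / (2 × √6) ≈ 0.41.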
9. Python Example
Now let us compare two sentences using Python.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
sentences = [
    "Programmers Picnic AI ML",
    "AI ML class by Champak Roy"
]
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(sentences)
similarity = cosine_similarity(vectors)
print(similarity)
Possible Output
[[1.         0.40824829]
 [0.40824829 1.        ]]
Meaning:
- First sentence compared with itself gives 1.
- Second sentence compared with itself gives 1.
- Both sentences compared with each other give around 0.41, because they share 2 words (AI and ML) out of 4 and 6 words: 2 / (√4 × √6) ≈ 0.41.
10. More Real-Life Code Samples
These examples are designed like small real projects. Students can copy one example at a time into the editor, run it, change the sentences, and observe how the output changes.
Example 1: Simple Word Tokenizer Without Any Library
Real-life use: breaking a student message into words before checking the topic.
message = "I want to learn Python and AI from Champak Roy"
words = message.lower().split()
print(words)
Example 2: Remove Common Words
Real-life use: keeping only the meaningful words from a search query.
sentence = "Champak Roy is teaching AI in the online class"
common_words = ["is", "in", "the", "a", "an", "and", "to", "of"]
words = sentence.lower().split()
important_words = []
for word in words:
    if word not in common_words:
        important_words.append(word)
print("All words:", words)
print("Important words:", important_words)
Example 3: Count Words in Student Feedback
Real-life use: finding repeated words in feedback after an AI-ML class.
feedback = "The AI class was easy and the Python examples were easy to understand"
words = feedback.lower().split()
word_count = {}
for word in words:
    if word in word_count:
        word_count[word] = word_count[word] + 1
    else:
        word_count[word] = 1
print(word_count)
Example 4: Find Whether a Message Is About AI
Real-life use: detecting whether a visitor query belongs to an AI-ML course page.
message = "Do you teach machine learning and neural networks?"
ai_keywords = ["ai", "ml", "machine", "learning", "neural", "python", "data"]
message_words = message.lower().split()
found_keywords = []
for word in message_words:
    if word in ai_keywords:
        found_keywords.append(word)
if len(found_keywords) > 0:
    print("This message is related to AI/ML")
    print("Matched words:", found_keywords)
else:
    print("This message is not clearly related to AI/ML")
Example 5: Very Simple FAQ Matcher
Real-life use: matching a student question with the nearest prepared answer.
faqs = {
    "fees": "Please visit learnwithchampak.live for course fee details.",
    "timing": "Classes are usually held according to the announced schedule.",
    "python": "Yes, Python is used in our AI-ML classes.",
    "certificate": "Certificate details will be shared by the teacher."
}
question = "Do you teach python in this course?"
question = question.lower()
answer_found = False
# Print the answer for the first keyword that appears in the question
for keyword in faqs:
    if keyword in question:
        print(faqs[keyword])
        answer_found = True
        break
if not answer_found:
    print("Please contact Champak Roy for details.")
Example 6: Search Engine Style Matching
Real-life use: showing the most relevant lesson when a student searches on the website.
lessons = [
    "Python variables and arithmetic operators",
    "Beginning NLP with text similarity",
    "Machine learning with simple datasets",
    "Google Search Console for Blogger",
    "Sorting trace and algorithm detection"
]
search = "text similarity NLP"
search_words = search.lower().split()
scores = []
for lesson in lessons:
    lesson_words = lesson.lower().split()
    score = 0
    for word in search_words:
        if word in lesson_words:
            score = score + 1
    scores.append([score, lesson])
scores.sort(reverse=True)
print("Best matching lessons:")
for score, lesson in scores:
    print(score, "-", lesson)
Example 7: Product Tag Detector
Real-life use: automatically tagging products in an affiliate or shop page.
product_title = "Wireless keyboard and mouse combo for computer setup"
categories = {
    "computer": ["keyboard", "mouse", "monitor", "laptop", "computer"],
    "mobile": ["phone", "charger", "cover", "earbuds"],
    "study": ["book", "notebook", "pen", "desk"]
}
product_words = product_title.lower().split()
for category, keywords in categories.items():
    for word in product_words:
        if word in keywords:
            print("Category:", category)
            break
Example 8: Sentiment-Like Feedback Checker
Real-life use: quickly checking whether class feedback is positive or negative.
feedback = "The lesson was clear easy useful and practical"
positive_words = ["good", "great", "clear", "easy", "useful", "practical"]
negative_words = ["bad", "hard", "confusing", "boring", "difficult"]
words = feedback.lower().split()
positive_score = 0
negative_score = 0
for word in words:
    if word in positive_words:
        positive_score = positive_score + 1
    if word in negative_words:
        negative_score = negative_score + 1
print("Positive score:", positive_score)
print("Negative score:", negative_score)
if positive_score > negative_score:
    print("Feedback looks positive")
elif negative_score > positive_score:
    print("Feedback looks negative")
else:
    print("Feedback looks neutral")
Example 9: Find Similarity Using Common Words
Real-life use: comparing two lesson titles without any advanced library.
text1 = "AI ML class by Champak Roy"
text2 = "Machine learning course by Programmers Picnic"
words1 = set(text1.lower().split())
words2 = set(text2.lower().split())
common = words1.intersection(words2)
all_words = words1.union(words2)
similarity = len(common) / len(all_words)
print("Words in text 1:", words1)
print("Words in text 2:", words2)
print("Common words:", common)
print("Simple similarity:", similarity)
Example 10: Improve Similarity Using Synonyms
Real-life use: treating words like class and course as similar.
text1 = "AI ML class"
text2 = "machine learning course"
synonyms = {
    "ai": "artificial-intelligence",
    "ml": "machine-learning",
    "machine": "machine-learning",
    "learning": "machine-learning",
    "class": "course"
}
def normalize(text):
    words = text.lower().split()
    final_words = []
    for word in words:
        if word in synonyms:
            final_words.append(synonyms[word])
        else:
            final_words.append(word)
    return set(final_words)
words1 = normalize(text1)
words2 = normalize(text2)
common = words1.intersection(words2)
all_words = words1.union(words2)
print("Normalized text 1:", words1)
print("Normalized text 2:", words2)
print("Common words:", common)
print("Similarity:", len(common) / len(all_words))
Example 11: Cosine Similarity for Course Search
Real-life use: ranking course pages according to a student search query.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
pages = [
    "Python basics variables loops functions",
    "NLP text similarity cosine similarity vectors",
    "Machine learning model training dataset prediction",
    "Blogger SEO Google Search Console sitemap"
]
query = "text similarity vectors"
all_texts = [query] + pages
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(all_texts)
scores = cosine_similarity(vectors[0], vectors[1:])[0]
for page, score in zip(pages, scores):
    print(round(score, 2), "-", page)
Example 12: Compare Student Answers With Expected Answer
Real-life use: checking whether a written answer is close to the model answer.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
expected_answer = "NLP helps computers understand human language"
student_answer = "Natural language processing helps computer work with human text"
texts = [expected_answer, student_answer]
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(texts)
score = cosine_similarity(vectors[0], vectors[1])[0][0]
print("Similarity score:", round(score, 2))
if score >= 0.5:
    print("The answer is reasonably close")
else:
    print("The answer needs improvement")
Example 13: Mini Chatbot for Course Enquiry
Real-life use: a very simple chatbot for an AI-ML class landing page.
question = input("Ask your question: ").lower()
if "python" in question:
    print("Yes, Python is used in our AI-ML course.")
elif "nlp" in question:
    print("Yes, NLP is part of the AI-ML learning path.")
elif "timing" in question or "time" in question:
    print("Please check the latest class schedule on learnwithchampak.live.")
elif "teacher" in question or "champak" in question:
    print("The class is by Champak Roy.")
else:
    print("Please contact us for this question.")
Example 14: Detect Important Words From Blog Title
Real-life use: extracting useful words for SEO labels or search description planning.
title = "Beginning NLP Lesson with Python and Cosine Similarity"
common_words = ["with", "and", "the", "a", "an", "for", "to", "in"]
words = title.lower().split()
keywords = []
for word in words:
    if word not in common_words:
        keywords.append(word)
print("Suggested keywords:", keywords)
Example 15: Build a Tiny Vocabulary
Real-life use: preparing the list of all unique words before creating vectors.
sentences = [
    "Programmers Picnic teaches Python",
    "Champak Roy teaches AI ML",
    "Python helps in AI ML learning"
]
vocabulary = []
for sentence in sentences:
    words = sentence.lower().split()
    for word in words:
        if word not in vocabulary:
            vocabulary.append(word)
print("Vocabulary:")
print(vocabulary)
Example 16: Convert Sentence Into a 0-1 Vector
Real-life use: showing beginners how text becomes numbers before using machine learning.
vocabulary = ["python", "ai", "ml", "nlp", "class"]
sentence = "python nlp class"
words = sentence.lower().split()
vector = []
for vocab_word in vocabulary:
    if vocab_word in words:
        vector.append(1)
    else:
        vector.append(0)
print("Vocabulary:", vocabulary)
print("Sentence:", sentence)
print("Vector:", vector)
Example 17: Create a Word Frequency Vector
Real-life use: counting how many times important words appear in a paragraph.
vocabulary = ["python", "ai", "ml", "nlp", "class"]
sentence = "python ai ai ml class class class"
words = sentence.lower().split()
vector = []
for vocab_word in vocabulary:
    count = words.count(vocab_word)
    vector.append(count)
print("Vocabulary:", vocabulary)
print("Frequency vector:", vector)
Example 18: Mini Project — Find the Best Matching Course Page
Real-life use: a small search feature for learnwithchampak.live or aiml.learnwithchampak.live.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
course_pages = {
    "Python Basics": "variables operators loops functions beginner python",
    "Beginning NLP": "tokenization vectors cosine similarity text similarity nlp",
    "Machine Learning": "dataset training model prediction accuracy machine learning",
    "SEO for Blogger": "google search console sitemap blogger seo indexing"
}
student_search = input("Search a lesson: ")
texts = [student_search] + list(course_pages.values())
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(texts)
scores = cosine_similarity(vectors[0], vectors[1:])[0]
page_names = list(course_pages.keys())
best_index = scores.argmax()
print("Best lesson:", page_names[best_index])
print("Score:", round(scores[best_index], 2))
Example 19: Mini Project — Group Messages by Topic
Real-life use: sorting student messages into course enquiry, technical doubt, or payment question.
messages = [
    "What is the class timing?",
    "I have an error in Python code",
    "How much is the course fee?",
    "Can I learn machine learning here?",
    "My program is not printing output"
]
topics = {
    "course enquiry": ["class", "timing", "learn", "course"],
    "technical doubt": ["error", "code", "program", "printing", "output"],
    "payment": ["fee", "payment", "price"]
}
for message in messages:
    # Strip trailing punctuation so "timing?" matches the keyword "timing"
    message_words = [word.strip("?.,!") for word in message.lower().split()]
    best_topic = "unknown"
    best_score = 0
    for topic, keywords in topics.items():
        score = 0
        for word in message_words:
            if word in keywords:
                score = score + 1
        # On a tie, the topic listed first is kept
        if score > best_score:
            best_score = score
            best_topic = topic
    print(message, "-->", best_topic)
Example 20: Mini Project — Find Duplicate or Similar Titles
Real-life use: checking whether two blog titles are too similar before publishing.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
old_titles = [
    "Beginning NLP with Python",
    "Python Arithmetic Operators for Beginners",
    "Google Search Console Full Guide",
    "Cosine Similarity in NLP"
]
new_title = "NLP basics with Python examples"
texts = [new_title] + old_titles
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(texts)
scores = cosine_similarity(vectors[0], vectors[1:])[0]
for title, score in zip(old_titles, scores):
    print(round(score, 2), "-", title)
highest_score = max(scores)
if highest_score > 0.5:
    print("Warning: This title may be similar to an existing title.")
else:
    print("This title looks different enough.")
11. Practice in Our Python Editor
Use the embedded Programmers Picnic Python editor below to run the NLP example.
12. Complete Beginner Summary
| Topic | Meaning | Example |
|---|---|---|
| NLP | Computer understanding language | AI lesson search |
| Tokenization | Breaking text into words | Champak, Roy, AI |
| Common Words | Frequently used words | is, the, and |
| Synonyms | Similar words | lesson, tutorial |
| Antonyms | Opposite words | beginner, advanced |
| Text Similarity | How related two texts are | AI ML class and AI ML course |
| Vectors | Numbers representing text | [1, 1, 0, 0] |
| Cosine Similarity | Method to compare vectors | 0.41 similarity |
13. Practice Questions
- What is the full form of NLP?
- Break this sentence into tokens: Champak Roy teaches AI
- Find common words: Programmers Picnic is an AI ML class
- Give one synonym of lesson.
- Give one antonym of beginner.
- Why do we convert text into vectors?
- What does a cosine similarity value near 1 mean?
14. Mini Assignment
Create a tiny text similarity experiment.
Use these sentences:
sentence1 = "Programmers Picnic teaches AI"
sentence2 = "Champak Roy teaches ML"
sentence3 = "The mango is yellow"
Think and answer:
- Which two sentences are more similar?
- Which sentence is least related?
- Which words are common?
- Which words carry important meaning?
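After thinking about the questions, one possible way to check your answers is to list the words each pair of sentences shares. This is only a starting sketch: the helper name `shared_words` and the dictionary layout are assumptions for illustration, and you can extend it with common-word removal or cosine similarity from the earlier examples.

```python
sentences = {
    "sentence1": "Programmers Picnic teaches AI",
    "sentence2": "Champak Roy teaches ML",
    "sentence3": "The mango is yellow",
}

def shared_words(text1, text2):
    """Return the set of lowercase words that appear in both texts."""
    return set(text1.lower().split()) & set(text2.lower().split())

names = list(sentences)
# Compare every pair of sentences exactly once
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        common = shared_words(sentences[names[i]], sentences[names[j]])
        print(names[i], "and", names[j], "share:", sorted(common))
```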