Level:

0% to 10%

Topic:

Beginning NLP

Context:

Champak Roy and Programmers Picnic

Goal:

Understand how text becomes numbers.

1. What is NLP?


NLP means Natural Language Processing.

It is a field where computers learn to understand human language.

Example from our context:

Suppose a student writes:

I want to learn AI from Champak Roy

A computer does not understand this the way a human does. NLP helps the computer identify important words such as:

learn

AI

Champak

Roy

Simple meaning: NLP helps computers work with language.

2. Tokenization

Tokenization means breaking text into smaller parts.

Usually, we break a sentence into words.

Sentence:

Programmers Picnic teaches AI and ML

Tokens:

["Programmers", "Picnic", "teaches", "AI", "and", "ML"]

Each word is called a token.

Text	Tokens
Champak Roy teaches NLP	Champak, Roy, teaches, NLP
AI ML class	AI, ML, class
Programmers Picnic lesson	Programmers, Picnic, lesson
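Splitting on spaces keeps punctuation attached to words, so "ML?" and "ML" become different tokens. Here is a minimal sketch of a slightly better tokenizer. It assumes a simple rule: a token is a run of lowercase letters or digits. This rule is a choice made for the example, and it would split a word like "don't" into "don" and "t".

```python
import re

def tokenize(text):
    # Lowercase the text, then keep each run of letters or digits as a token.
    # Punctuation such as "?" or "!" is not matched, so it is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())

message = "Do you teach AI and ML?"

print(message.split())     # ['Do', 'you', 'teach', 'AI', 'and', 'ML?']
print(tokenize(message))   # ['do', 'you', 'teach', 'ai', 'and', 'ml']
```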

3. Common Words

Some words appear very often in sentences.

These are called common words or stop words.

Sentence:

Champak Roy is teaching AI in the class

Important words:

Champak

Roy

teaching

AI

class

Common words:

is

in

the

Common Word	Why It Is Common
is	Used in many sentences
the	Very frequent in English
and	Used to join words
in	Used for location or position

In beginner NLP, we often remove common words so that we can focus on the words that carry meaning.
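One way to see which words are "common" is to count every word across a few sentences. The small sketch below uses Python's collections.Counter. The last two sentences are made-up examples in the spirit of this lesson.

```python
from collections import Counter

sentences = [
    "Champak Roy is teaching AI in the class",
    "The lesson is in the morning",
    "Python is the language used in the course",
]

counts = Counter()
for sentence in sentences:
    # Add this sentence's lowercase words to the running totals
    counts.update(sentence.lower().split())

# The three most frequent words and how often each appears
print(counts.most_common(3))   # [('the', 5), ('is', 3), ('in', 3)]
```

The meaningful words such as champak, teaching, python, and course appear only once each. The words the, is, and in come out on top, and these are the same words the table above calls common words.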

4. Synonyms

Synonyms are words with similar meaning.

Word	Synonym
lesson	tutorial
class	course
begin	start
student	learner
quick	fast

Example:

Start the AI lesson

Similar sentence:

Begin the AI tutorial

Here:

start and begin are similar.
lesson and tutorial are similar.

5. Antonyms

Antonyms are words with opposite meaning.

Word	Antonym
beginner	advanced
easy	difficult
online	offline
open	close
fast	slow

Example:

This is a beginner NLP lesson

Opposite idea:

This is an advanced NLP lesson
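Methods that only match words cannot tell that two words are opposites. Here is a minimal sketch that assumes the antonym table above is stored as a Python dictionary. By word overlap, the two example sentences share half of their unique words. An antonym lookup, however, shows that beginner and advanced point in opposite directions.

```python
# Antonym pairs from the table above
antonyms = {
    "beginner": "advanced",
    "easy": "difficult",
    "online": "offline",
    "open": "close",
    "fast": "slow",
}

text1 = "This is a beginner NLP lesson"
text2 = "This is an advanced NLP lesson"

words1 = set(text1.lower().split())
words2 = set(text2.lower().split())

# Shared words divided by all unique words
overlap = len(words1 & words2) / len(words1 | words2)
print("Word overlap:", overlap)   # Word overlap: 0.5

# Look for an antonym pair split across the two texts
opposite_pairs = []
for word, opposite in antonyms.items():
    if (word in words1 and opposite in words2) or (opposite in words1 and word in words2):
        opposite_pairs.append((word, opposite))

print("Opposite pairs:", opposite_pairs)   # Opposite pairs: [('beginner', 'advanced')]
```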

6. Text Similarity

Text similarity means checking how close two texts are in meaning.

Text 1

Programmers Picnic AI ML Classes

Text 2

AI ML course by Champak Roy

These two texts are not exactly the same, but they are related.

Both texts talk about AI, ML, and classes or courses.

NLP can turn this relationship into a number, so that we can measure how related the two texts are.

7. Vectors

Computers do not directly understand words.

They understand numbers.

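So, in NLP, we convert text into numbers. A list of numbers that represents a text is called a vector.

First, we choose a vocabulary: the list of words we want to track. Each position in the vector stands for one vocabulary word. It holds 1 if the word is present in the sentence and 0 if it is not.

Vocabulary: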

["Programmers", "Picnic", "AI", "ML", "Champak", "Roy"]

Sentence:

Programmers Picnic AI

Vector:

[1, 1, 1, 0, 0, 0]

Meaning:

Word	Present?	Number
Programmers	Yes	1
Picnic	Yes	1
AI	Yes	1
ML	No	0
Champak	No	0
Roy	No	0
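The same 0-1 vector can be built in a few lines of Python. The small sketch below uses a hypothetical helper named to_vector and the vocabulary above. It also shows that changing the word order gives the same vector, because this kind of vector only records which words are present.

```python
vocabulary = ["Programmers", "Picnic", "AI", "ML", "Champak", "Roy"]

def to_vector(sentence, vocabulary):
    # 1 if the vocabulary word appears in the sentence, otherwise 0
    # (both sides are lowercased so that "AI" and "ai" match)
    words = sentence.lower().split()
    return [1 if vocab_word.lower() in words else 0 for vocab_word in vocabulary]

print(to_vector("Programmers Picnic AI", vocabulary))   # [1, 1, 1, 0, 0, 0]
print(to_vector("AI Picnic Programmers", vocabulary))   # [1, 1, 1, 0, 0, 0]
```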

8. Cosine Similarity

Cosine similarity is a mathematical method to compare two vectors.

In NLP, we use it to compare text.

cosine similarity = matching direction of two vectors

Simple Idea

Imagine two arrows.

If both arrows point in the same direction, they are similar.
If both arrows point in different directions, they are less similar.

Text 1:

Programmers Picnic AI ML

Text 2:

AI ML class by Champak Roy

These texts share two words: AI and ML.

So their similarity will be greater than zero.

Cosine Similarity Value	Meaning
1	Very similar (same words in the same proportions)
0.5	Somewhat similar
0	Not similar (no words in common)

For beginners, remember this: higher cosine similarity means the texts are more related.
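The arrow idea can also be written as a formula: cosine similarity = (A · B) / (|A| × |B|). Here A · B is the dot product: multiply the numbers in the same position, then add the results. |A| is the length of vector A, which is the square root of the sum of its squares.

The minimal pure-Python sketch below applies that formula to Text 1 and Text 2 above. It assumes a vocabulary made of their eight lowercase words. The `cosine` helper is a name chosen for this example.

```python
import math

def cosine(a, b):
    # Dot product: multiply numbers in the same position and add them up
    dot = sum(x * y for x, y in zip(a, b))
    # Length (magnitude) of each vector
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    if length_a == 0 or length_b == 0:
        return 0.0   # an empty text has no direction to compare
    return dot / (length_a * length_b)

# Vocabulary order: programmers, picnic, ai, ml, class, by, champak, roy
text1 = [1, 1, 1, 1, 0, 0, 0, 0]   # Programmers Picnic AI ML
text2 = [0, 0, 1, 1, 1, 1, 1, 1]   # AI ML class by Champak Roy

print(round(cosine(text1, text2), 2))           # 0.41
print(round(cosine([1, 1, 0], [2, 2, 0]), 2))   # 1.0 (same direction)
print(cosine([1, 0], [0, 1]))                   # 0.0 (no words in common)
```

Text 1 and Text 2 share ai and ml, so the dot product is 2. The lengths are 2 and √6, and 2 / (2 × √6) rounds to 0.41.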

9. Python Example

Now let us compare two sentences using Python.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Programmers Picnic AI ML",
    "AI ML class by Champak Roy"
]

vectorizer = CountVectorizer()

vectors = vectorizer.fit_transform(sentences)

similarity = cosine_similarity(vectors)

print(similarity)

Possible Output

[[1.         0.40824829]
 [0.40824829 1.        ]]

Meaning:

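The first sentence compared with itself gives 1.
The second sentence compared with itself gives 1.
The two sentences compared with each other give about 0.41.

This means the two sentences are somewhat related. They share two words (ai and ml). The first sentence has 4 words and the second has 6, so the score is 2 / (√4 × √6) ≈ 0.41.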

10. More Real-Life Code Samples

These examples are designed as small real-life projects. Students can copy one example at a time into the editor, run it, change the sentences, and observe how the output changes.

Teacher tip: Start with the pure Python examples first. Then move to the scikit-learn examples after students understand tokens, important words, and vectors.

Example 1: Simple Word Tokenizer Without Any Library

Real-life use: breaking a student message into words before checking the topic.

message = "I want to learn Python and AI from Champak Roy"

words = message.lower().split()

print(words)

Example 2: Remove Common Words

Real-life use: keeping only the meaningful words from a search query.

sentence = "Champak Roy is teaching AI in the online class"

common_words = ["is", "in", "the", "a", "an", "and", "to", "of"]

words = sentence.lower().split()
important_words = []

for word in words:
    if word not in common_words:
        important_words.append(word)

print("All words:", words)
print("Important words:", important_words)

Example 3: Count Words in Student Feedback

Real-life use: finding repeated words in feedback after an AI-ML class.

feedback = "The AI class was easy and the Python examples were easy to understand"

words = feedback.lower().split()
word_count = {}

for word in words:
    if word in word_count:
        word_count[word] = word_count[word] + 1
    else:
        word_count[word] = 1

print(word_count)

Example 4: Find Whether a Message Is About AI

Real-life use: detecting whether a visitor query belongs to an AI-ML course page.

message = "Do you teach machine learning and neural networks?"

ai_keywords = ["ai", "ml", "machine", "learning", "neural", "python", "data"]

message_words = message.lower().split()

found_keywords = []

for word in message_words:
    if word in ai_keywords:
        found_keywords.append(word)

if len(found_keywords) > 0:
    print("This message is related to AI/ML")
    print("Matched words:", found_keywords)
else:
    print("This message is not clearly related to AI/ML")

Example 5: Very Simple FAQ Matcher

Real-life use: matching a student question with the nearest prepared answer.

faqs = {
    "fees": "Please visit learnwithchampak.live for course fee details.",
    "timing": "Classes are usually held according to the announced schedule.",
    "python": "Yes, Python is used in our AI-ML classes.",
    "certificate": "Certificate details will be shared by the teacher."
}

question = "Do you teach python in this course?"
question = question.lower()

answer_found = False

for keyword in faqs:
    if keyword in question:
        print(faqs[keyword])
        answer_found = True
        break

if not answer_found:
    print("Please contact Champak Roy for details.")

Example 6: Search Engine Style Matching

Real-life use: showing the most relevant lesson when a student searches on the website.

lessons = [
    "Python variables and arithmetic operators",
    "Beginning NLP with text similarity",
    "Machine learning with simple datasets",
    "Google Search Console for Blogger",
    "Sorting trace and algorithm detection"
]

search = "text similarity NLP"
search_words = search.lower().split()

scores = []

for lesson in lessons:
    lesson_words = lesson.lower().split()
    score = 0

    for word in search_words:
        if word in lesson_words:
            score = score + 1

    scores.append([score, lesson])

scores.sort(reverse=True)

print("Best matching lessons:")
for score, lesson in scores:
    print(score, "-", lesson)

Example 7: Product Tag Detector

Real-life use: automatically tagging products in an affiliate or shop page.

product_title = "Wireless keyboard and mouse combo for computer setup"

categories = {
    "computer": ["keyboard", "mouse", "monitor", "laptop", "computer"],
    "mobile": ["phone", "charger", "cover", "earbuds"],
    "study": ["book", "notebook", "pen", "desk"]
}

product_words = product_title.lower().split()

for category, keywords in categories.items():
    for word in product_words:
        if word in keywords:
            print("Category:", category)
            break

Example 8: Sentiment-Like Feedback Checker

Real-life use: quickly checking whether class feedback is positive or negative.

feedback = "The lesson was clear easy useful and practical"

positive_words = ["good", "great", "clear", "easy", "useful", "practical"]
negative_words = ["bad", "hard", "confusing", "boring", "difficult"]

words = feedback.lower().split()
positive_score = 0
negative_score = 0

for word in words:
    if word in positive_words:
        positive_score = positive_score + 1
    if word in negative_words:
        negative_score = negative_score + 1

print("Positive score:", positive_score)
print("Negative score:", negative_score)

if positive_score > negative_score:
    print("Feedback looks positive")
elif negative_score > positive_score:
    print("Feedback looks negative")
else:
    print("Feedback looks neutral")
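Word counting has a weak spot. In "The lesson was not clear", the word clear still counts as positive. The minimal sketch below shows one simple fix. It assumes that a "not" directly before a feedback word flips that word to the opposite list.

```python
feedback = "The lesson was not clear and the examples were useful"

positive_words = ["good", "great", "clear", "easy", "useful", "practical"]
negative_words = ["bad", "hard", "confusing", "boring", "difficult"]

words = feedback.lower().split()
positive_score = 0
negative_score = 0

for i, word in enumerate(words):
    # Assumption: a "not" right before a word flips its meaning
    flipped = i > 0 and words[i - 1] == "not"
    if word in positive_words:
        if flipped:
            negative_score += 1
        else:
            positive_score += 1
    elif word in negative_words:
        if flipped:
            positive_score += 1
        else:
            negative_score += 1

print("Positive score:", positive_score)   # Positive score: 1
print("Negative score:", negative_score)   # Negative score: 1
```

Example 8's version would score this feedback as 2 positive and 0 negative. Here, clear moves to the negative side, so the scores tie.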

Example 9: Find Similarity Using Common Words

Real-life use: comparing two lesson titles without any advanced library.

text1 = "AI ML class by Champak Roy"
text2 = "Machine learning course by Programmers Picnic"

words1 = set(text1.lower().split())
words2 = set(text2.lower().split())

common = words1.intersection(words2)
all_words = words1.union(words2)

# Shared words divided by all unique words (this ratio is known as Jaccard similarity)
similarity = len(common) / len(all_words)

print("Words in text 1:", words1)
print("Words in text 2:", words2)
print("Common words:", common)
print("Simple similarity:", similarity)

Example 10: Improve Similarity Using Synonyms

Real-life use: treating words like class and course as similar.

text1 = "AI ML class"
text2 = "machine learning course"

synonyms = {
    "ai": "artificial-intelligence",
    "ml": "machine-learning",
    "machine": "machine-learning",
    "learning": "machine-learning",
    "class": "course"
}

def normalize(text):
    words = text.lower().split()
    final_words = []

    for word in words:
        if word in synonyms:
            final_words.append(synonyms[word])
        else:
            final_words.append(word)

    return set(final_words)

words1 = normalize(text1)
words2 = normalize(text2)

common = words1.intersection(words2)
all_words = words1.union(words2)

print("Normalized text 1:", words1)
print("Normalized text 2:", words2)
print("Common words:", common)
print("Similarity:", len(common) / len(all_words))

Example 11: Cosine Similarity for Course Search

Real-life use: ranking course pages according to a student search query.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pages = [
    "Python basics variables loops functions",
    "NLP text similarity cosine similarity vectors",
    "Machine learning model training dataset prediction",
    "Blogger SEO Google Search Console sitemap"
]

query = "text similarity vectors"

all_texts = [query] + pages

vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(all_texts)

scores = cosine_similarity(vectors[0], vectors[1:])[0]

for page, score in zip(pages, scores):
    print(round(score, 2), "-", page)

Example 12: Compare Student Answers With Expected Answer

Real-life use: checking whether a written answer is close to the model answer.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

expected_answer = "NLP helps computers understand human language"
student_answer = "Natural language processing helps computer work with human text"

texts = [expected_answer, student_answer]

vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(texts)

score = cosine_similarity(vectors[0], vectors[1])[0][0]

print("Similarity score:", round(score, 2))

if score >= 0.5:
    print("The answer is reasonably close")
else:
    print("The answer needs improvement")

Example 13: Mini Chatbot for Course Enquiry

Real-life use: a very simple chatbot for an AI-ML class landing page.

question = input("Ask your question: ").lower()

if "python" in question:
    print("Yes, Python is used in our AI-ML course.")
elif "nlp" in question:
    print("Yes, NLP is part of the AI-ML learning path.")
elif "timing" in question or "time" in question:
    print("Please check the latest class schedule on learnwithchampak.live.")
elif "teacher" in question or "champak" in question:
    print("The class is by Champak Roy.")
else:
    print("Please contact us for this question.")

Example 14: Detect Important Words From Blog Title

Real-life use: extracting useful words for SEO labels or search description planning.

title = "Beginning NLP Lesson with Python and Cosine Similarity"

common_words = ["with", "and", "the", "a", "an", "for", "to", "in"]

words = title.lower().split()
keywords = []

for word in words:
    if word not in common_words:
        keywords.append(word)

print("Suggested keywords:", keywords)

Example 15: Build a Tiny Vocabulary

Real-life use: preparing the list of all unique words before creating vectors.

sentences = [
    "Programmers Picnic teaches Python",
    "Champak Roy teaches AI ML",
    "Python helps in AI ML learning"
]

vocabulary = []

for sentence in sentences:
    words = sentence.lower().split()

    for word in words:
        if word not in vocabulary:
            vocabulary.append(word)

print("Vocabulary:")
print(vocabulary)

Example 16: Convert Sentence Into a 0-1 Vector

Real-life use: showing beginners how text becomes numbers before using machine learning.

vocabulary = ["python", "ai", "ml", "nlp", "class"]

sentence = "python nlp class"
words = sentence.lower().split()

vector = []

for vocab_word in vocabulary:
    if vocab_word in words:
        vector.append(1)
    else:
        vector.append(0)

print("Vocabulary:", vocabulary)
print("Sentence:", sentence)
print("Vector:", vector)

Example 17: Create a Word Frequency Vector

Real-life use: counting how many times important words appear in a paragraph.

vocabulary = ["python", "ai", "ml", "nlp", "class"]

sentence = "python ai ai ml class class class"
words = sentence.lower().split()

vector = []

for vocab_word in vocabulary:
    count = words.count(vocab_word)
    vector.append(count)

print("Vocabulary:", vocabulary)
print("Frequency vector:", vector)

Example 18: Mini Project — Find the Best Matching Course Page

Real-life use: a small search feature for learnwithchampak.live or aiml.learnwithchampak.live.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

course_pages = {
    "Python Basics": "variables operators loops functions beginner python",
    "Beginning NLP": "tokenization vectors cosine similarity text similarity nlp",
    "Machine Learning": "dataset training model prediction accuracy machine learning",
    "SEO for Blogger": "google search console sitemap blogger seo indexing"
}

student_search = input("Search a lesson: ")

texts = [student_search] + list(course_pages.values())

vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(texts)

scores = cosine_similarity(vectors[0], vectors[1:])[0]

page_names = list(course_pages.keys())

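best_index = scores.argmax()

if scores[best_index] == 0:
    # No word of the search appears in any lesson description
    print("No matching lesson found. Try different words.")
else:
    print("Best lesson:", page_names[best_index])
    print("Score:", round(scores[best_index], 2))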

Example 19: Mini Project — Group Messages by Topic

Real-life use: sorting student messages into course enquiry, technical doubt, or payment question.

messages = [
    "What is the class timing?",
    "I have an error in Python code",
    "How much is the course fee?",
    "Can I learn machine learning here?",
    "My program is not printing output"
]

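topics = {
    "course enquiry": ["class", "timing", "learn", "course"],
    "technical doubt": ["error", "code", "program", "printing", "output"],
    # "much" and "cost" help catch questions like "How much is the course fee?"
    "payment": ["fee", "payment", "price", "cost", "much"]
}

for message in messages:
    # Remove punctuation at the end of words so that "fee?" matches "fee"
    message_words = [word.strip("?.,!") for word in message.lower().split()]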
    best_topic = "unknown"
    best_score = 0

    for topic, keywords in topics.items():
        score = 0

        for word in message_words:
            if word in keywords:
                score = score + 1

        if score > best_score:
            best_score = score
            best_topic = topic

    print(message, "-->", best_topic)

Example 20: Mini Project — Find Duplicate or Similar Titles

Real-life use: checking whether two blog titles are too similar before publishing.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

old_titles = [
    "Beginning NLP with Python",
    "Python Arithmetic Operators for Beginners",
    "Google Search Console Full Guide",
    "Cosine Similarity in NLP"
]

new_title = "NLP basics with Python examples"

texts = [new_title] + old_titles

vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(texts)

scores = cosine_similarity(vectors[0], vectors[1:])[0]

for title, score in zip(old_titles, scores):
    print(round(score, 2), "-", title)

highest_score = max(scores)

if highest_score > 0.5:
    print("Warning: This title may be similar to an existing title.")
else:
    print("This title looks different enough.")

Classroom flow: Examples 1 to 10, 13 to 17, and 19 use only plain Python, so they run without installing anything. Examples 11, 12, 18, and 20 need scikit-learn, so run them after installing scikit-learn in the editor.

11. Practice in Our Python Editor

Use the embedded Programmers Picnic Python editor below to run the NLP examples from this lesson.

Tip: If the embedded editor appears small on mobile, tap “Open in New Tab”.

12. Complete Beginner Summary

Topic	Meaning	Example
NLP	Computers understanding human language	AI lesson search
Tokenization	Breaking text into words	Champak, Roy, AI
Common Words	Frequently used words	is, the, and
Synonyms	Similar words	lesson, tutorial
Antonyms	Opposite words	beginner, advanced
Text Similarity	How related two texts are	AI ML class and AI ML course
Vectors	Numbers representing text	[1, 1, 0, 0]
Cosine Similarity	Method to compare vectors	about 0.41 similarity

13. Practice Questions

What is the full form of NLP?
Break this sentence into tokens:
```
Champak Roy teaches AI
```
Find common words:
```
Programmers Picnic is an AI ML class
```
Give one synonym of lesson.
Give one antonym of beginner.
Why do we convert text into vectors?
What does a cosine similarity value near 1 mean?

14. Mini Assignment

Create a tiny text similarity experiment.

Use these sentences:

sentence1 = "Programmers Picnic teaches AI"
sentence2 = "Champak Roy teaches ML"
sentence3 = "The mango is yellow"

Think and answer:

Which two sentences are more similar?
Which sentence is least related?
Which words are common?
Which words carry important meaning?

Expected thinking: sentence1 and sentence2 are more related because both are about teaching, AI/ML, and our class context.
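Once students have answered by hand, they can check the word-level part of their answer with a small sketch like this one. It compares exact words only. It finds the shared word teaches, but it cannot see that AI and ML are related topics.

```python
sentences = {
    "sentence1": "Programmers Picnic teaches AI",
    "sentence2": "Champak Roy teaches ML",
    "sentence3": "The mango is yellow",
}

pairs = [
    ("sentence1", "sentence2"),
    ("sentence1", "sentence3"),
    ("sentence2", "sentence3"),
]

shared = {}

for first, second in pairs:
    words1 = set(sentences[first].lower().split())
    words2 = set(sentences[second].lower().split())
    # Words that appear in both sentences
    shared[(first, second)] = words1 & words2
    print(first, "and", second, "share:", shared[(first, second)])
```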