Naive Bayes Classifier
Naive Bayes is a simple yet powerful probabilistic classifier that works well for high-dimensional data such as text.
Bayes' Theorem
Naive Bayes is based on Bayes' theorem:
\[ P(y \mid x) = \frac{P(x \mid y) P(y)}{P(x)} \]
The "naive" assumption is that features are conditionally independent given the class label.
Naive Bayes with scikit-learn
Text classification using MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# texts: list of raw document strings; y: corresponding class labels
vectorizer = CountVectorizer()
X_vec = vectorizer.fit_transform(texts)  # sparse matrix of word counts

# Hold out 20% of documents for evaluation, preserving class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X_vec, y, test_size=0.2, random_state=42, stratify=y
)

nb = MultinomialNB()
nb.fit(X_train, y_train)

y_pred = nb.predict(X_test)
print(classification_report(y_test, y_pred))
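To classify new documents, they must pass through the same fitted vectorizer before prediction. A minimal sketch (the example strings here are hypothetical):

# Unseen documents go through the SAME fitted vectorizer
new_docs = ["free prize, click now", "meeting agenda attached"]  # hypothetical examples
X_new = vectorizer.transform(new_docs)  # transform, not fit_transform
print(nb.predict(X_new))        # predicted class labels
print(nb.predict_proba(X_new))  # per-class probabilities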
Types of Naive Bayes
- GaussianNB: for continuous features assumed to follow a normal distribution.
- MultinomialNB: for count data such as word frequencies in text.
- BernoulliNB: for binary features (e.g., word present/absent); a sketch follows after the GaussianNB example below.
All variants share the same fit/predict API. For example, with GaussianNB (note that it expects dense arrays of continuous features, so it does not apply to the sparse count matrix from the text example above):

from sklearn.naive_bayes import GaussianNB

# X_train/X_test here stand for a dense matrix of continuous features
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
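For the BernoulliNB case referenced above, a minimal sketch that reuses the texts and y from the text-classification example, with CountVectorizer's binary mode producing presence/absence features:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# binary=True records word presence/absence rather than counts
binary_vectorizer = CountVectorizer(binary=True)
X_bin = binary_vectorizer.fit_transform(texts)

bnb = BernoulliNB()
bnb.fit(X_bin, y)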
Strengths & Weaknesses
- Pros: extremely fast, works well with high‑dimensional sparse features, simple to implement.
- Cons: the independence assumption is often violated in practice, and the resulting probability estimates can be poorly calibrated (see the calibration sketch below).
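When calibrated probabilities matter, one common remedy is to wrap the model in scikit-learn's CalibratedClassifierCV. A minimal sketch, reusing X_train and y_train from the text-classification example:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.naive_bayes import MultinomialNB

# Platt (sigmoid) scaling learned via 5-fold cross-validation
calibrated_nb = CalibratedClassifierCV(MultinomialNB(), method="sigmoid", cv=5)
calibrated_nb.fit(X_train, y_train)
probs = calibrated_nb.predict_proba(X_test)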
Despite its simplicity, Naive Bayes is a strong baseline for many NLP tasks.