Expt No: 3 Naïve Bayes Models
Date:
Aim: To write a program to demonstrate Naïve Bayes models using the Iris dataset.
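Theory: A Naïve Bayes classifier applies Bayes' theorem under the assumption that the features are conditionally independent given the class, so P(y | x1, ..., xn) is proportional to P(y) · P(x1 | y) · ... · P(xn | y), and the predicted class is the one with the highest posterior probability. The variants used below differ only in how they model P(xi | y): Gaussian (continuous features), Multinomial (count data), Bernoulli (binary features), Complement (a Multinomial variant suited to imbalanced classes) and Categorical (integer-encoded categories).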
Program
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB, CategoricalNB
from sklearn.metrics import accuracy_score
# Load the Iris dataset from the CSV file
iris_df = pd.read_csv("Iris.csv")
iris_df.drop(columns=["Id"], inplace=True)

# Display the dataset characteristics
print("Iris Dataset Characteristics:")
print("Number of samples:", iris_df.shape[0])
print("Number of features:", iris_df.shape[1] - 1)
print("Classes:", iris_df["Species"].unique())
# Summary statistics for each feature
summary_stats = iris_df.describe()

# Display the summary statistics
print("Summary Statistics for each feature:")
print(summary_stats)
# Box plots for each feature grouped by the target variable "Species"
plt.figure(figsize=(12, 8))
for i, column in enumerate(iris_df.columns[:-1]):
    plt.subplot(2, 2, i + 1)
    sns.boxplot(x="Species", y=column, data=iris_df)
    plt.title(f"Box plot - {column}")
    plt.xlabel("Species")
    plt.ylabel(column)
plt.suptitle("Box Plots of Features by Species")
plt.tight_layout()
plt.show()
# Class distribution
class_distribution = iris_df["Species"].value_counts()
print("\nClass Distribution:")
print(class_distribution)
# Naive Bayes models
# Separate the features and the target variable
X = iris_df.drop(columns=["Species"]).values
y = iris_df["Species"].values

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Gaussian Naive Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)
gnb_pred = gnb.predict(X_test)
gnb_accuracy = accuracy_score(y_test, gnb_pred)
# Multinomial Naive Bayes
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
mnb_pred = mnb.predict(X_test)
mnb_accuracy = accuracy_score(y_test, mnb_pred)
# Convert the continuous features to binary
binarization_threshold = 0.5
X_train_binary = (X_train > binarization_threshold).astype(int)
X_test_binary = (X_test > binarization_threshold).astype(int)

# Bernoulli Naive Bayes with the binarized features
bnb = BernoulliNB()
bnb.fit(X_train_binary, y_train)
bnb_pred = bnb.predict(X_test_binary)
bnb_accuracy = accuracy_score(y_test, bnb_pred)
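Note: the thresholding above can equivalently be done with scikit-learn's Binarizer transformer. The lines below are a minimal sketch of that alternative; the threshold of 0.5 is simply reused from the program and is a fairly arbitrary choice for the Iris measurements, so the resulting Bernoulli accuracy depends strongly on it.

from sklearn.preprocessing import Binarizer

# Binarize each feature with the same fixed threshold used above
binarizer = Binarizer(threshold=0.5)
X_train_bin = binarizer.fit_transform(X_train)
X_test_bin = binarizer.transform(X_test)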
# Complement Naive Bayes
cnb = ComplementNB()
cnb.fit(X_train, y_train)
cnb_pred = cnb.predict(X_test)
cnb_accuracy = accuracy_score(y_test, cnb_pred)
# Categorical Naive Bayes
catnb = CategoricalNB()
catnb.fit(X_train, y_train)
catnb_pred = catnb.predict(X_test)
catnb_accuracy = accuracy_score(y_test, catnb_pred)
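Note: CategoricalNB is designed for integer-encoded categorical features, so fitting it on the raw continuous measurements (as above) is only for demonstration. A more typical preparation, sketched below under the assumption that four equal-width bins per feature are acceptable, is to discretize the features with KBinsDiscretizer first; the accuracy obtained this way may differ from the sample output.

from sklearn.preprocessing import KBinsDiscretizer

# Discretize each continuous feature into ordinal integer bins
discretizer = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="uniform")
X_train_cat = discretizer.fit_transform(X_train).astype(int)
X_test_cat = discretizer.transform(X_test).astype(int)

catnb_binned = CategoricalNB()
catnb_binned.fit(X_train_cat, y_train)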
# Accuracy of the various Naïve Bayes models
print("Accuracy of various Naive Bayes models for the Iris dataset.")
print("Gaussian Naive Bayes:", format(gnb_accuracy, '.4f'))
print("Multinomial Naive Bayes:", format(mnb_accuracy, '.4f'))
print("Bernoulli Naive Bayes:", format(bnb_accuracy, '.4f'))
print("Complement Naive Bayes:", format(cnb_accuracy, '.4f'))
print("Categorical Naive Bayes:", format(catnb_accuracy, '.4f'))
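Accuracy alone can hide per-class behaviour. If a finer breakdown is wanted, the per-class precision, recall and confusion matrix can be printed for any of the fitted models; the lines below are a minimal sketch for the Gaussian model only.

from sklearn.metrics import classification_report, confusion_matrix

# Per-class precision, recall and F1-score for the Gaussian model
print(classification_report(y_test, gnb_pred))
print(confusion_matrix(y_test, gnb_pred))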
Result: Thus the program to demonstrate Naïve Bayes models using the Iris dataset was written and executed successfully.
Sample Output:
Iris Dataset Characteristics:
Number of samples: 150
Number of features: 4
Classes: ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']

Summary Statistics for each feature:
       SepalLength  SepalWidth  PetalLength  PetalWidth
count   150.000000  150.000000   150.000000  150.000000
mean      5.843333    3.054000     3.758667    1.198667
std       0.828066    0.433594     1.764420    0.763161
min       4.300000    2.000000     1.000000    0.100000
25%       5.100000    2.800000     1.600000    0.300000
50%       5.800000    3.000000     4.350000    1.300000
75%       6.400000    3.300000     5.100000    1.800000
max       7.900000    4.400000     6.900000    2.500000

Class Distribution:
Iris-versicolor    50
Iris-virginica     50
Iris-setosa        50
Name: Species, dtype: int64
Accuracy of various Naive Bayes models for the Iris dataset.
Gaussian Naive Bayes: 1.0000
Multinomial Naive Bayes: 0.9000
Bernoulli Naive Bayes: 0.6333
Complement Naive Bayes: 0.7000
Categorical Naive Bayes: 0.9667