Naïve Bayes Models

 

Expt No: 3                                           Naïve Bayes Models

Date:

 

Aim: To write a program that demonstrates Naïve Bayes models using the Iris dataset.

 

Program

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB, CategoricalNB

from sklearn.metrics import accuracy_score

 

# Load the Iris dataset from CSV file

iris_df = pd.read_csv("Iris.csv")

iris_df.drop(columns=["Id"], inplace=True)

 

# Display the dataset characteristics

print("Iris Dataset Characteristics:")

print("Number of samples:", iris_df.shape[0])

print("Number of features:", iris_df.shape[1] - 1)  # exclude the Species column

print("Classes:", iris_df["Species"].unique())

 

# Summary statistics for each feature

summary_stats = iris_df.describe()

 

# Display summary statistics

print("Summary Statistics for each feature:")

print(summary_stats)

 

# Box plots for each feature grouped by the target variable "Species"

plt.figure(figsize=(12, 8))

for i, column in enumerate(iris_df.columns[:-1]):

    plt.subplot(2, 2, i+1)

    sns.boxplot(x="Species", y=column, data=iris_df)

    plt.title(f"Box plot - {column}")

    plt.xlabel("Species")

    plt.ylabel(column)

plt.suptitle("Box Plots of Features by Species")

plt.tight_layout()

plt.show()

 

# Class distribution

class_distribution = iris_df["Species"].value_counts()

 

print("\nClass Distribution:")

print(class_distribution)

 

# Naive Bayes models

# Separate the feature matrix and the target labels

X = iris_df.drop(columns=["Species"]).values

y = iris_df["Species"].values

 

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

 

# Gaussian Naive Bayes

gnb = GaussianNB()

gnb.fit(X_train, y_train)

gnb_pred = gnb.predict(X_test)

gnb_accuracy = accuracy_score(y_test, gnb_pred)

 

# Multinomial Naive Bayes (designed for count data; needs non-negative features)

mnb = MultinomialNB()

mnb.fit(X_train, y_train)

mnb_pred = mnb.predict(X_test)

mnb_accuracy = accuracy_score(y_test, mnb_pred)

 

# Convert the continuous features to binary (values above the threshold become 1)

binarization_threshold = 0.5

X_train_binary = (X_train > binarization_threshold).astype(int)

X_test_binary = (X_test > binarization_threshold).astype(int)

 

# Bernoulli Naive Bayes with adjusted binary features

bnb = BernoulliNB()

bnb.fit(X_train_binary, y_train)

bnb_pred = bnb.predict(X_test_binary)

bnb_accuracy = accuracy_score(y_test, bnb_pred)

 

# Complement Naive Bayes

cnb = ComplementNB()

cnb.fit(X_train, y_train)

cnb_pred = cnb.predict(X_test)

cnb_accuracy = accuracy_score(y_test, cnb_pred)

 

 

# Categorical Naive Bayes (expects features encoded as non-negative integer categories)

catnb = CategoricalNB()

catnb.fit(X_train, y_train)

catnb_pred = catnb.predict(X_test)

catnb_accuracy = accuracy_score(y_test, catnb_pred)

 

# Accuracy of various Naïve Bayes models

print("Accuracy of various Naive Bayes models for Iris dataset.")

print("Gaussian Naive Bayes:", format(gnb_accuracy, '.4f'))

print("Multinomial Naive Bayes:", format(mnb_accuracy, '.4f'))

print("Bernoulli Naive Bayes:", format(bnb_accuracy, '.4f'))

print("Complement Naive Bayes:", format(cnb_accuracy, '.4f'))

print("Categorical Naive Bayes:", format(catnb_accuracy, '.4f'))
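
To make the Gaussian step above less of a black box, here is a minimal from-scratch sketch of Gaussian Naive Bayes in plain Python. It is an illustration under stated assumptions, not scikit-learn's implementation: the function names fit_gaussian_nb and predict_gaussian_nb, the toy samples, and the fixed 1e-9 variance floor are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means and feature variances for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Population variance plus a small floor (assumed value) to avoid division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the largest log prior plus summed Gaussian log-likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data with two features; these rows are NOT read from Iris.csv.
toy_X = [[1.4, 0.2], [1.5, 0.3], [1.3, 0.2], [4.7, 1.4], [4.5, 1.5], [4.9, 1.6]]
toy_y = ["A", "A", "A", "B", "B", "B"]

toy_model = fit_gaussian_nb(toy_X, toy_y)
print(predict_gaussian_nb(toy_model, [1.6, 0.25]))  # prints A
print(predict_gaussian_nb(toy_model, [4.6, 1.45]))  # prints B
```

The "naive" part is the inner loop: each feature contributes its own one-dimensional Gaussian log-likelihood, and the contributions are simply added, which amounts to assuming the features are independent within a class.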

 

 




Result: Thus the program to demonstrate Naïve Bayes models was written and executed.


Sample Output:

 

Iris Dataset Characteristics:

Number of samples: 150

Number of features: 4

Classes: ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']

 

Summary Statistics for each feature:

       SepalLength  SepalWidth  PetalLength  PetalWidth
count   150.000000  150.000000   150.000000  150.000000
mean      5.843333    3.054000     3.758667    1.198667
std       0.828066    0.433594     1.764420    0.763161
min       4.300000    2.000000     1.000000    0.100000
25%       5.100000    2.800000     1.600000    0.300000
50%       5.800000    3.000000     4.350000    1.300000
75%       6.400000    3.300000     5.100000    1.800000
max       7.900000    4.400000     6.900000    2.500000

 


 

 

Class Distribution:

Iris-versicolor    50

Iris-virginica     50

Iris-setosa        50

Name: Species, dtype: int64

 

Accuracy of various Naive Bayes models for Iris dataset.

Gaussian Naive Bayes: 1.0000

Multinomial Naive Bayes: 0.9000

Bernoulli Naive Bayes: 0.6333

Complement Naive Bayes: 0.7000

Categorical Naive Bayes: 0.9667
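
A likely reason for the low Bernoulli score can be read off the summary statistics: with a single threshold of 0.5, most features never fall below the cut-off, so they become a constant 1 and carry no information. The short check below uses only the min and 50% values copied from the table in the sample output; the dictionary layout and the helper name is_constant_after_binarization are hypothetical, introduced just for this sketch.

```python
# Per-feature minimum and median (50%) values copied from the summary statistics.
feature_stats = {
    "SepalLength": {"min": 4.3, "median": 5.8},
    "SepalWidth": {"min": 2.0, "median": 3.0},
    "PetalLength": {"min": 1.0, "median": 4.35},
    "PetalWidth": {"min": 0.1, "median": 1.3},
}

threshold = 0.5  # same value as binarization_threshold in the program


def is_constant_after_binarization(stats, threshold):
    """True when even the smallest value exceeds the threshold, so the binary feature is always 1."""
    return stats["min"] > threshold


constant_features = [
    name
    for name, stats in feature_stats.items()
    if is_constant_after_binarization(stats, threshold)
]
print("Always 1 at threshold 0.5:", constant_features)
# prints: Always 1 at threshold 0.5: ['SepalLength', 'SepalWidth', 'PetalLength']

# One possible alternative (an assumption, not what the program does):
# use each feature's median as its own threshold so every binary feature varies.
median_thresholds = {name: stats["median"] for name, stats in feature_stats.items()}
print("Per-feature median thresholds:", median_thresholds)
```

Only PetalWidth can take both binary values at a threshold of 0.5, so the Bernoulli model is effectively working with one feature. Whether per-feature thresholds would raise the accuracy on this split has not been measured here.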



 



Anu