# INTRODUCTION

In this blog, we will explore the mushroom dataset and use a Naive Bayes classifier to predict the edibility of mushrooms.

`Importing Necessary Packages:`

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

`Reading the data:`

```
mush=pd.read_csv('mushrooms.csv')
```

`Exploratory Data Analysis(EDA):`

```
mush.info()
```

```
mush.columns
```

```
mush.isna().sum()
```

`Statistical Inference:`

```
mush.describe().T
```

```
mush['class'].value_counts(normalize=True)
```

**The mushroom dataset is fairly balanced in nature.**
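Since seaborn is already imported, the balance check can also be visualized. A minimal sketch, using a toy stand-in series because the real `mush['class']` column comes from `mushrooms.csv` (swap in the real column when running against the dataset):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import seaborn as sns

# Toy stand-in for mush['class'] -- 60 edible, 40 poisonous.
classes = pd.Series(list("eeppe" * 20), name="class")

shares = classes.value_counts(normalize=True)
print(shares)  # roughly balanced class shares

sns.countplot(x=classes)
plt.title("Class balance: edible (e) vs poisonous (p)")
plt.savefig("class_balance.png")
```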

Here, we will build a Naive Bayes classifier to predict whether the mushroom is 'e' for edible or 'p' for poisonous.

`Looking at the unique values in each column of the mushroom dataset:`

```
for i in mush.columns:
    print(i, mush[i].unique())
```

```
mush['stalk-root'].value_counts(normalize=True)
```

We can observe that 'stalk-root' is the only column containing '?' values, and they make up around 30% of its entries. Dropping 30% of the rows would discard a significant amount of data, so the better alternative is to remove the entire 'stalk-root' column instead.
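The per-column check behind that observation can be sketched as follows. This uses a tiny hypothetical DataFrame in place of the real `mush`, mirroring the post's finding that '?' appears only in 'stalk-root':

```python
import pandas as pd

# Toy frame mimicking the mushroom data; '?' appears only in 'stalk-root'
# (an assumption matching the observation above).
df = pd.DataFrame({
    "class":      ["e", "p", "e", "p"],
    "stalk-root": ["b", "?", "?", "c"],
    "odor":       ["n", "f", "n", "f"],
})

missing = (df == "?").sum()   # count of '?' entries per column
print(missing)
print(missing / len(df))      # share of '?' entries per column
```

Running the same two lines against the real `mush` frame is how one confirms that the '?' values are confined to 'stalk-root'.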

`Removing the stalk-root:`

```
mush=mush.drop(columns=['stalk-root'])
mush.info()
```

The 'stalk-root' column is removed.

`Data Preparation:`

```
#Input Features:
X=mush.iloc[:,1:]
print(X)
```

```
#Output:
Y=mush['class']
print(Y)
```

Since the input features are categorical, we need to convert them into numerical variables, as the Naive Bayes classifier works with probabilities computed over numeric inputs. We can achieve this conversion with `get_dummies` or label encoding.

```
X=pd.get_dummies(X)
```

The `get_dummies` function takes a DataFrame, Series, or list and turns each distinct category into its own column, assigning 1 where the category matches and 0 where it does not. This is commonly known as one-hot encoding.
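A minimal illustration of that behavior on a hypothetical single-column frame (the column name `cap-color` is just an example in the spirit of the dataset):

```python
import pandas as pd

# One categorical column with two distinct values, 'n' and 'y'.
toy = pd.DataFrame({"cap-color": ["n", "y", "n"]})

encoded = pd.get_dummies(toy)
print(encoded)
# Each distinct value becomes its own indicator column:
# 'cap-color_n' and 'cap-color_y'.
```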

```
print(X)
```

`Splitting the data into training and testing:`

```
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(X,Y,test_size=0.3,random_state=1)
print(x_train.shape)   # (5686, 112)
print(x_test.shape)    # (2438, 112)
print(y_train.shape)   # (5686,)
print(y_test.shape)    # (2438,)
```

`Training the model:`

```
from sklearn.naive_bayes import GaussianNB
NB_classifier=GaussianNB()
NB_classifier.fit(x_train,y_train)
```
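As an aside, `GaussianNB` assumes continuous, normally distributed features, while our one-hot inputs are binary; scikit-learn's `BernoulliNB` is designed for exactly that case and is worth comparing. A sketch on toy binary data (with the real data this would simply be `BernoulliNB().fit(x_train, y_train)`):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy one-hot-style binary features; the label is driven by the first
# feature so the model has a clear signal to learn.
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 2, size=(100, 5))
y_toy = (X_toy[:, 0] == 1).astype(int)

bnb = BernoulliNB().fit(X_toy, y_toy)
print(bnb.score(X_toy, y_toy))  # near-perfect on this simple rule
```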

`Predicting the output:`

```
y_pred=NB_classifier.predict(x_test)
```

`Model Evaluation:`

```
from sklearn.metrics import classification_report,accuracy_score,confusion_matrix
print(classification_report(y_test,y_pred))
```

```
print(confusion_matrix(y_test,y_pred))
# [[1216   20]
#  [   2 1200]]
print(accuracy_score(y_test,y_pred))
# 0.9909762100082035
```
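The confusion matrix reads more easily as a heatmap, which also puts the imported seaborn library to use. A sketch built from the matrix reported above (the 'e'/'p' axis labels assume scikit-learn's sorted class order, with rows as actual and columns as predicted):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import seaborn as sns

# Confusion matrix from the evaluation above.
cm = np.array([[1216, 20],
               [2, 1200]])

sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=["e", "p"], yticklabels=["e", "p"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.savefig("confusion_matrix.png")

# Accuracy recovered from the matrix: correct predictions / all predictions.
accuracy = np.trace(cm) / cm.sum()
print(round(accuracy, 4))
```

Off-diagonal cells are the errors: 20 edible mushrooms flagged as poisonous (a safe mistake) and 2 poisonous mushrooms flagged as edible (the costly one).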

`Conclusion:`

This concludes our notebook. We built a Gaussian Naive Bayes classifier with scikit-learn, trained it on the mushroom dataset, and observed an impressive accuracy of 99% on the test set.