Naive Bayes

Naive Bayes is a probabilistic classifier based on Bayes' Theorem, combined with the "naive" assumption that features are conditionally independent given the class. It’s often used for text classification, spam detection, and other classification problems.

Here’s a step-by-step explanation of how Naive Bayes works:

Step 1: Understand the Data

  • Features (X): The input variables or predictors.
  • Target Variable (Y): The output variable or class label you want to predict.
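For concreteness, here is a minimal Python sketch of what X and y might look like; the toy weather/tennis data is invented purely for illustration:

```python
# Hypothetical toy dataset: predict whether to play tennis (y)
# from two categorical features (X): outlook and temperature.
X = [
    ("sunny", "hot"),
    ("sunny", "mild"),
    ("rainy", "mild"),
    ("rainy", "cool"),
    ("overcast", "hot"),
]
y = ["no", "no", "yes", "yes", "yes"]  # target class labels
```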

Step 2: Define Bayes’ Theorem

Naive Bayes is based on Bayes’ Theorem, which describes the probability of a class given the features. The formula is:

P(Y \mid X) = \frac{P(X \mid Y) \cdot P(Y)}{P(X)}

Where:

  • P(Y \mid X) is the posterior probability of class Y given features X.
  • P(X \mid Y) is the likelihood of features X given class Y.
  • P(Y) is the prior probability of class Y.
  • P(X) is the marginal likelihood (evidence) of features X.
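To make the formula concrete, here is a small worked example with made-up numbers: suppose 20% of emails are spam, the word "free" appears in 60% of spam and 5% of non-spam, and we want the probability that an email containing "free" is spam:

```python
# Hypothetical numbers, for illustration only.
p_spam = 0.20             # P(Y = spam), the prior
p_free_given_spam = 0.60  # P(X = "free" | Y = spam), the likelihood
p_free_given_ham = 0.05   # P(X = "free" | Y = ham)

# Marginal likelihood P(X) via the law of total probability.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Posterior P(Y = spam | X = "free") = P(X | Y) * P(Y) / P(X).
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(p_spam_given_free)  # 0.12 / 0.16 = 0.75
```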

Step 3: Apply the Naive Assumption

The "naive" assumption is that all features are independent given the class. This simplifies the likelihood calculation:

P(X \mid Y) = P(X_1, X_2, \ldots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)

Here, X_i are the individual features. This assumption makes computations manageable by reducing the complexity of calculating joint probabilities.
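A minimal sketch of how the naive assumption reduces the joint likelihood to a product of per-feature terms; the probability values are placeholders:

```python
import math

# Hypothetical per-feature likelihoods P(X_i | Y = y) for one class.
feature_likelihoods = [0.6, 0.3, 0.8]

# Under the naive assumption, P(X | Y) is simply their product.
joint_likelihood = math.prod(feature_likelihoods)
print(joint_likelihood)  # 0.6 * 0.3 * 0.8 = 0.144
```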

Step 4: Calculate Prior Probabilities

Estimate the prior probability of each class, P(Y). This is the probability of each class occurring in the dataset:

P(Y = y) = \frac{\text{Number of instances in class } y}{\text{Total number of instances}}
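Estimating the priors is just counting; a sketch using the toy labels from the Step 1 example:

```python
from collections import Counter

y = ["no", "no", "yes", "yes", "yes"]  # labels from the Step 1 sketch

counts = Counter(y)
n = len(y)
priors = {cls: cnt / n for cls, cnt in counts.items()}
print(priors)  # {'no': 0.4, 'yes': 0.6}
```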

Step 5: Calculate Likelihoods

Estimate the likelihood of each feature given each class, P(X_i \mid Y). How this is done depends on the type of feature:

  • For categorical features: Use frequency counts. For feature X_i in class Y:

P(X_i = x_i \mid Y = y) = \frac{\text{Number of instances where } X_i = x_i \text{ and } Y = y}{\text{Number of instances in class } y}

  • For continuous features: Assume a distribution (often Gaussian). For feature X_i in class Y, use:

P(X_i = x_i \mid Y = y) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2 \sigma^2} \right)

    where \mu and \sigma^2 are the mean and variance of X_i for class Y. Both cases are sketched in code after this list.
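A sketch covering both cases: frequency counts for a categorical feature, and the Gaussian density for a continuous one. The helper names are my own, not from any library:

```python
import math

def categorical_likelihood(X, y, feature_idx, value, cls):
    """Estimate P(X_i = value | Y = cls) by frequency counts."""
    in_class = [row for row, label in zip(X, y) if label == cls]
    matches = sum(1 for row in in_class if row[feature_idx] == value)
    return matches / len(in_class)

def gaussian_likelihood(x, mu, sigma2):
    """Gaussian density for P(X_i = x | Y), given the class mean and variance."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Usage with the Step 1 toy data:
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
     ("rainy", "cool"), ("overcast", "hot")]
y = ["no", "no", "yes", "yes", "yes"]
print(categorical_likelihood(X, y, 0, "rainy", "yes"))  # 2/3
print(gaussian_likelihood(1.5, mu=1.0, sigma2=0.25))    # ~0.484
```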

Step 6: Make Predictions

To classify a new instance, calculate the posterior probability for each class using Bayes’ Theorem and the naive assumption:

P(Y = y \mid X) \propto P(Y = y) \cdot \prod_{i=1}^{n} P(X_i \mid Y = y)

Choose the class with the highest posterior probability:

\hat{Y} = \arg\max_{y} \left( P(Y = y) \cdot \prod_{i=1}^{n} P(X_i \mid Y = y) \right)
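A minimal prediction sketch with hypothetical learned probabilities. In practice, implementations sum log-probabilities rather than multiplying raw probabilities, to avoid numerical underflow when there are many features:

```python
import math

# Hypothetical learned parameters for a two-class problem.
priors = {"yes": 0.6, "no": 0.4}
# likelihoods[cls][i] = P(X_i = x_i | Y = cls) for the new instance's features.
likelihoods = {"yes": [0.5, 0.7], "no": [0.3, 0.2]}

def predict(priors, likelihoods):
    """Return the class maximizing log P(Y=y) + sum_i log P(X_i | Y=y)."""
    scores = {
        cls: math.log(priors[cls]) + sum(math.log(p) for p in likelihoods[cls])
        for cls in priors
    }
    return max(scores, key=scores.get)

print(predict(priors, likelihoods))  # 'yes' (log-score ~ -1.56 vs ~ -3.73 for 'no')
```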

Step 7: Evaluate the Model

Assess the performance of the Naive Bayes classifier using metrics such as:

  • Accuracy: The proportion of correctly classified instances.
  • Confusion Matrix: Provides a detailed breakdown of classification results.
  • Precision, Recall, F1-Score: Evaluate the classifier’s performance on each class individually, which is especially informative for imbalanced datasets.
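A sketch of computing these metrics with scikit-learn, using GaussianNB on the Iris dataset as a stand-in for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Stand-in dataset; substitute your own features and labels.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
# Per-class precision, recall, and F1:
print(classification_report(y_test, y_pred))
```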