Support-Vector-Machines

Support Vector Machines (SVM):

Support Vector Machines (SVM) are powerful supervised learning models used for classification and regression tasks. They work by finding a hyperplane that best separates the classes in a dataset.

Here’s a step-by-step explanation of how Support Vector Machines (SVM) work:

Step 1: Understand the Data

You need a dataset with:

  • Features (X): The input variables or predictors.
  • Target (Y): The output variable or label you want to predict. In classification, these are typically categorical labels.
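
As a concrete illustration (the dataset choice is purely an assumption for the example), here is a minimal sketch of this layout using scikit-learn's built-in iris dataset; any tabular dataset with numeric features and categorical labels follows the same pattern:

```python
# Minimal sketch of the expected data layout (illustrative dataset choice).
from sklearn.datasets import load_iris

data = load_iris()
X = data.data    # features: array of shape (n_samples, n_features)
y = data.target  # target: integer class labels, shape (n_samples,)

print(X.shape, y.shape)  # (150, 4) (150,)
```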

Step 2: Linear SVM (For Linearly Separable Data)

If the data can be separated perfectly by a straight line (in 2D) or a hyperplane (in higher dimensions), SVM aims to find the optimal separating line (or hyperplane).

  1. Hyperplane: A hyperplane is a decision boundary that separates different classes. In a 2D space this is just a line, in 3D it is a plane, and in higher dimensions it is the analogous flat surface, a hyperplane.

  2. Maximize the Margin: SVM aims to find the hyperplane that maximizes the distance (margin) between the two classes. The wider the margin, the better the generalization. The margin width is:

     Margin = \frac{2}{||w||}

     Here, w is the vector of hyperplane coefficients.

  3. Support Vectors: The data points that are closest to the hyperplane are called support vectors. These points define the margin, and the goal of SVM is to maximize this margin (see the sketch after this list).
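
To make the linear case concrete, here is a minimal sketch using scikit-learn's SVC with a linear kernel on a synthetic two-class dataset (the dataset and parameter values are illustrative assumptions, not part of the original description):

```python
# Minimal sketch: fit a linear SVM and inspect w, b, and the support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, well-separated two-class data.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.coef_)             # w: the hyperplane coefficients
print(clf.intercept_)        # b: the bias term
print(clf.support_vectors_)  # the points that define the margin
```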

Step 3: Handle Non-linearly Separable Data (Kernel Trick)

When the data is not linearly separable, SVM uses something called the kernel trick to project the data into a higher-dimensional space where it becomes linearly separable. This projection allows SVM to create more complex decision boundaries.

  • Popular Kernel Functions:

    • Linear Kernel: Used when data is linearly separable.
    • Polynomial Kernel: Allows for curved boundaries.
    • Radial Basis Function (RBF) or Gaussian Kernel: Handles very complex relationships by mapping data to an infinite-dimensional space.

K(x_1, x_2) = \exp\left(-\gamma ||x_1 - x_2||^2\right)

  • The kernel trick allows SVM to find the optimal hyperplane in this higher-dimensional space without explicitly transforming the data.
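
Below is a minimal sketch of the kernel trick in practice, assuming scikit-learn's SVC with an RBF kernel on a synthetic dataset that is not linearly separable (the dataset and the gamma value are illustrative choices):

```python
# Minimal sketch: the same estimator, switched to an RBF kernel.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# gamma controls the width of the Gaussian; the higher-dimensional mapping is
# implicit, the data is never explicitly transformed.
clf = SVC(kernel="rbf", gamma=1.0, C=1.0)
clf.fit(X, y)

print(clf.score(X, y))  # training accuracy on this toy dataset
```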

Step 4: Soft Margin (Handle Overlapping Classes)

Real-world data often has some overlap between classes. SVM handles this by introducing a soft margin, allowing some points to be within the margin or even on the wrong side of the hyperplane.

This is controlled by a parameter C, which balances the trade-off between maximizing the margin and minimizing classification errors.

  • Small C: Larger margin but more misclassifications.
  • Large C: Smaller margin with fewer misclassifications, but higher risk of overfitting.
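
The effect of C can be seen empirically. Here is a minimal sketch that compares a few C values by cross-validated accuracy on a noisy synthetic dataset (the values and dataset are illustrative assumptions):

```python
# Minimal sketch: how the soft-margin parameter C changes generalization.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# flip_y adds label noise, so the classes overlap.
X, y = make_classification(n_samples=300, n_features=5, flip_y=0.1, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"C={C}: mean CV accuracy = {scores.mean():.3f}")
```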

Step 5: Optimization Problem

SVM solves a constrained optimization problem to find the optimal hyperplane:

  • Objective: Minimize \frac{1}{2} ||w||^2 (i.e., maximize the margin) while correctly classifying the data points.
  • Subject to: Constraints that ensure the data points are classified correctly (or within the margin if using a soft margin).

The solution involves solving a quadratic programming problem, and modern algorithms like Sequential Minimal Optimization (SMO) are used to efficiently handle this.
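
For reference, the standard soft-margin primal problem can be written out with slack variables \xi_i that measure how far each point violates the margin:

\min_{w,\, b,\, \xi} \; \frac{1}{2} ||w||^2 + C \sum_{i=1}^{n} \xi_i

subject to y_i (w \cdot x_i + b) \ge 1 - \xi_i and \xi_i \ge 0 for every training point i. Setting all \xi_i = 0 recovers the hard-margin objective described above.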

Step 6: Predicting with SVM

Once the model is trained, predictions are made by determining which side of the hyperplane a new data point falls on. For classification tasks:

  • If the point falls on one side of the hyperplane, it is assigned to Class A.
  • If it falls on the other side, it is assigned to Class B.

The decision rule is based on the sign of:

f(x) = w \cdot x + b

Where:

  • w \cdot x is the dot product of the feature vector and the hyperplane's weight vector.
  • b is the bias term.
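
A minimal sketch of this decision rule, assuming a linear SVC fitted with scikit-learn as in the earlier sketches (decision_function returns f(x), and its sign gives the predicted class):

```python
# Minimal sketch: the sign of f(x) = w . x + b determines the predicted class.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

x_new = X[:5]
f = clf.decision_function(x_new)  # f(x) = w . x + b for each point
pred = (f > 0).astype(int)        # positive side -> one class, negative -> the other

print(f)
print(pred)
print(clf.predict(x_new))  # predict() applies the same rule
```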

For regression tasks (Support Vector Regression), the idea is similar: instead of a classification boundary, the model fits a function and ignores errors that fall within a specified margin (epsilon) around the prediction, penalizing only larger deviations.
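
A minimal sketch of the regression variant, assuming scikit-learn's SVR on synthetic one-dimensional data (the kernel, C, and epsilon values are illustrative):

```python
# Minimal sketch: support vector regression ignores errors inside the
# epsilon "tube" around the prediction and penalizes only larger deviations.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

reg = SVR(kernel="rbf", C=1.0, epsilon=0.1)
reg.fit(X, y)

print(reg.predict(X[:5]))
```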

Step 7: Evaluate the Model

SVM performance can be evaluated using:

  • Accuracy: Proportion of correctly classified instances (for classification).
  • Confusion Matrix: For classification tasks, provides true positives, false positives, etc.
  • Mean Absolute Error (MAE): For regression tasks.
  • Cross-Validation: A common approach to validate how well the model generalizes to unseen data.
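
Putting a few of these together, here is a minimal evaluation sketch assuming scikit-learn: a train/test split with accuracy and a confusion matrix, plus 5-fold cross-validation on the full dataset (the dataset choice is illustrative):

```python
# Minimal sketch: evaluating a classification SVM.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred))    # overall accuracy
print(confusion_matrix(y_test, y_pred))  # per-class breakdown
print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())  # CV estimate
```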