Here’s a step-by-step explanation of how Support Vector Machines (SVMs) work:
Step 1: Understand the Data
You need a dataset with the following (a tiny example follows the list):
- Features (X): The input variables or predictors.
- Target (Y): The output variable or label you want to predict. In classification, these are typically categorical labels.
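As a minimal sketch, assuming scikit-learn is available (any array-like features and labels would do), a synthetic dataset with these two parts looks like this:

```python
# A minimal sketch (assumes scikit-learn): X holds the features,
# y holds the categorical labels we want to predict.
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=4,
                           n_classes=2, random_state=0)
print(X.shape, y.shape)  # (200, 4) (200,)
```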
Step 2: Linear SVM (For Linearly Separable Data)
If the data can be separated perfectly by a straight line (in 2D) or a hyperplane (in higher dimensions), SVM aims to find the optimal separating line (or hyperplane).
- Hyperplane: A hyperplane is a decision boundary that separates different classes. In a 2D space this is just a line; in higher dimensions it becomes a plane or a higher-dimensional analogue.
- Maximize the Margin: SVM aims to find the hyperplane that maximizes the distance (margin) between the two classes. The wider the margin, the better the generalization. The hyperplane is defined by the equation w⋅x + b = 0; here, w is the vector of hyperplane coefficients and b is the bias term.
- Support Vectors: The data points that are closest to the hyperplane are called support vectors. These points define the margin, and the goal of SVM is to maximize this margin (a short fitting sketch follows this list).
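A minimal fitting sketch, assuming scikit-learn: a large C approximates a hard margin on cleanly separable data, and the fitted model exposes the support vectors that define the margin.

```python
# A minimal sketch (assumes scikit-learn): fit a linear SVM on
# separable blobs and inspect the support vectors defining the margin.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=6)
clf = SVC(kernel="linear", C=1000).fit(X, y)  # large C ~ hard margin

print("support vectors per class:", clf.n_support_)
print(clf.support_vectors_)
```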
Step 3: Handle Non-linearly Separable Data (Kernel Trick)
When the data is not linearly separable, SVM uses the kernel trick: a kernel function (such as the RBF or polynomial kernel) implicitly maps the data into a higher-dimensional space where it becomes linearly separable, without ever computing the high-dimensional coordinates explicitly. This allows SVM to create more complex decision boundaries.
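A minimal sketch, assuming scikit-learn: on concentric circles, a linear kernel performs near chance while an RBF kernel separates the classes almost perfectly.

```python
# A minimal sketch (assumes scikit-learn): compare a linear kernel
# with an RBF kernel on data that is not linearly separable.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))
# Expect the linear kernel near 0.5 and the RBF kernel near 1.0.
```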
Step 4: Soft Margin (Handle Overlapping Classes)
Real-world data often has some overlap between classes. SVM handles this by introducing a soft margin, allowing some points to be within the margin or even on the wrong side of the hyperplane.
This is controlled by a parameter C, which balances the trade-off between maximizing the margin and minimizing classification errors (a sketch after this list compares the two regimes).
- Small C: Larger margin but more misclassifications.
- Large C: Smaller margin with fewer misclassifications, but higher risk of overfitting.
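A minimal comparison sketch, assuming scikit-learn; the exact scores depend on the synthetic data, but the pattern usually matches the trade-off above.

```python
# A minimal sketch (assumes scikit-learn): cross-validate a linear SVM
# with a small and a large C on overlapping blobs.
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 100.0):
    scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5)
    print(f"C={C}: mean CV accuracy = {scores.mean():.3f}")
# Small C tolerates margin violations (wider margin); large C penalizes
# them heavily and is more prone to overfitting noisy data.
```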
Step 5: Optimization Problem
SVM solves a constrained optimization problem to find the optimal hyperplane:
- Objective: Minimize ½∣∣w∣∣² (i.e., maximize the margin, since the margin width is 2/∣∣w∣∣) while correctly classifying the data points.
- Subject to: Constraints that ensure the data points are classified correctly (or within the margin if using a soft margin); the full soft-margin formulation is written out below.
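Written out, the standard soft-margin formulation introduces slack variables ξᵢ that measure each point's margin violation, weighted by C:

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
\frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_{i}
\qquad \text{subject to} \qquad
y_{i}\,(\mathbf{w} \cdot \mathbf{x}_{i} + b) \ge 1 - \xi_{i},
\quad \xi_{i} \ge 0 .
```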
This is a quadratic programming problem, and modern algorithms such as Sequential Minimal Optimization (SMO) solve it efficiently.
Step 6: Predicting with SVM
Once the model is trained, predictions are made by determining which side of the hyperplane a new data point falls on. For classification tasks:
- If the point falls on one side of the hyperplane, it is assigned to Class A.
- If it falls on the other side, it is assigned to Class B.
The decision rule is based on the sign of w⋅x + b (applied in the sketch after this list), where:
- w⋅x is the dot product of the feature vector and the hyperplane's weight vector.
- b is the bias term.
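A minimal sketch, assuming scikit-learn: for a linear kernel the fitted model exposes w (coef_) and b (intercept_), so the sign rule can be checked directly.

```python
# A minimal sketch (assumes scikit-learn): recover w and b from a
# fitted linear SVM and apply the sign-based decision rule by hand.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]   # hyperplane parameters
scores = X @ w + b                       # w·x + b for each point
manual_pred = (scores > 0).astype(int)   # positive side -> class 1

print("agreement with clf.predict:", np.mean(manual_pred == clf.predict(X)))
```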
For regression tasks (Support Vector Regression), SVM works similarly, but instead of a classification boundary it fits a function that keeps prediction errors within a specified margin (a tube of width ε around the predictions), penalizing only the points that fall outside it.
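A minimal regression sketch, assuming scikit-learn, where the epsilon parameter sets the width of the error-tolerant tube:

```python
# A minimal sketch (assumes scikit-learn): epsilon-SVR ignores errors
# smaller than epsilon and penalizes only points outside the tube.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

reg = SVR(kernel="rbf", epsilon=0.1, C=10.0).fit(X, y)
print("in-sample R^2:", reg.score(X, y))
```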
Step 7: Evaluate the Model
SVM performance can be evaluated using the following (a cross-validation sketch follows the list):
- Accuracy: Proportion of correctly classified instances (for classification).
- Confusion Matrix: For classification tasks, provides true positives, false positives, etc.
- Mean Absolute Error (MAE): For regression tasks.
- Cross-Validation: A common approach to validate how well the model generalizes to unseen data.
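A minimal evaluation sketch, assuming scikit-learn and its bundled breast-cancer dataset:

```python
# A minimal sketch (assumes scikit-learn): 5-fold cross-validated
# accuracy plus a confusion matrix for an SVM classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
clf = SVC(kernel="rbf", C=1.0)

print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
pred = cross_val_predict(clf, X, y, cv=5)
print(confusion_matrix(y, pred))
```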