Gradient Boosting Machines

A Gradient Boosting Machine (GBM) is a powerful ensemble learning technique used for both regression and classification problems. It builds models sequentially, with each new model correcting the errors of the previous ones.

Here’s a step-by-step explanation of how Gradient Boosting Machines work:

Step 1: Understand the Data

  • Features (X): The input variables or predictors.
  • Target Variable (Y): The output variable or label you want to predict.

Step 2: Initialize the Model

Start with a base model that makes an initial prediction. For regression, this is often the mean of the target values; for binary classification, it’s the log odds of the positive class.

  • Initial Prediction (for regression): $F_0(x) = \frac{1}{N} \sum_{i=1}^{N} y_i$

  • where $N$ is the number of data points, and $y_i$ is the target value for the $i$-th data point.

  • Initial Prediction (for classification): $F_0(x) = \log \frac{p}{1 - p}$

  • where $p$ is the proportion of the positive class in the training data.
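The two initializations above can be computed directly. This is a minimal sketch using NumPy with small hypothetical target arrays (`y_reg` and `y_clf` are made-up illustrative data, not from the original text):

```python
import numpy as np

# Hypothetical regression targets
y_reg = np.array([3.0, 5.0, 7.0, 9.0])
F0_reg = y_reg.mean()  # initial prediction: mean of the targets

# Hypothetical binary classification labels (1 = positive class)
y_clf = np.array([1, 0, 1, 1])
p = y_clf.mean()              # proportion of the positive class
F0_clf = np.log(p / (1 - p))  # initial prediction: log odds
```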

Step 3: Compute Residuals

Calculate the residuals, which are the differences between the actual target values and the predictions made by the current model. (For squared-error loss, these residuals coincide with the negative gradient of the loss with respect to the predictions, which is where gradient boosting gets its name.)

  • Residual Calculation: $r_i = y_i - F_{m-1}(x_i)$

  • where $r_i$ is the residual for the $i$-th data point, $y_i$ is the actual target value, and $F_{m-1}(x_i)$ is the prediction from the previous model.
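Continuing the regression example, the residuals against the initial mean prediction look like this (the target values are hypothetical illustrative data):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])
F_prev = np.full_like(y, y.mean())  # previous model's predictions (here: the initial mean, 6.0)
residuals = y - F_prev              # r_i = y_i - F_{m-1}(x_i)
```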

Step 4: Fit a New Model

Train a new model (often a decision tree) to predict these residuals. This model learns to correct the errors made by the previous model.

  • Model Fitting: Fit a model $h_m(x)$ to the residuals.
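A sketch of this fitting step, assuming scikit-learn is available and using synthetic data for illustration — the tree is trained on the residuals, not on the raw targets:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel()

residuals = y - y.mean()  # residuals from the initial mean prediction
# h_m(x): a shallow tree fit to the residuals, not to y itself
h = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
```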

Step 5: Update the Model

Update the current model by adding the predictions from the newly trained model, scaled by a learning rate (also called the shrinkage parameter). This controls how much each new model contributes to the overall prediction.

  • Model Update Formula: $F_m(x) = F_{m-1}(x) + \alpha \cdot h_m(x)$

  • where $\alpha$ is the learning rate, and $h_m(x)$ is the prediction from the new model.

Step 6: Repeat

Repeat Steps 3 to 5 for a specified number of iterations or until a stopping criterion is met (e.g., the improvement in residuals becomes minimal).
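Steps 3 through 5 can be combined into a training loop. This is a minimal from-scratch sketch on synthetic data, assuming scikit-learn's `DecisionTreeRegressor` as the base learner; the hyperparameter values (`alpha`, `n_rounds`, `max_depth`) are illustrative choices, not prescriptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

alpha = 0.1                    # learning rate (shrinkage)
n_rounds = 50
F = np.full(len(y), y.mean())  # Step 2: initialize with the mean
trees = []

for m in range(n_rounds):
    residuals = y - F                                            # Step 3: compute residuals
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)  # Step 4: fit a tree to them
    F = F + alpha * tree.predict(X)                              # Step 5: shrunken update
    trees.append(tree)
```

Each pass shrinks the residuals a little further; the learning rate keeps any single tree from dominating the ensemble.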

Step 7: Make Predictions

Once the ensemble of models is trained, use the final model $F_M(x)$ to make predictions on new data.

  • Prediction Formula: $\hat{y}_i = F_M(x_i)$

  • where $\hat{y}_i$ is the predicted value for the $i$-th data point.
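In practice you would usually rely on a library implementation rather than a hand-rolled loop. A sketch with scikit-learn's `GradientBoostingRegressor` on synthetic data (the hyperparameters are illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel()

model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=2)
model.fit(X, y)

# F_M applied to new data points
y_hat = model.predict(np.array([[1.0], [2.0]]))
```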

Step 8: Evaluate the Model

Assess the performance of the GBM model using appropriate metrics:

  • For Regression: Metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared.
  • For Classification: Metrics like Accuracy, Precision, Recall, F1-Score, and AUC-ROC.
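For the regression metrics, a small sketch using `sklearn.metrics` with hypothetical true and predicted values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical targets and model predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 6.9, 9.3])

mse = mean_squared_error(y_true, y_pred)   # average squared error
mae = mean_absolute_error(y_true, y_pred)  # average absolute error
r2 = r2_score(y_true, y_pred)              # fraction of variance explained
```

The classification metrics (accuracy, precision, recall, F1, AUC-ROC) are available in the same `sklearn.metrics` module.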