In the previous article, we discussed Hard Margin Support Vector Machines. In this article, we will discuss Soft Margin Support Vector Machines, covering both the linear and non-linear cases. Since we will need kernels for the non-linear case, it may be useful to first read the following article: Understanding the Kernel Trick. We will also see that SVMs are convex learning problems, so you may also want to read the following article about Convex Learning Problems.
Remembering Hard Margin SVMs
Let us first refresh the intuition behind Hard Margin SVMs. We wanted to find the separating hyperplane that is most robust to noise, which is done by finding the hyperplane with the biggest margin. We can look at two examples:
We can see that the hyperplane on the right has the fattest cushion, so it is much more robust to noise than the one on the left. We also recall that it is the point(s) closest to the hyperplane that determine the size of the margin; these points are called support vectors. As it turned out, it is not always possible to find a hyperplane that linearly separates the data, in which case we use the Kernel Trick to find a higher-dimensional space where separation is possible. In both cases, the optimal hyperplane…
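As a quick refresher, here is a standard way of writing the hard margin optimization problem (this is the usual textbook formulation, stated as a reminder rather than quoted from the earlier article), where the training points are x_i with labels y_i in {−1, +1}:

```latex
% Hard margin SVM primal problem: maximizing the margin 2/||w||
% is equivalent to minimizing ||w||^2 / 2, subject to every
% training point being on the correct side of the hyperplane
% with functional margin at least 1.
\min_{\mathbf{w},\, b} \;\; \frac{1}{2}\,\lVert \mathbf{w} \rVert^2
\qquad \text{subject to} \qquad
y_i \bigl( \mathbf{w}^\top \mathbf{x}_i + b \bigr) \ge 1,
\quad i = 1, \dots, m.
```

The constraints require every training point to be classified correctly, and the points for which a constraint holds with equality are exactly the support vectors that determine the margin.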