**Support Vector Machines** (SVM) have gained huge popularity in recent years. The reason is their robust classification performance – even in high-dimensional spaces: SVMs even work if there are more dimensions (features) than data items. This is unusual for classification algorithms because of the *curse of dimensionality* – with increasing dimensionality, data becomes extremely sparse which makes it hard for algorithms to find patterns in the data set.

Understanding the basic ideas of SVMs is a fundamental step to becoming a **sophisticated machine learning engineer**.


Let’s get a conceptual understanding of support vector machines first before learning how to use them with `sklearn`.

## Machine Learning Classification Overview

How do classification algorithms work? They use the training data to find a decision boundary that separates the data of one class from the data of the other class.

Here is an example:

Suppose you want to build a **recommendation system** for aspiring university students. The figure visualizes the training data consisting of users that are classified according to their skills in two areas: *logic* and *creativity*. Some persons have high logic skills and relatively low creativity, others have high creativity and relatively low logic skills. The first group is labeled as *“computer scientists”* and the second group is labeled as *“artists”*. (I know that there are also creative computer scientists, but let’s stick with this example for a moment.)

In order to classify new users, the machine learning model must find a **decision boundary** that separates the computer scientists from the artists. Roughly speaking, you will check for a new user in which area they fall with respect to the decision boundary: left or right? Users that fall into the left area are classified as computer scientists, while users that fall into the right area are classified as artists.

In the two-dimensional space, the decision boundary is either a line or a (higher-order) curve. The former is called a *“linear classifier”*, the latter is called a *“non-linear classifier”*. In this section, we will only explore linear classifiers.

The figure shows three decision boundaries that are all valid separators of the data. For a standard classifier, it is impossible to quantify which of the given decision boundaries is better – they all lead to perfect accuracy when classifying the training data.
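To make the idea of a linear decision boundary concrete, here is a minimal sketch. The weights `w`, the bias `b`, and the helper `classify` are hypothetical values chosen for illustration, not something learned from the article's data: a point is labeled according to the side of the line `w . x + b = 0` it falls on.

```python
import numpy as np

# Hypothetical weights and bias describing one straight decision boundary
# in the (logic, creativity) plane: w . x + b = 0
w = np.array([-1.0, 1.0])
b = 0.0

def classify(user):
    """Return the label for a user given as (logic, creativity) scores."""
    # A negative score means the logic skill outweighs the creativity skill
    score = np.dot(w, user) + b
    return "computer scientist" if score < 0 else "artist"

print(classify(np.array([9, 2])))  # computer scientist
print(classify(np.array([3, 8])))  # artist
```

Any other choice of `w` and `b` that puts the two groups on opposite sides would classify the training data equally well, which is exactly the ambiguity described above.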

## Support Vector Machine Classification Overview

*But what is the best decision boundary?*

Support vector machines provide a unique and beautiful answer to this question. Arguably, the best decision boundary provides a maximal margin of safety. In other words, SVMs **maximize the distance between the closest data points and the decision boundary**. The idea is to minimize the error of new points that are close to the decision boundary.
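The margin of a given boundary can be computed directly. The following sketch uses made-up toy data and two hand-picked candidate lines (all names and numbers are assumptions for illustration) and measures the distance from each line to its closest training point:

```python
import numpy as np

# Toy, made-up training data: (logic, creativity) scores
computer_scientists = np.array([[8.0, 2.0], [9.0, 3.0], [7.0, 1.0]])
artists = np.array([[2.0, 8.0], [3.0, 9.0], [1.0, 7.0]])
points = np.vstack([computer_scientists, artists])

def margin(w, b):
    """Smallest distance from any training point to the line w . x + b = 0."""
    distances = np.abs(points @ w + b) / np.linalg.norm(w)
    return distances.min()

# Two hypothetical boundaries that both separate the toy data perfectly
candidates = {
    "diagonal": (np.array([-1.0, 1.0]), 0.0),
    "vertical": (np.array([1.0, 0.0]), -5.0),
}
for name, (w, b) in candidates.items():
    print(name, round(margin(w, b), 3))
```

Both lines classify the toy data without error, but the diagonal one keeps a distance of about 4.24 to its closest point while the vertical one only keeps 2.0. An SVM would prefer the boundary with the larger margin.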

Here is an example:

The SVM classifier finds the respective support vectors so that the zone between the different support vectors is **as thick as possible**. The decision boundary is the line in the middle with maximal distance to the support vectors. Because the zone between the support vectors and the decision boundary is maximized, the *margin of safety is expected to be maximal* when classifying new data points. This idea shows high classification accuracy for many practical problems.

## Scikit-Learn SVM Code

Let’s have a look at how the `sklearn` library provides a simple means for you to use SVM classification on your own labeled data. The following code snippet shows the relevant `sklearn` calls:

```python
## Dependencies
from sklearn import svm
import numpy as np


## Data: student scores in (math, language, creativity) --> study field
X = np.array([[9, 5, 6, "computer science"],
              [10, 1, 2, "computer science"],
              [1, 8, 1, "literature"],
              [4, 9, 3, "literature"],
              [0, 1, 10, "art"],
              [5, 7, 9, "art"]])


## One-liner
# The mixed array stores every entry as a string, so cast the features to float
svm = svm.SVC().fit(X[:,:-1].astype(float), X[:,-1])


## Result & puzzle
student_0 = svm.predict([[3, 3, 6]])
print(student_0)

student_1 = svm.predict([[8, 1, 1]])
print(student_1)
```

**Guess**: what is the output of this code?

The code breaks down how you can use support vector machines in Python in its most basic form. The NumPy array holds the labeled training data with one row per user and one column per feature (skill level in maths, language, and creativity). The last column is the label (the class).

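Because we have three-dimensional data, a linear support vector machine separates the data using **two-dimensional planes** (the linear separator) rather than one-dimensional lines. Note that `svm.SVC()` uses a non-linear RBF kernel by default, so the boundary learned by the snippet can be curved; passing `kernel="linear"` gives flat planes. As you can see, it is also possible to separate three different classes rather than only two as shown in the examples above.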
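As a minimal sketch of the strictly linear case, the following block fits the same toy data with `kernel="linear"` passed explicitly (the default kernel of `SVC` is RBF). The variable names `features`, `labels`, and `linear_model` are chosen here for illustration and do not appear in the snippet above.

```python
import numpy as np
from sklearn.svm import SVC

# Same toy data as in the article, with features and labels kept separate
features = np.array([[9, 5, 6], [10, 1, 2], [1, 8, 1],
                     [4, 9, 3], [0, 1, 10], [5, 7, 9]], dtype=float)
labels = np.array(["computer science", "computer science",
                   "literature", "literature", "art", "art"])

# kernel="linear" restricts the model to flat separating planes
linear_model = SVC(kernel="linear").fit(features, labels)

print(linear_model.classes_)          # class labels in sorted order
print(linear_model.support_vectors_)  # training points that define the margins
print(linear_model.coef_.shape)       # (3, 3): one plane per pair of classes
```

With three classes, `SVC` trains one binary separator per pair of classes, so `coef_` holds three planes, each described by three weights (one per feature).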

The one-liner itself is straightforward: you first create the model using the constructor of the `svm.SVC` class (*SVC* stands for *support vector classification*). Then, you call the `fit` function to perform the training based on your labeled training data.
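The two steps can also be written separately, which shows why the chained form works. This is a small sketch with illustrative names (`features`, `labels`, `model`, `trained`) that are not part of the original snippet:

```python
import numpy as np
from sklearn.svm import SVC

features = np.array([[9, 5, 6], [10, 1, 2], [1, 8, 1],
                     [4, 9, 3], [0, 1, 10], [5, 7, 9]], dtype=float)
labels = ["computer science", "computer science",
          "literature", "literature", "art", "art"]

# Step 1: the constructor only stores settings, nothing is trained yet
model = SVC()

# Step 2: fit() trains on the labeled data and returns the same object,
# which is why the two steps can be chained as SVC().fit(features, labels)
trained = model.fit(features, labels)
print(trained is model)  # True
```

Keeping the model in its own variable such as `model` also avoids reusing the name of the imported `svm` module for the trained classifier.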

In the results part of the code snippet, we simply call the `predict` function on new observations:

- Because `student_0` has skills `maths=3`, `language=3`, and `creativity=6`, the support vector machine predicts that the label *“art”* fits this student’s skills.
- Similarly, `student_1` has skills `maths=8`, `language=1`, and `creativity=1`. Thus, the support vector machine predicts that the label *“computer science”* fits this student’s skills.

Here is the final output of the one-liner:

```python
## Result & puzzle
student_0 = svm.predict([[3, 3, 6]])
print(student_0)
# ['art']

student_1 = svm.predict([[8, 1, 1]])
print(student_1)
# ['computer science']
```

## Where to Go From Here?

This tutorial provides you with a quick and concise way of starting out with support vector machines (SVMs).

In fact, I wrote this as a chapter draft for my book *Python One-Liners* that also introduces 10 machine learning algorithms and how to use them in a single line of Python code.

Here’s more about the book:

## Python One-Liners Book: Master the Single Line First!

**Python programmers will improve their computer science skills with these useful one-liners.**

*Python One-Liners* will teach you how to read and write “one-liners”: **concise statements of useful functionality packed into a single line of code.** You’ll learn how to systematically unpack and understand any line of Python code, and write eloquent, powerfully compressed Python like an expert.

The book’s five chapters cover (1) tips and tricks, (2) regular expressions, (3) machine learning, (4) core data science topics, and (5) useful algorithms.

Detailed explanations of one-liners introduce **key computer science concepts** and **boost your coding and analytical skills**. You’ll learn about advanced Python features such as *list comprehension*, *slicing*, *lambda functions*, *regular expressions*, *map* and *reduce* functions, and *slice assignments*.

You’ll also learn how to:

- Leverage data structures to **solve real-world problems**, like using Boolean indexing to find cities with above-average pollution
- Use **NumPy basics** such as *array*, *shape*, *axis*, *type*, *broadcasting*, *advanced indexing*, *slicing*, *sorting*, *searching*, *aggregating*, and *statistics*
- Calculate basic **statistics** of multidimensional data arrays and the K-Means algorithms for unsupervised learning
- Create more **advanced regular expressions** using *grouping* and *named groups*, *negative lookaheads*, *escaped characters*, *whitespaces, character sets* (and *negative characters sets*), and *greedy/nongreedy operators*
- Understand a wide range of **computer science topics**, including *anagrams*, *palindromes*, *supersets*, *permutations*, *factorials*, *prime numbers*, *Fibonacci* numbers, *obfuscation*, *searching*, and *algorithmic sorting*

By the end of the book, you’ll know how to **write Python at its most refined**, and create concise, beautiful pieces of “Python art” in merely a single line.


While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.

To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.

His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.