Software for parameter tuning for SVM classifiers

Carl Gold (Caltech) and Peter Sollich (King's College London)


The problem

Support Vector Machines (SVMs) have established themselves as standard, powerful tools for many machine learning and pattern recognition applications, in particular classification tasks.

Nevertheless, like any learning algorithm, SVMs have a number of tunable parameters. These include the penalty parameter C as well as any parameters that the kernel depends on, e.g. the kernel width for radial basis function (RBF) kernels. We call these hyperparameters, to distinguish them from the lower-level parameters that the algorithm fits, i.e. the weight vector and offset.
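For reference, the standard soft-margin SVM problem is written out below with an RBF kernel. The fitted parameters are w and b, while C and the width σ are hyperparameters. The notation (feature map φ, labels y_i = ±1, and the 2σ² width convention) is chosen here for illustration; the papers may parameterize the kernel differently.

```latex
\min_{\mathbf{w},\,b}\;\; \frac{1}{2}\lVert\mathbf{w}\rVert^{2}
  + C\sum_{i=1}^{n}\max\!\bigl(0,\; 1 - y_i\,(\mathbf{w}\cdot\boldsymbol{\phi}(\mathbf{x}_i) + b)\bigr),
\qquad
K(\mathbf{x},\mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})\cdot\boldsymbol{\phi}(\mathbf{x}')
  = \exp\!\Bigl(-\frac{\lVert\mathbf{x}-\mathbf{x}'\rVert^{2}}{2\sigma^{2}}\Bigr)
```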

If there are only one or two hyperparameters, one can certainly minimize the test error directly (as estimated, e.g., by cross-validation) over a grid of hyperparameter values. But this becomes impractical when many hyperparameters are involved, because the number of grid points grows exponentially with the number of hyperparameters. For example, in an RBF kernel one might want to allow a separate width parameter for each input dimension (or feature). Optimizing over these widths amounts to automatic relevance determination (ARD), since a large width indicates that the feature concerned has little effect on the kernel and hence on the predictions.
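A minimal sketch of an RBF kernel with one width per feature, in Python with NumPy. The function name `ard_rbf_kernel` and the convention exp(-Σ_d (x_d - x'_d)² / (2ℓ_d²)) are assumptions for illustration, not the interface of the downloadable software. The example shows the ARD effect: with a very large width for the second feature, two points that differ only in that feature get a kernel value of essentially 1.

```python
import numpy as np

def ard_rbf_kernel(X1, X2, widths):
    """RBF kernel with a separate width per feature (ARD); hypothetical helper.

    K(x, x') = exp(-sum_d (x_d - x'_d)^2 / (2 * widths[d]^2))
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    widths = np.asarray(widths, dtype=float)
    # Divide each feature by its own width, then take squared Euclidean distances.
    diff = X1[:, None, :] / widths - X2[None, :, :] / widths
    return np.exp(-0.5 * (diff ** 2).sum(axis=-1))

X = np.array([[0.0, 0.0],
              [0.0, 5.0],   # differs from X[0] only in feature 2
              [1.0, 0.0]])  # differs from X[0] only in feature 1
K = ard_rbf_kernel(X, X, widths=[1.0, 1e6])
print(K.round(3))
```

With equal widths for all features this reduces to the ordinary isotropic RBF kernel, so ARD adds flexibility without changing the kernel family.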


Our solution

We interpret the SVM algorithm as the maximum a posteriori (MAP) solution to a Bayesian inference problem. It is then natural to select hyperparameters to maximize the evidence, i.e. the marginal likelihood of the observed data given the hyperparameters. The key advantage is that the evidence is a continuous function of the hyperparameters, and so can be optimized by methods such as gradient ascent. We have tested this method on a number of standard data sets and found very encouraging results. For details and background references, see the papers on SVMs in my publications list.
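Evidence gradients can be estimated from posterior samples thanks to a general identity: for parameters θ and hyperparameters λ, ∂/∂λ log p(D | λ) = ⟨∂/∂λ log p(D, θ | λ)⟩, where ⟨·⟩ is the average over the posterior p(θ | D, λ). The sketch below runs stochastic gradient ascent on the log evidence using this identity. It uses a hypothetical toy model (a Gaussian mean θ with prior precision λ) chosen only because its optimum is known in closed form; it is not the SVM model or the estimator used in the software, and the names `log_evidence_grad` and `tune_lam` are illustrative.

```python
import numpy as np

def log_evidence_grad(y, lam, n_samples, rng):
    """Monte Carlo estimate of d/d(lam) log p(y | lam) for a toy model.

    Hypothetical toy model (not the SVM model):
        theta ~ N(0, 1/lam),   y_i | theta ~ N(theta, 1),  i = 1..n.
    Only the prior depends on lam, so the identity reduces to
        d/dlam log p(y|lam) = < d/dlam log p(theta|lam) >_posterior.
    """
    n = len(y)
    post_prec = n + lam                   # posterior precision of theta
    post_mean = y.sum() / post_prec       # posterior mean of theta
    # Exact sampling is possible here; an SVM posterior needs e.g. MCMC.
    theta = rng.normal(post_mean, 1.0 / np.sqrt(post_prec), size=n_samples)
    # d/dlam log N(theta; 0, 1/lam) = 1/(2 lam) - theta^2 / 2
    return np.mean(0.5 / lam - 0.5 * theta ** 2)

def tune_lam(y, lam0=1.0, lr=0.5, steps=200, n_samples=20000, seed=0):
    """Stochastic gradient ascent on log p(y | lam), parameterized by log(lam)."""
    rng = np.random.default_rng(seed)
    log_lam = np.log(lam0)
    for _ in range(steps):
        lam = np.exp(log_lam)
        # Chain rule: d/d(log lam) = lam * d/d(lam).
        log_lam += lr * lam * log_evidence_grad(y, lam, n_samples, rng)
    return float(np.exp(log_lam))

y = 2.0 + np.linspace(-1.0, 1.0, 50)              # toy data: n = 50, mean 2
lam_hat = tune_lam(y)
lam_exact = 1.0 / (y.mean() ** 2 - 1.0 / len(y))  # closed form for this toy model
print(lam_hat, lam_exact)
```

Working on log λ keeps λ positive throughout. For this toy model the evidence is maximized at 1/λ = ȳ² - 1/n (when ȳ² > 1/n), which gives an exact check on the sampled-gradient result.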


Software

We have written software that automates the tuning of all hyperparameters of SVM classifiers with the popular RBF kernel, extended to allow ARD. Evidence gradients are estimated by sampling from the Bayesian posterior; this is sped up by a Nyström approximation, which reduces the dimensionality of the space that needs to be sampled. You can download the complete software bundle free of charge for research and educational use; unpack it with gzip and tar on Unix, or WinZip or similar on Windows. If you want further information about the software before downloading, have a look at the user's guide.
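The Nyström idea can be sketched as follows: pick m landmark points among the n training inputs and approximate the n × n kernel matrix by K ≈ K_nm K_mm⁻¹ K_mn, which has rank at most m. Writing this as K ≈ F Fᵀ with an n × m factor F means a Gaussian latent vector with covariance K can be represented by F z with only m standard normal variables z, which is one way the dimensionality of the sampled space shrinks. The function names, the evenly spaced landmarks and the small diagonal jitter are assumptions for this illustration; the software's actual choices may differ.

```python
import numpy as np

def rbf(X1, X2, width):
    """Isotropic RBF kernel exp(-||x - x'||^2 / (2 * width^2))."""
    d = X1[:, None, :] - X2[None, :, :]
    return np.exp(-0.5 * (d ** 2).sum(axis=-1) / width ** 2)

def nystrom_factor(K_nm, K_mm, jitter=1e-8):
    """Return F (n x m) such that F @ F.T = K_nm (K_mm + jitter*I)^{-1} K_mn.

    The jitter is a small diagonal term that keeps the Cholesky factorization
    stable when K_mm is nearly singular.
    """
    m = K_mm.shape[0]
    L = np.linalg.cholesky(K_mm + jitter * np.eye(m))
    # Solve L Y = K_mn, so Y = L^{-1} K_mn and F = Y.T = K_nm L^{-T}.
    return np.linalg.solve(L, K_nm.T).T

# Example: 200 points in [0, 1], 10 evenly spaced landmarks (hypothetical choice).
X = np.linspace(0.0, 1.0, 200)[:, None]
idx = np.linspace(0, 199, 10).astype(int)
K = rbf(X, X, width=0.5)
F = nystrom_factor(K[:, idx], K[np.ix_(idx, idx)])
K_approx = F @ F.T
print(F.shape, np.abs(K - K_approx).max())

# A latent vector with covariance approximately K, built from only m = 10 Gaussian draws.
rng = np.random.default_rng(0)
theta = F @ rng.standard_normal(F.shape[1])
```

Landmark rows are reproduced almost exactly, and for this smooth kernel the error elsewhere is also small; with rougher kernels (small widths) or too few landmarks the approximation degrades.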


Feedback, extensions

If you have difficulty installing or running the software, do email me, but make sure you've consulted the user's guide first. We would be pleased to hear about results you get with the method. Similarly, if you notice bugs, let me know.

The approach should also extend straightforwardly to SVM regression rather than classification. We may implement this in the future; if you're interested in collaborating on this, get in touch.




Last updated 3 Aug 2005
Contact: Peter Sollich