Abstract

I updated the R code accompanying Chapters 2-4 of the book “Real-World Machine Learning” by Henrik Brink, Joseph W. Richards, and Mark Fetherolf to be more consistent with the listings and figures presented in the book.

rwml-R Chapters 2-4 updated

The most notable changes to rwml-R are for Chapter 4, where multiple ROC curves are plotted for a 10-class classifier and a tile plot is generated for a tuning parameter grid search. Also, for parallel computations, the doMC package was replaced with doParallel.
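For readers making the same switch, here is a minimal sketch of registering a doParallel backend for foreach. The worker count of 2 and the toy loop are illustrative assumptions, not taken from the rwml-R code.

```r
library(doParallel)  # attaches foreach and parallel as dependencies

# Assumption: 2 workers, chosen only for illustration
cl <- makeCluster(2)
registerDoParallel(cl)

# A toy parallel loop; each iteration runs on one of the workers
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

# Release the workers when done
stopCluster(cl)
```

Once a backend is registered this way, caret's train function should pick it up for resampling without further changes. Unlike doMC, which relies on forking, a cluster created with makeCluster also works on Windows.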

Plotting a series of ROC curves

To be consistent with the approach followed in the book, I’ve added listings of R code to compute the ROC curves and AUC values “from scratch” instead of using the ROCR package as was done previously.
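As a rough illustration of what “from scratch” can look like, the sketch below sweeps a threshold over the predicted scores to get the ROC points, integrates with the trapezoidal rule to get the AUC, and repeats this one-vs-rest for each class. The function names (roc_curve, auc, one_vs_rest_auc) and the toy data are hypothetical; they are not the listings from the book or from rwml-R.

```r
# ROC points for one class: scores are predicted probabilities of the
# positive class, labels are 1 (positive) or 0 (negative).
roc_curve <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  tpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 1) / n_pos)
  fpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 0) / n_neg)
  # Prepend the origin so the curve starts at (0, 0)
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Area under the curve by the trapezoidal rule
auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# One-vs-rest AUC for a multi-class problem: prob is a matrix of class
# probabilities with class names as column names, truth the true classes.
one_vs_rest_auc <- function(prob, truth) {
  sapply(colnames(prob), function(cl) {
    auc(roc_curve(prob[, cl], as.integer(truth == cl)))
  })
}

# Toy binary example (made-up scores)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
labels <- c(1, 1, 0, 1, 0, 0)
toy_roc <- roc_curve(scores, labels)
toy_auc <- auc(toy_roc)

# Toy 3-class example (made-up probabilities)
prob <- matrix(c(0.8, 0.1, 0.1,
                 0.2, 0.7, 0.1,
                 0.1, 0.2, 0.7,
                 0.6, 0.3, 0.1),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("a", "b", "c")))
truth <- c("a", "b", "c", "a")
class_aucs <- one_vs_rest_auc(prob, truth)
```

To plot a series of curves, one could call roc_curve once per class and overlay the resulting fpr/tpr data frames with lines() or ggplot2.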

Tuning model parameters in Chapter 4

The caret package is used to tune parameters via grid search for the Support Vector Machines model with a Radial Basis Function Kernel. By setting summaryFunction = twoClassSummary in trainControl, the area under the ROC curve (AUC) is used to select the optimal model. For consistency with the book, tile plots were added to illustrate the process of refining the grid for the parameter search. The tile plot for the second (refined) grid search is below.

Tile plot for parameter search
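The tuning setup described above can be sketched roughly as follows. This is an illustration under several assumptions: the data come from caret's twoClassSim simulator rather than the book's dataset, the grid ranges and 5-fold cross-validation are arbitrary, and the plot styling is my own. The svmRadial method needs the kernlab package installed.

```r
library(caret)    # train(), trainControl(), twoClassSummary(), twoClassSim()
library(ggplot2)

set.seed(1)
# Assumption: simulated two-class data as a stand-in for the real dataset
dat <- twoClassSim(200)

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,                 # needed by twoClassSummary
                     summaryFunction = twoClassSummary) # reports ROC (AUC), Sens, Spec

# Assumption: an arbitrary 4 x 4 grid over the two svmRadial parameters
grid <- expand.grid(sigma = 10^seq(-3, -1, length.out = 4),
                    C     = 10^seq(-1, 1, length.out = 4))

fit <- train(Class ~ ., data = dat, method = "svmRadial",
             metric = "ROC", trControl = ctrl, tuneGrid = grid)

# One tile per (sigma, C) pair, filled by the cross-validated AUC
tile_plot <- ggplot(fit$results,
                    aes(x = factor(signif(sigma, 2)),
                        y = factor(signif(C, 2)),
                        fill = ROC)) +
  geom_tile() +
  labs(x = "sigma", y = "C", fill = "AUC")
print(tile_plot)
```

To refine the search, one would look at where the brightest tiles sit, build a second, narrower grid around that region, and rerun train with it.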

Feedback welcome

If you have any feedback on the rwml-R project, please leave a comment below or use the Tweet button. As with any of my projects, feel free to fork the rwml-R repo and submit a pull request if you wish to contribute. For convenience, I’ve created a project page for rwml-R with the generated HTML files from knitr.
