R code to accompany Real-World Machine Learning (Chapter 4)
Abstract
In the latest update to the rwml-R GitHub repo, I provide R code to accompany Chapter 4 of the book “Real-World Machine Learning” by Henrik Brink, Joseph W. Richards, and Mark Fetherolf. Topics covered include optimization of model parameters via grid search with caret, plotting a confusion matrix with ggplot2, and generating ROC curves with ROCR. This blog post provides a summary and some examples of the code contained in the update.
rwml-R project pages posted
For convenience, I’ve created a project page for rwml-R that hosts the HTML files generated by knitr. This blog post, like the ones for Chapter 2 and Chapter 3, is a short summary of the R code provided in the rwml-R project.
Also, feel free to fork the rwml-R repo and submit a pull request if you wish to contribute.
Plotting a confusion matrix
The MNIST dataset of handwritten digits makes another appearance. The kknn package is again used, and the confusion matrix is plotted using ggplot2. The color scale for the plot is generated using the RColorBrewer package.
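As a rough illustration of the plotting step (a minimal sketch, not the code from the repo), the snippet below draws a confusion matrix as a heat map with ggplot2 and an RColorBrewer palette. To stay self-contained it does not load MNIST or fit a kknn model. The `actual` and `predicted` vectors are made-up stand-ins for real labels and predictions, and the "Blues" palette is just one possible choice.

```r
library(ggplot2)
library(RColorBrewer)

set.seed(1)

# Hypothetical stand-in data: 10 digit classes, 50 examples each.
actual <- factor(rep(0:9, each = 50), levels = 0:9)

# Simulate imperfect predictions by re-drawing 100 of the labels at random.
predicted <- actual
flip <- sample(length(actual), 100)
predicted[flip] <- sample(levels(actual), 100, replace = TRUE)

# Cross-tabulate, then convert counts to row-wise fractions so that
# each row (actual digit) sums to 1.
cm <- table(actual = actual, predicted = predicted)
cm_df <- as.data.frame(prop.table(cm, margin = 1))  # columns: actual, predicted, Freq

# One tile per (actual, predicted) pair, filled by the fraction of cases.
p <- ggplot(cm_df, aes(x = predicted, y = actual, fill = Freq)) +
  geom_tile() +
  scale_fill_gradientn(colours = brewer.pal(9, "Blues")) +
  labs(x = "Predicted digit", y = "Actual digit", fill = "Fraction")
print(p)
```

Normalizing by row keeps the color scale comparable across classes even when the classes are not equally frequent.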
Plotting a series of ROC curves
The ROCR package is introduced and used to generate ROC curves. Also, AUC values are calculated for each curve and displayed along with each of the curves.
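The general pattern with ROCR can be sketched as follows. This is an illustrative example under assumed inputs, not the code from the repo: the two score vectors are simulated from normal distributions and merely stand in for the output of real classifiers, and the names `weak` and `strong` are invented for the example.

```r
library(ROCR)

set.seed(42)
n <- 200
labels <- rep(c(0, 1), each = n / 2)

# Hypothetical classifier scores: positives are shifted upward,
# by a small amount for "weak" and a larger amount for "strong".
scores <- list(
  weak   = rnorm(n, mean = labels * 0.5),
  strong = rnorm(n, mean = labels * 2)
)

cols <- c("steelblue", "firebrick")
auc <- numeric(length(scores))

for (i in seq_along(scores)) {
  pred <- prediction(scores[[i]], labels)
  # True positive rate against false positive rate gives the ROC curve.
  perf <- performance(pred, measure = "tpr", x.measure = "fpr")
  # The AUC is returned as a one-element list in the y.values slot.
  auc[i] <- performance(pred, measure = "auc")@y.values[[1]]
  plot(perf, col = cols[i], add = (i > 1))
}

abline(0, 1, lty = 2)  # reference line for a random classifier
legend("bottomright",
       legend = sprintf("%s (AUC = %.3f)", names(scores), auc),
       col = cols, lty = 1)
```

Putting the AUC values in the legend is one simple way to display them alongside the curves.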
Tuning model parameters
The caret package is used to tune parameters via grid search for the Support Vector Machines model with a Radial Basis Function Kernel. By setting summaryFunction = twoClassSummary in trainControl, the area under the ROC curve is used to select the optimal model. The doMC package is also introduced for parallel computation.
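A minimal sketch of this kind of grid search is shown below. It is not the code from the repo: the data come from caret's `twoClassSim` simulator rather than the book's dataset, and the grid values, the 5-fold cross-validation, and the core count are arbitrary choices for illustration. The `svmRadial` method also requires the kernlab package to be installed.

```r
library(caret)

# Optional parallel backend (doMC is not available on Windows).
# The core count of 2 is an arbitrary assumption.
if (requireNamespace("doMC", quietly = TRUE)) {
  doMC::registerDoMC(cores = 2)
}

set.seed(123)
# Simulated two-class data; the outcome column is a factor named Class.
dat <- twoClassSim(200)

# classProbs = TRUE is needed so that twoClassSummary can compute the
# area under the ROC curve from predicted class probabilities.
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# Candidate values for the RBF kernel width (sigma) and the cost (C).
grid <- expand.grid(sigma = c(0.01, 0.05, 0.1),
                    C = c(0.25, 1, 4))

fit <- train(Class ~ ., data = dat,
             method = "svmRadial",
             metric = "ROC",  # select the model with the highest AUC
             tuneGrid = grid,
             trControl = ctrl,
             preProcess = c("center", "scale"))

print(fit$bestTune)  # the (sigma, C) pair with the best cross-validated AUC
```

The full table of cross-validated results for every grid point is available in `fit$results`, which can be useful for checking whether the best value sits at the edge of the grid.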
Feedback welcome
If you have any feedback on the rwml-R project, please leave a comment below or use the Tweet button. Again, feel free to fork the rwml-R repo and submit a pull request if you wish to contribute.