Author: Sebastian Raschka, Yuxi (Hayden) Liu & Vahid Mirjalili
Date: February 2022
Audience: Python developers interested in machine learning
Reviewer: Mike James
This is a very big book of machine learning. Is it also good?
The simple answer is yes.
It isn’t a deeply theoretical book and if you are looking for deep theory you will need something else. That said, there are plenty of equations in the book and if you are math-phobic you will find the going more difficult than you might like. Personally, I would say that if you can’t cope with the math as presented, you really aren’t going to cope with machine learning at all. This book strikes a good balance, presenting things understandably without ignoring the math.
This book is based on the third edition of Python Machine Learning, which I reviewed back in 2020 and was pleased with. It has been upgraded to use PyTorch and scikit-learn and there is also a new chapter on transformers, currently a hot topic, including GPT and BERT.
One of the things that appealed to me about the 2020 book was that it covered machine learning with a sense of what had gone before, and this new book reuses this material. It starts off from the very basics – the perceptron and its learning algorithm. It goes on to consider Adaline, logistic regression, support vector machines, decision trees and more. If you thought that machine learning was just about neural networks this might all come as a surprise. I hope it's a pleasant surprise because there was a lot of machine learning before neural networks and there still is!
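To give a flavour of how simple those very basics are, here is a minimal sketch of the classic perceptron learning rule on a made-up AND-style toy problem – the data, learning rate and epoch count are illustrative choices, not code from the book:

```python
# Classic perceptron learning rule on toy, linearly separable data.
def train_perceptron(samples, labels, eta=1, epochs=10):
    """Learn weights and a bias; labels are +1 or -1."""
    w = [0] * len(samples[0])
    b = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # predict with the current weights, then nudge them on a miss
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
            update = eta * (y - pred)
            w = [wi + update * xi for wi, xi in zip(w, x)]
            b += update
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy data: class +1 only when both features are 1
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b = train_perceptron(X, y)
```

The whole trick is in the update: a correct prediction changes nothing, a wrong one shifts the weights toward the target – and for linearly separable data that is guaranteed to converge.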
The book also covers the basics of data processing, which occupies far more time in machine learning than you might imagine. The problem of overfitting and approaches to solving it are also described, including L1 and L2 regularization. You don’t just get the equations; you also get explanations and diagrams that help you understand what is going on.
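The core idea behind both regularization schemes is just an extra penalty term added to the loss; a hedged sketch, with illustrative names and a made-up weight vector rather than anything from the book:

```python
# L1 and L2 penalty terms as they would be added to a loss function.
def l1_penalty(weights, lam):
    # L1: lambda * sum of absolute weights -- tends to drive weights to zero
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # L2: lambda * sum of squared weights -- shrinks large weights smoothly
    return lam * sum(w * w for w in weights)

w = [0.5, -1.5, 0.0, 2.0]
p1 = l1_penalty(w, 0.1)  # 0.1 * (0.5 + 1.5 + 0.0 + 2.0) = 0.4
p2 = l2_penalty(w, 0.1)  # 0.1 * (0.25 + 2.25 + 0.0 + 4.0) = 0.65
```

The difference in behaviour – L1 producing sparse weights, L2 merely shrinking them – falls out of these two lines, and the book's diagrams make the geometry of why very clear.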
From here the book moves on to dimensionality reduction – principal components and discriminant analysis. These are also “classical” methods that are often overlooked when it comes to the current neural-network approach. Do you need to know these things? I have observed lots of cases where dimensionality reduction would have been useful, had it been a known technique.
Whatever approach you use, evaluating and optimizing the model is important and how to do this is surprisingly complex. A whole chapter is devoted to evaluation and tuning. Improving performance via ensemble learning is the next topic and then comes a big example – sentiment analysis.
After this we have a chapter on regression analysis, which is really statistics rather than machine learning, but if you don’t know about it you are likely to misapply more advanced methods when something simpler would do. From here we move to another classical statistical method – cluster analysis.
Halfway through the book we finally reach the topic that the majority of readers are going to want to know about – implementing a multi-layer network. From the basics, using the MNIST database as an example, we move on to parallel implementation using PyTorch.
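At its heart a multi-layer network is just repeated weighted sums pushed through a nonlinearity; a toy forward pass in plain Python – the tiny 2-2-1 architecture and the weight values are invented for illustration, not the book's MNIST code:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # hidden layer: weighted sum of the inputs, then the nonlinearity
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # output layer: weighted sum of the hidden activations
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

x = [1.0, 0.5]
w_hidden = [[0.2, -0.4], [0.7, 0.1]]   # one row of weights per hidden unit
b_hidden = [0.0, -0.1]
w_out = [1.5, -2.0]
b_out = 0.3
y_hat = forward(x, w_hidden, b_hidden, w_out, b_out)  # a value in (0, 1)
```

What PyTorch adds on top of this is the heavy lifting – tensors, automatic differentiation and GPU parallelism – which is exactly what the book turns to next.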
After the practical details of using PyTorch the book investigates different ways of using neural networks. The first specialization of the general neural network architecture is, of course, convolutional networks, then recurrent networks for sequence data, transformers as an alternative to recurrent networks, generative adversarial networks and finally graph neural networks. The final chapter is an all-too-short account of reinforcement learning and how neural networks can learn just by being given a reward for the actions they take.
As I said at the beginning, this is not a complete academic course in the machine learning branch of AI – it misses out a few things. In the main, however, it is a good and fair account of modern machine learning beyond just the topic of the neural network. It mostly hangs together as a coherent account and personally I think it represents what you should know – even if you are planning to specialize in neural networks.
I would have liked a little more philosophy of machine learning – differential programming, something about explainability, more on autoencoders, Boltzmann machines and the troublesome perturbations and adversarial inputs, but 700 pages is already a bit too long and what would you leave out?!
It is also very readable – apart from the problem of physically handling its 700 pages. You can dip into it if the idea of starting at the beginning and working to the end is daunting, but I would suggest that a better approach is to start at the beginning and skim-read the parts that don’t interest you yet – they probably will in time.
If you want a good grounding in the broader topic of machine learning then this is highly recommended.