Like every data scientist out there, I got my copy of Efron & Hastie's latest, "Computer Age Statistical Inference", as soon as it was off the press.

441 pages, 21 chapters, 21*21 = 441, though the average chapter is more like 15 pages of dense text. That's NOT a textbook, more like a "Guide", as the effusive praise on the covers attests.

I only review Chapter 18 here. It is always of interest to see how statisticians perceive Deep Learning. Thankfully, they think it is important enough to warrant its own chapter :) It's a very nice 25-page chapter. The actual section on Deep Learning is only 7 pages! I don't know what to make of that. Most NIPS papers tend to be easily twice that size, and there are entire textbooks on preliminary material, say the three textbooks I regularly dip into - "Neural Network Design", "Neural Networks", "Neural Networks for Applied Science" - each of which is easily 100x that, i.e. > 700 pages long! I harp on these trivialities perhaps only to convey that there is verbose, there is succinct, there is concise, and then there is this magazine-footnote Cosmo style. You honestly cannot do justice to this material in 7 pages. But to give credit to the authors, they have done a whirlwind tour in these 7 precious pages. I tremendously enjoyed the highlights.

So the 25-page chapter is broken down into an introduction and 6 independent sections. Let's sample the delicacies.

Introduction: The authors claim NNs "shook up the statistics community" in the 1980s, the NN literature is very "colorful", the statistician's response was the knee-jerk "What's the big deal?" dismissal, but NNs started "solving problems on a scale far exceeding what the statistics community was used to". This led to new journals and "several popular conferences at ski resorts". Then NNs died a brief death in the mid 1990s but "reemerged with a vengeance after 2010", and that's what's called Deep Learning.

Generally, you don't see textbooks taking swipes at whole domains, with flippant commentary as a bonus. So it was fun reading the intro. I think more books should be written in this sort of breezy fashion. It gets the reader firmly hooked.

As far as technical material goes, the intro had the usual feedforward network explained in a single line, with the associated equations.

Two functions, g and h: g is nonlinear, h is the identity (for regression). The candidate for the nonlinear one is a sigmoid, because "the idea was each neuron learns an on/off function. Sigmoids are a smooth and differentiable compromise". Again, it gets to the point awfully fast.
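To make the one-line description concrete, here is a minimal sketch of that single-hidden-layer feedforward network: a sigmoid g at the hidden layer and the identity h at the output, for regression. The weights and dimensions below are placeholders of my own, not anything from the book.

```python
import numpy as np

def sigmoid(z):
    # g: a smooth, differentiable compromise for an on/off "neuron"
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
p, m = 4, 3                      # input features, hidden units (arbitrary)
W1 = rng.normal(size=(m, p))     # input-to-hidden weights
b1 = rng.normal(size=m)
w2 = rng.normal(size=m)          # hidden-to-output weights
b2 = rng.normal()

def predict(x):
    a = sigmoid(W1 @ x + b1)     # hidden layer: a = g(W1 x + b1)
    return w2 @ a + b2           # output layer: h is the identity

x = rng.normal(size=p)
print(predict(x))                # a single scalar prediction
```

In real use the weights would of course be fit by gradient descent on a loss; the point here is only the shape of the model: a nonlinear squash sandwiched between two linear maps.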

The takeaway from the Intro is that NNs are mostly "just a nonlinear model", but their influence is because of the topology ("they can be scaled up and generalized in a variety of ways...many units in a layer...many layers...weight sharing...colorful forms of regularization" - again that word, colorful!). They have found their "ideal niche", which is image classification & NLP. Their success is due to "massive improvements in computer resources". I imagine the ski resort conferences don't hurt.

Section 18.1: NNs & MNIST: Check back tomorrow....