Deep Learning vs Statistical Learning
Updated: Feb 16
The purpose of this blog is to discuss the distinction between deep learning and statistical learning models. Discussions about regulating algorithms tend to lump together deep learning and statistical learning models. However, these classes of algorithms are very different, and hence need to be regulated in different ways. The distinction between them can inform us about the potential benefits and costs of each approach to machine learning, and in particular the role that these technologies play in enhancing decision making.
Deep learning is a relatively new and very exciting technology that can be viewed as a model of human decision making (LeCun, Bengio, and Hinton 2015; Bengio, LeCun, and Hinton 2021). However, when it was first introduced it was very controversial for a number of reasons. The early models of artificial intelligence were motivated in part by Chomsky’s ideas on how to model human language. These early models assumed that intelligent decision making entailed learning rules that logically connect symbols (words), from which one can make logical inferences. That approach was not only a failure, but Chomsky’s theories of language have also turned out to be wrong (Ibbotson and Tomasello 2016).
The problem is that complex decision making is not logical! In the early AI models one attached precise meanings to words. As we all know from reading literature, the beauty of language is often due to its imprecision. The same word can evoke different meanings and reactions in individuals. What is amazing about deep learning models is precisely their ability to be imprecise and to allow words to have complex, varying interpretations and connections to many concepts.
This is achieved by constructing models that are built in layers with thousands of parameters. It has long been known that such a model is a universal function approximator, in the sense that it can fit any arbitrary functional form. The problem is that fitting the data within sample does not imply that one can produce good out-of-sample predictions. For example, one can fit perfectly a picture of a street scene, but from that picture one cannot predict what will happen in the moments after the picture is taken.
Chomsky’s theory of a universal grammar was built upon this observation. He observed that humans are able to learn language with a relatively small number of examples. For this to be possible, he reasoned, there must be a scaffolding or basic structure inherent in the mind that allows such efficient learning. This is a very compelling argument. In the early days it was simply not clear how a deep learning model could learn from sparse data.
The breakthrough that occurred was the discovery that one can build structure into a learning model with “pre-learning”. For example, in technologies such as ChatGPT, one can use a large corpus of text, such as that found in emails, to pre-train the model to see relationships in this data. When appropriately trained, the algorithm can learn new facts, say how to interpret contract terms, with a relatively small data set. In particular, words and language more generally do not have unique meanings, nor are they stored in a single location. As Hinton discusses in his NPR interview (see It's a Machine's World), it is better to see language and meaning like a hologram that is distributed over the whole network. Holograms have the feature that if one cuts a holographic film in half, both halves still contain the full picture.
This has a number of interesting implications. One of them is that neural network models, like humans, are not good at logic. It has already been observed that ChatGPT is not very good at mathematics (Zumbrun 2023). As Hinton points out, this observation is consistent with the hypothesis that the algorithms underlying ChatGPT are human-like. What ChatGPT, and other algorithms like it, can do is aggregate large volumes of information and discover interesting relationships in the data.
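The gap between fitting and predicting can be illustrated with a small sketch. It is not a neural network: a polynomial stands in for any flexible model, and the sine curve, noise level, and sample sizes are assumptions chosen purely for illustration. With as many coefficients as training points, the polynomial reproduces the training data almost exactly, yet its error on fresh draws from the same process cannot fall below the noise in those draws and is typically much larger.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def in_and_out_of_sample_mse(degree, n_train=8, n_test=200, noise=0.3, seed=0):
    """Fit a polynomial to noisy training data; return (train MSE, test MSE)."""
    rng = np.random.default_rng(seed)

    # Assumed "true" relationship: a sine curve observed with noise.
    x_train = np.linspace(-3.0, 3.0, n_train)
    y_train = np.sin(x_train) + rng.normal(0.0, noise, n_train)

    # Fresh draws from the same process play the role of out-of-sample data.
    x_test = rng.uniform(-3.0, 3.0, n_test)
    y_test = np.sin(x_test) + rng.normal(0.0, noise, n_test)

    # Least-squares polynomial fit; degree = n_train - 1 interpolates the data.
    coefs = P.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((P.polyval(x_train, coefs) - y_train) ** 2))
    test_mse = float(np.mean((P.polyval(x_test, coefs) - y_test) ** 2))
    return train_mse, test_mse


if __name__ == "__main__":
    for degree in (1, 3, 7):
        train_mse, test_mse = in_and_out_of_sample_mse(degree)
        print(f"degree={degree}: train MSE={train_mse:.4f}, test MSE={test_mse:.4f}")
```

Raising the degree always lowers the in-sample error, which is why in-sample fit alone says little about how a model will perform on new data.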
However, this comes at a cost. In particular, it implies that deep learning models, like humans, can be biased and make mistakes. This observation is crucial, because it suggests that like humans, these algorithms do need to be evaluated and regulated.
The question is how to do this. The problem is that the algorithm is not a simple decision rule with a small number of parameters. The fact that it is like a hologram means that correcting bias is not just a matter of changing a weight in an equation. Rather, like a human, the algorithm needs to be trained, and under the hood thousands of parameters might have to be modified to reduce bias.
This brings us to statistical learning models (Hastie, Tibshirani, and Friedman 2009). It is worth emphasizing that this class of models is often viewed as part of “AI” and “machine learning”, but they are very different from deep learning models in several important ways. The first is that they are trained within a fixed and well-defined data set. For example, the data might be worker survey data, and the question being asked is how a college education contributes to future earnings. The model is estimated with the fixed data set from scratch.
An implication is that once a statistical model has been defined, then given the same data one will always get the same results. Complex AI programs, like ChatGPT, are constantly learning and will give different answers over time.
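The reproducibility of a statistical model can be sketched with ordinary least squares. The survey below is simulated, and the variable names, the 0.4 earnings premium, and the noise level are hypothetical values invented for the example. Once the data set is fixed, the estimates are a deterministic function of the data, so fitting the model twice returns the same numbers (up to floating-point precision).

```python
import numpy as np


def fit_ols(x, y):
    """Ordinary least squares of y on a constant and x; returns [intercept, slope]."""
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def make_survey(n=500, seed=42):
    """Hypothetical worker survey: a college indicator and log earnings."""
    rng = np.random.default_rng(seed)
    college = rng.integers(0, 2, n).astype(float)
    # Assumed data-generating process, for illustration only.
    log_earnings = 10.0 + 0.4 * college + rng.normal(0.0, 0.5, n)
    return college, log_earnings


if __name__ == "__main__":
    college, log_earnings = make_survey()
    first = fit_ols(college, log_earnings)
    second = fit_ols(college, log_earnings)
    print("first fit: ", first)
    print("second fit:", second)
    print("same estimates:", np.allclose(first, second))
```

The random seed only fixes the simulated data. The estimation step itself involves no randomness, which is the property the paragraph above relies on.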
Second, an explicit goal of a statistical learning model (which includes most of the models that economists use in their day-to-day work) is to measure the causal effect of a treatment. The measured effect might be biased, but this is a design choice when trading off precision against bias. In particular, statistical learning models are a mathematical tool designed to perform well where humans do not: take a large data set and measure the causal effect of a treatment, say predicting how much more one will make if one majors in computer science rather than English (Bleemer 2022).
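The trade-off between precision and bias can be made concrete with a small simulation. This is a sketch under stated assumptions rather than any particular estimator from the literature: the true effect, noise level, sample size, and the shrinkage factor of 0.5 are all invented for illustration. It compares an unbiased sample mean with a version deliberately shrunk toward zero, and reports bias, variance, and mean squared error for each.

```python
import numpy as np


def compare_estimators(true_effect=1.0, noise_sd=3.0, n=10, shrink=0.5,
                       reps=20000, seed=0):
    """Simulate an unbiased sample mean against a shrunken (biased) version of it."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated study with n noisy measurements of the effect.
    samples = rng.normal(true_effect, noise_sd, size=(reps, n))

    unbiased = samples.mean(axis=1)   # sample mean in each simulated study
    shrunk = shrink * unbiased        # pulled toward zero: biased, less variable

    def summarize(estimates):
        bias = float(estimates.mean() - true_effect)
        variance = float(estimates.var())
        mse = float(np.mean((estimates - true_effect) ** 2))
        return {"bias": bias, "variance": variance, "mse": mse}

    return summarize(unbiased), summarize(shrunk)


if __name__ == "__main__":
    unbiased, shrunk = compare_estimators()
    print("unbiased:", unbiased)
    print("shrunk:  ", shrunk)
```

Under these assumed values the shrunken estimator is biased but has lower variance and lower mean squared error. That ranking is not general: it reverses when the true effect is large relative to the noise, which is why accepting bias is a design choice and not a free improvement.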
This means that if one wishes to evaluate the bias of a deep learning model, the only way to do this is to use the tools economists have developed to measure causal effects and biases in decision making using statistical learning models (see in particular the 2021 Nobel prize citations for the work by Card, Angrist and Imbens). Thus, deep learning is not a replacement for statistical learning models. If anything, as deep learning models evolve, there will be an increasing need to evaluate their performance using statistical learning models that have been developed by researchers over many decades. I will return to this issue in future posts when discussing the role of micro-economics for understanding and evaluating the performance of deep learning algorithms.
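One way to picture such an evaluation is to treat the algorithm as a black box and apply a regression to its outputs. Everything in the sketch below is hypothetical: `black_box_score` is a made-up stand-in for an opaque model, the 0.3 penalty is invented, and group membership is assigned at random, which is the assumption that lets the regression coefficient be read as a causal effect. With observational data, that assumption would have to be replaced by a credible identification strategy.

```python
import numpy as np


def black_box_score(qualification, group):
    """Hypothetical opaque model: nonlinear in qualification, penalizes group 1."""
    return np.tanh(qualification) - 0.3 * group


def audit_group_gap(score_fn, n=5000, seed=0):
    """Estimate the effect of group membership on the score, holding qualification fixed."""
    rng = np.random.default_rng(seed)
    qualification = rng.normal(0.0, 1.0, n)
    # Assumption: group is randomly assigned, independent of qualification.
    group = rng.integers(0, 2, n).astype(float)

    scores = score_fn(qualification, group)

    # Regress the score on a constant, qualification, and the group indicator.
    design = np.column_stack([np.ones(n), qualification, group])
    beta, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return float(beta[2])  # coefficient on the group indicator


if __name__ == "__main__":
    print("estimated group gap:", audit_group_gap(black_box_score))
```

The audit never inspects the model's parameters. It only needs inputs and outputs, so the same procedure applies whether the scoring rule has two parameters or billions.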
Bengio, Yoshua, Yann LeCun, and Geoffrey Hinton. 2021. “Deep Learning for AI.” Communications of the ACM 64 (7): 58–65. https://doi.org/10.1145/3448250.
Bleemer, Zachary. 2022. “Affirmative Action, Mismatch, and Economic Mobility after California’s Proposition 209.” The Quarterly Journal of Economics 137 (1): 115–60. https://doi.org/10.1093/qje/qjab027.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning. New York, NY: Springer.
Ibbotson, Paul, and Michael Tomasello. 2016. “Evidence Rebuts Chomsky’s Theory of Language Learning.” Scientific American. November 2016. https://doi.org/10.1038/scientificamerican1116-70.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44. https://doi.org/10.1038/nature14539.
Zumbrun, Josh. 2023. “ChatGPT Needs Some Help With Math Assignments.” Wall Street Journal, February 3, 2023, sec. US. https://www.wsj.com/articles/ai-bot-chatgpt-needs-some-help-with-math-assignments-11675390552.