Empiricism and Scientific Change in Judea Pearl’s ‘The Book of Why’

Mar 30, 2021

Edit: A brief response from Prof. Pearl to this blog post may be found at https://twitter.com/yudapearl/status/1377127158926041089.

Several years ago, as a newly minted college freshman, I learned how the eighteenth-century empiricist philosopher David Hume showed that causation isn’t logically necessary. Just because one billiard ball hitting another caused the second one to move every single time in the past doesn’t mean that it is absolutely guaranteed to happen again. Indeed, our general notion of causation might merely be a product of habit. This revelation impressed me greatly as I saw that a basic belief of ours might not be built on absolute foundations and that many concepts we think are obvious are not guaranteed to strictly exist. Since Hume’s time, science has become parsimonious with the notions it uses to explain the world around us, discarding concepts such as the ether and phlogiston as it built a culture centered around empirical observation and experimentation. Could it be possible, however, that the conceptual asceticism that has honed the scientific worldview is now holding it back? Could an apathy towards causation that has served us well since Hume’s time now be impeding fields as diverse as public health and artificial intelligence? The Book of Why, published in 2018, recounts a shift in the academic treatment of causation and evokes some long-standing questions about the interplay of philosophy and science.

Judea Pearl is a professor at UCLA and a recipient of the prestigious 2011 Turing Award for his work in computer science. After making seminal contributions to the field of artificial intelligence with his work on Bayesian networks, Pearl turned to causal reasoning and, in recent years, has focused on modeling causal systems for use across computer science, social science and statistics. The Book of Why, written in collaboration with Dana Mackenzie, summarizes the work Pearl has done over his career. The empiricist backdrop of science might have played a profoundly important role in scientific thinking but, if we go by Judea Pearl’s account of causation in the world of mathematical statistics, a suspicion of causation and an obsession with staying purely within the confines of observational data might be actively hurting a variety of scientific disciplines and the way we make inferences in the age of Big Data.

In his book, Pearl uses a metaphorical Ladder of Causation to illustrate his overall thinking. The first rung, i.e. association, involves observing patterns and regularities, i.e. “what if I see...” questions. Relevant examples might include animals being conditioned to expect food at the ringing of a bell or a compulsively clumsy electrician associating a specific odor with an inevitable fire. The second rung, i.e. intervention, involves figuring out the impact of specific actions, i.e. “what if I do...” questions. Randomized controlled trials in medicine provide the most pervasive examples since they measure the effect of specific medical interventions on human or animal populations. Likewise, A/B tests run by technology companies on their users measure the comparative impact of specific changes in application behavior. Rung one techniques generally do not work on rung two because the collected data is derived from very particular circumstances which might not hold under a given intervention. For example, the impact of doubling the price of an item might vary depending on whether there is widespread ongoing inflation or a general shortage of the product. The final rung, i.e. counterfactuals, involves figuring out mechanisms, allowing us to reason about outcomes had conditions or actions been different from what they actually were, i.e. “what if I had done...” questions. This is the arena of scientific discovery, ethical reasoning and, to a great extent, everyday living in a complex and dynamic world. Rung two techniques may not work on rung three because certain tasks, such as talking about particular individuals or cases (“Would I personally have been better off skipping grad school?”) or capturing indirect effects from mediators (“Does aspirin prevent strokes by thinning blood?”), require working explicitly with counterfactuals.

Existing methods in statistics and computer science mostly operate on the first rung. Standard machine learning techniques infer patterns in sets of data or produce fresh data patterned after existing data. There is usually no facility for simulating interventions. Quants at a hedge fund might be able to successfully model an association between the stock prices of related companies. However, since their model will have no built-in notion of causation, they will find it difficult to predict the impact on the wider market of setting the price of an individual stock to double its value. The doubling could affect the market through multiple mechanisms and each could have different ramifications for the rest of the market. In general, a tremendous amount of computing power can be thrown at massive sets of data these days to uncover insights using standardized tools. However, Pearl argues that by simply restricting ourselves to first rung methods which do not easily incorporate qualitative expert information about causal connections, we are missing out on deeper insights, some of which could revolutionize fields like medicine.

Pearl provides some historical background on why statistics and derived fields have been stuck on rung one for the past century and a half. Sewall Wright, apparently somewhat of a personal hero to Pearl, was an influential figure in statistics and invented a technique called path analysis in the 1920s which was a precursor of sorts to Pearl’s own work. Path analysis used diagrams such as this one about gestation in guinea pigs to incorporate causal information into statistical analysis:

According to Pearl, Wright was vehemently opposed by the statistical establishment. There is a direct lineage between Francis Galton, Karl Pearson and R.A. Fisher, the pioneering giants of statistics who came up with some of the seminal concepts of the field, all of whom explicitly disavowed causation in science. Pearl quotes Pearson as saying that “it was Galton who first freed me from the prejudice that sound mathematics could only be applied to natural phenomena under the category of causation” and claims Pearson saw causation merely as a special case of correlation. Pearson’s student, Henry Niles, savaged Wright’s work and path analysis remained generally obscure for several decades. R.A. Fisher himself supposedly saw statistics as “methods of...reduction of data” and engaged in a bitter professional rivalry with Wright. Both Pearson and Fisher are portrayed as domineering, acrimonious and intolerant of competing viewpoints. Pearl implies that, as leaders of their field, they were able to suppress Wright’s and others’ efforts to bring causal techniques into statistics.

In the absence of an explicit account by Pearl, we can speculate about the philosophical milieu in which prominent scientists such as Fisher were fiercely skeptical of causal methods. Both the philosophy and practice of science have been affected by the empiricist tradition that Hume belonged to. Scientific knowledge relies heavily on observational data and avoids using metaphysical concepts that cannot ultimately be measured in some way. Indeed, as science progressed, it mostly shed presumed concepts like phlogiston and the ether, replacing them with concrete laws of nature and explanations based on rigorous observations. In a tradition so steeped in the removal of the extraneous, a seemingly nebulous concept such as causation is likely to be treated with suspicion. Indeed, causation is difficult to both define and describe. There are multiple accounts of causation, each with their own pros and cons. There are notable counterexamples for each account and boundary cases which prevent any of them from being truly exhaustive. Without an ideal normative account of causation, we are often left with descriptive or conventional accounts for particular domains such as the law or medicine. For most of its short history, science faced opposition from incumbent belief systems steeped in tradition, mysticism and arbitrary authority. It is possible that the culture of skepticism and rigor that science had to thus adopt biased it against methods which may rely on something as seemingly arcane as causation.

For decades after Hume’s works were published, scientific and philosophical thought were in a sort of crisis. Hume’s broader account of the Problem of Induction made most of our knowledge suspect and threatened to undermine science. Similar to the example of the billiard balls, there are numerous scenarios where past experience is no logically necessary indicator of future expectations. Just because the sun has risen every day and all ravens ever seen have been black doesn’t mean that the sun will rise tomorrow or that a white raven will never be seen. It took an intervention by the philosopher Immanuel Kant to generally quell the crisis, with causation simply assumed to be part of our fundamental experience of reality. Without making this assumption, Kant’s cohort realized, we simply cannot process all the sense experience coming our way. Pure observation alone cannot bring us usable knowledge. In a way, Pearl makes a move similar to Kant’s by presupposing causation and using the concept to frame and mediate our observed data. By stepping out of the pure processing of data and using domain knowledge to represent and handle it, Pearl contends, we unlock entire classes of inference that we have so far been suppressing, because we can now incorporate qualitative causal information into our overwhelmingly quantitative models.

Pearl’s causal calculus consists of causal diagrams and a symbolic language, the former for representing what we know and the latter for what we want to know. He uses a classic example from the study of causation to illustrate his approach to the three rungs of causation. In this scenario, a firing squad of two soldiers led by a captain shoots at a prisoner to execute them upon receiving a court order. This is the causal diagram representing our knowledge of the situation:

Each arrow above represents a causal connection and points from cause to effect. The dots in this case represent binary variables representing the state of each actor in the system. From this diagram, we can trivially reason that a negative court order cannot result in death or that the captain will always give the order to shoot when instructed by the court.
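As a minimal sketch of the diagram described above, the firing squad can be written as a handful of deterministic assignments, one per arrow. The function and variable names are illustrative assumptions rather than Pearl’s notation, and the sketch assumes every actor follows orders perfectly.

```python
def firing_squad(court_order: bool) -> dict:
    """Evaluate the firing-squad diagram from cause to effect.

    Arrows assumed: court order -> captain -> soldiers A and B -> death.
    """
    captain = court_order          # the captain signals only if the court orders it
    soldier_a = captain            # each soldier fires only on the captain's signal
    soldier_b = captain
    death = soldier_a or soldier_b  # one bullet is enough
    return {
        "court_order": court_order,
        "captain": captain,
        "A": soldier_a,
        "B": soldier_b,
        "death": death,
    }


print(firing_squad(False)["death"])   # no court order, so no death
print(firing_squad(True)["captain"])  # a court order always reaches the captain
```

Reading the returned dictionary for each input is the kind of trivial, rung-one reasoning the diagram supports.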

If we ascend to the second rung, we can reason about the outcome when one of the two soldiers, A, decides unilaterally to pull the trigger of their gun. This intervention can be trivially represented by erasing all incoming arrows into A and forcing its value to be true. Now, regardless of the actions of the court, the Captain or the other soldier, we know the prisoner will die. Even a simple diagram such as this lets us exceed what we can do by purely seeing, i.e. collecting and training on data.

The act of erasing incoming arrows above and setting a variable to a specific value is an application of the do-operator. This mechanism is used to represent rung two interventions in the causal calculus. If, instead of using boolean variables, we had used probabilistic variables in our causal diagram, variables with incoming arrows would represent conditional probabilities such as P(D | A, B), i.e. the probability of D given values of A and B, while an application of the do-operator would result in probabilities of the form P(D | do(A), B). Pearl further describes a do-calculus which consists of a few general rules for transforming an expression containing do-operators, where possible, into an expression containing only conditional probabilities. This is tantamount to mathematically reducing an intervention into a statement purely on observed data.
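To make the difference between conditioning and the do-operator concrete, here is a small sketch that treats the court order as the only random variable and enumerates the resulting worlds exactly. The prior of 0.5 on the court order is an assumed number chosen purely for illustration, and the helper names are hypothetical.

```python
P_COURT_ORDER = 0.5  # assumed prior that the court orders the execution


def worlds(do_a=None):
    """Yield (probability, variables) for each value of the court order.

    If `do_a` is given, soldier A's incoming arrow is erased and A is forced.
    """
    for court_order, p in ((True, P_COURT_ORDER), (False, 1 - P_COURT_ORDER)):
        captain = court_order
        a = captain if do_a is None else do_a
        b = captain
        yield p, {"CO": court_order, "C": captain, "A": a, "B": b, "D": a or b}


def prob(event, given=lambda v: True, do_a=None):
    """P(event | given) in the (possibly intervened) model, by enumeration."""
    num = sum(p for p, v in worlds(do_a) if given(v) and event(v))
    den = sum(p for p, v in worlds(do_a) if given(v))
    return num / den


seeing = prob(lambda v: v["B"], given=lambda v: v["A"])  # P(B | A)
doing = prob(lambda v: v["B"], do_a=True)                # P(B | do(A))
print(seeing, doing)  # 1.0 0.5
```

Seeing A fire tells us the order was given, so B must have fired too; making A fire tells us nothing about B beyond the prior.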

The do-operator can also be used to handle confounding in data drawn from various situations. Quite simply, if we observe in any part of a causal diagram that P(E | C) is not the same as P(E | do(C)), we know we have a confounder at work, i.e. there is something other than the putative cause which has a bearing on the effect. In fact, there are some recurring junction structures, such as the collider (e.g. A -> C <- B), which, in conjunction with techniques known as front-door and back-door adjustments, let us identify and control for confounders in experiments, observational studies, etc. Pearl demonstrates the power and versatility of these techniques by going through a series of paradoxes, such as Simpson’s Paradox, and showing how they can be clarified using his methods.
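A small worked example may help here. The sketch below assumes a three-variable diagram, Z -> C, Z -> E and C -> E, in which Z confounds the effect of C on E. All of the probabilities are invented for illustration. It compares the naive P(E | C) with the back-door adjustment, which sums P(E | C, z) P(z) over the values of the confounder.

```python
from itertools import product

# Assumed, made-up parameters for a binary confounder Z, cause C and effect E.
P_Z = {1: 0.5, 0: 0.5}                      # P(Z = z)
P_C_GIVEN_Z = {1: 0.8, 0: 0.2}              # P(C = 1 | Z = z)
P_E_GIVEN_CZ = {(0, 0): 0.1, (1, 0): 0.2,   # P(E = 1 | C = c, Z = z)
                (0, 1): 0.7, (1, 1): 0.8}


def joint():
    """Observational joint distribution P(z, c, e) implied by the diagram."""
    table = {}
    for z, c, e in product((0, 1), repeat=3):
        pc = P_C_GIVEN_Z[z] if c else 1 - P_C_GIVEN_Z[z]
        pe = P_E_GIVEN_CZ[(c, z)] if e else 1 - P_E_GIVEN_CZ[(c, z)]
        table[(z, c, e)] = P_Z[z] * pc * pe
    return table


def cond(table, event, given):
    """P(event | given) computed from a joint table."""
    num = sum(p for k, p in table.items() if given(*k) and event(*k))
    den = sum(p for k, p in table.items() if given(*k))
    return num / den


J = joint()

# Rung one: what we would read off the raw data.
naive = cond(J, lambda z, c, e: e == 1, lambda z, c, e: c == 1)

# Back-door adjustment: sum over z of P(E=1 | C=1, Z=z) * P(Z=z).
adjusted = 0.0
for zv in (0, 1):
    p_z = sum(p for (z, c, e), p in J.items() if z == zv)
    p_e = cond(J, lambda z, c, e: e == 1,
               lambda z, c, e, zv=zv: c == 1 and z == zv)
    adjusted += p_e * p_z

print(f"{naive:.2f} {adjusted:.2f}")  # 0.68 0.50
```

The two numbers differ, which is exactly the signal of confounding described above, and the adjusted figure is computed using nothing but the observational joint distribution.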

To ascend to the top rung of causation, we must ask a counterfactual question. Let’s say the prisoner is already known to be dead. What if we want to know whether they would have survived had the first soldier decided not to shoot? Again, we perform a slight surgery on our causal diagram and set the values of the variables according to the set scenario, i.e. A to false and Court Order to true, etc. With some straightforward reasoning, we can infer that the prisoner would have died in spite of the first soldier’s conscientious act.

Using the causal calculus, we can start with a qualitative model which reflects our causal beliefs about a system and end by evaluating a mathematical expression based on the quantitative data available to us. Merely looking at data from hundreds of court orders relating to executions does not let us reason about an intervention such as appealing to A’s conscience to not pull the trigger. This is useful not just for exemplar problems but also for real world problems where direct interventions may not be possible. Pearl describes the history of the effort to conclusively link smoking with lung cancer where, for ethical and practical reasons, no randomized controlled trial was possible. Using only observational data, experts took decades to come to a consensus on the link, and even then only on the basis of the multifaceted Hill’s Criteria. In a setting such as this, access to causal methods might become a literal lifesaver.

In his day, the paragon of the field of statistics, R.A. Fisher, strongly resisted the notions of causality Sewall Wright tried to introduce through his path diagrams. While there isn’t such a towering luminary now to squelch Pearl’s efforts, Pearl does complain repeatedly of opposition from the statistics community at large. A good example of this might be the often cantankerous, sometimes outright hostile but overall polite series of disagreements between him and Andrew Gelman, a professor of statistics at Columbia University. Gelman’s blog post review of Pearl’s book and their ensuing back and forth in the post’s comments section might vaguely be encapsulated thus:

Pearl: Statistics has historically avoided talking about causation.

Gelman: We talk about causation all the time!

Pearl: But you don’t have standardized causal techniques and terminology that work across domains! It’s all very ad-hoc and euphemistic and you don’t systematize anything.

Gelman: Yes, because each domain is different and it’s not possible to have a general purpose technique.

Pearl: I literally created a general purpose technique!

Gelman: Yes, but it’s trivial and only solves toy problems.

Pearl: You can’t even solve the toy problems I put forth. No statistician can solve them with their conventional tools and techniques.

Gelman: Your toy problems are very uninteresting to me and have little to do with serious causation work we do across domains.

Pearl: So you think I’m trite and even refuse to engage with me?

Gelman: Agree to disagree.

The debate between Pearl and Gelman and the hostility that Pearl feels might have to do with the concept of incommensurability. Thomas Kuhn came up with a theory of scientific change which involved a series of paradigm shifts in scientific fields, each triggered by a crisis in a predecessor. A straightforward example would be the shift between classical and quantum physics where observational anomalies in the nineteenth century led to the creation of the latter field in the early twentieth century. This is not to say there is always a clear linear relationship between paradigms. Regardless, the exemplars, methods, tools and standards of competing paradigms can be radically different, i.e. incommensurable. In fact, it might be impossible to judge one paradigm from within the context of another. While this concept of scientific paradigms isn’t conclusive and has been challenged vigorously by competitors from both philosophy and sociology, it can still be useful for evaluating scientific change. Pearl and Gelman might be at loggerheads not because they disagree on particulars but because they simply belong to different paradigms. Their arguments, while valid within their own paradigms or research programs, simply cannot be evaluated within the other’s. Quite possibly, the two aren’t merely talking past each other; they lack a shared standard by which the dispute could be settled.

There is something grand and exhilarating that transcends Pearl’s obsessions with causal graphs and quibbles with statistical orthodoxy. The ultimate motivator for all of this work and its aspirational end is the Causal Revolution, a grand effort to incorporate causation into daily scientific practice. The loftiest goal is to profoundly affect artificial intelligence itself to create sentient machines with moral and scientific capabilities and even free will. It is important to note that Pearl’s background is in an older effort to realize ‘hard A.I.’, in contrast to the modern focus on using applied statistics to bring specific capabilities to computers. The former focused on creating general-purpose thinking machines, often relying on domain knowledge from human experts while the latter mostly operates purely on data. In essence, Pearl wants to graduate artificial intelligence from rung one to rungs two and three in order to realize the decades-old ambition of building thinking machines.

The Causal Revolution bears some of the hallmarks of an emerging Kuhnian paradigm. While Gelman and machine learning practitioners are mainly interested in solving quantitative data-related problems and are content with their tools and standards, Pearl’s exemplar problems are steeped in the enduring search for general purpose intelligence. The crisis that fired the paradigm could be the inability to work with interventions and counterfactuals so the tools and methods might be designed accordingly. Of course, scientific paradigms might at best be convenient sets of historical narratives but one gets the clear sense that Pearl and the statisticians are operating in different overall contexts. If we stick to the framework of paradigms, the Causal Revolution will be realized not by converting existing practitioners but by conscripting an entirely new generation of practitioners eager to share in Pearl’s mission to use qualitative information to unlock fundamentally different means of automated reasoning.

We can also try to situate the Causal Revolution within the philosophy of causation. Much of Pearl’s work is done in opposition to the regularity view of causation, which arose from Hume’s writings about our perception of causation as a product of habit rather than the apprehension of something concrete. Hume’s work also alluded to a counterfactual view of causation and it is broadly this view, developed by later philosophers such as David Lewis, that Pearl adopts and applies in his work. There is also a probability view of causation relevant to Pearl’s work since it often uses probabilistic dependencies but its applicability is somewhat limited due to the stress Pearl now puts on counterfactuals. Other theories of causation seem mostly irrelevant to Pearl’s work.

The view of causation based on counterfactual dependence has been studied extensively over the years and some shortcomings have been found. There is a circularity between counterfactuals and causation and this is problematic for fundamentally grounding either concept. Furthermore, there are counterfactual dependencies that are not causal so we cannot say, for instance, that winter was caused by fall. Necessary conditions add further issues since everything could be said to depend counterfactually on the Big Bang but we would hesitate to say that the sinking of the Titanic was caused by the origin of the universe. Even a pragmatic use of counterfactuals for causation runs into known issues. In the firing squad example earlier, a bullet from either soldier was sufficient to kill the prisoner but both soldiers shot at the same time so we end up with an overdetermination of an effect by multiple causes. Additionally, one cause can preempt another cause such that the effect no longer counterfactually depends on the preempted cause. Due to overdetermination and preemption, causation cannot be reduced to counterfactual dependence.
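The overdetermination problem can be seen in a few lines of code. The sketch below applies a naive “but-for” test, i.e. would the effect have been different had this one cause been flipped, to the two soldiers. The helper names are hypothetical and the test is a deliberately simple stand-in for counterfactual dependence, not a full account of it.

```python
def death(a_shoots: bool, b_shoots: bool) -> bool:
    """Either bullet alone is sufficient to kill the prisoner."""
    return a_shoots or b_shoots


def but_for(cause: str, actual: dict) -> bool:
    """True if flipping only `cause` would have changed the outcome."""
    flipped = dict(actual)
    flipped[cause] = not actual[cause]
    return death(**actual) != death(**flipped)


actual = {"a_shoots": True, "b_shoots": True}
print(but_for("a_shoots", actual), but_for("b_shoots", actual))  # False False
```

Neither shot passes the test on its own, even though the two together plainly killed the prisoner, which is why a purely counterfactual definition struggles with cases like this.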

Pearl mentions both David Hume and David Lewis and shows familiarity with the philosophy of causation. However, he does not explicitly commit to a concrete theory of causation and it is unclear whether he thinks counterfactual dependence is the basis of all causation or whether counterfactual dependence arises from causation. Practitioners are often unaware of the deep philosophical context in which they operate or simply don’t care about it. Does this affect the quality of their work? The concept of causation has been debated in great depth through the history of philosophy. Pearl largely sidesteps this debate in favor of pragmatically working on his research programme. But is he disregarding centuries of philosophical deliberations at any peril? In a more local sense, the philosopher of physics Tim Maudlin, in his review of The Book of Why, claims Pearl could have saved himself years of work had he acquainted himself with ongoing work on causal models by philosophers such as Glymour, Spirtes and Scheines from the 1980s onwards. More generally, Pearl makes normative claims about counterfactuals as the basis of causation, intelligence, morality and free will, especially for artificial intelligence, to the extent these claims are central to the Causal Revolution. The interplay of the concepts involved in these claims has been debated deeply over the past two and a half centuries. While it’s difficult to draw any positive conclusions from these debates, we have certainly come to understand a lot about the limits of these concepts. For example, as noted above, causation cannot be fully accounted for using counterfactuals.
Pearl and his cohort are mostly able to continue unbothered by these limits, much as mathematicians continued their work mostly unencumbered even though Kurt Gödel proved decades ago that there are fundamental limits to what we can do with formal systems, or as meteorologists still made tremendous strides in weather prediction even after the discovery of chaos theory. Still, grand projects like the arithmetization of thought were fundamentally thwarted and there are likely lessons in this for how far we think the Causal Revolution can go.

Contrariwise, Pearl’s recent work has not yet received a detailed treatment by philosophers. It’s quite rare that applied work in science directly touches a concept as philosophically fundamental as causation. While philosophical deliberations about counterfactual causation have directly shaped the thinking around causal methods, the implications of Pearl’s work could flow easily in the other direction. Structural causal models could provide a practical formulation of causation that can be applied across multiple domains without the need for an ideal formulation of causation. Instead, we can see how far we can go with a common-sensical view of causation grounded in counterfactuals. Pearl’s structural counterfactuals can become primary models to study possible worlds and other flavors of causation based on counterfactual dependence. Causal diagrams such as the one involving the firing squad above work naturally with counterfactuals so they could be used to study problems in counterfactual dependence with greater ease and clarity. It might be interesting to see how much of an impediment limits like overdetermination and preemption really are to the practical applications of causation. Finally, if one day the Causal Revolution indeed leads to thinking machines, we could have a normative model of causation against which we can evaluate our own human treatment of causation.

David Hume’s empiricist tradition scrutinized basic concepts such as causation, perception and religious belief and helped lead science down a path shorn of orthodoxy and blind convention. However, in his book, Pearl shows us that aspects of empiricism have also led to a kind of orthodoxy, as a staunch wariness of causation potentially prevents progress across multiple disciplines. With science unshackled from these restraints, we might be on the brink of significant insights with implications for our everyday lives. Most excitingly, as we climb the Ladder of Causation, the break from orthodoxy might lead to long-awaited advances in artificial general intelligence. Taken together, all of this makes Pearl’s book essential reading for anyone interested in the future of scientific practice.

“The Prado, Madrid, 1988” by nathanh100 is licensed under CC BY 2.0


Written by Vishakh