Connecting Cause and Effect

This is an Insight article, written by a selected partner as part of GAR's co-published content.


In summary

Analytical techniques that are readily used in economics and statistics to measure causal effects can also be deployed in commercial disputes to provide compelling evidence on factual causation – which is, of course, a fundamental issue in the assessment of legal liability and compensatory damages.

As organisations accumulate ever larger and more complex datasets, these analytical techniques are becoming increasingly applicable to a wide range of disputes. This trend has implications for everyone involved in the dispute resolution process: legal practitioners, courts and arbitral tribunals, and parties.


Discussion points

  • Isolating and quantifying causal effects using large datasets
  • How techniques from statistics, economics and data science can help
  • An illustrative example: identifying the causal effect on public health outcomes of the proliferation of misinformation on social media
  • Approach to applying these techniques in a legal setting to assess both factual causation and damages
  • Relevance to a wide range of disputes, including claims relating to the operation of algorithms, accounting misrepresentation, fraud, negligence, defamation and faulty products

Referenced in this chapter

  • McGregor on Damages
  • Cabinet Office: Behavioural Insights Team
  • The Royal Swedish Academy of Sciences
  • The European Commission
  • New York State Law Reporting Bureau
  • World Health Organization
  • Federal Court of Australia
  • FTI Consulting

Connecting cause and effect

Using techniques from economics, statistics and data science to assess factual causation and legal liability

The challenge of measuring ‘cause and effect’ is common to a wide range of disciplines – from philosophy and theology, to economics and statistics, to medicine and pharmacology. Causation also plays a central role in the application of law: in particular, in the determination of legal liability in disputes. In this chapter, we explain how analytical techniques that are readily used in economics and statistics to measure causal effects can also be used in commercial disputes to develop compelling evidence as to whether (and to what extent) a defendant’s action has in fact caused a claimant harm. We illustrate these techniques by quantifying the causal effect of the proliferation of anti-vaccination misinformation on social media (analogous to a defendant’s alleged wrong) on vaccination coverage rates in England and Wales (analogous to a claimed harm). We explain how these techniques are applied in practice – both in this illustrative example, and more generally in a commercial dispute. As organisations accumulate ever larger, more expansive and more complex datasets, these analytical techniques are becoming increasingly applicable to a wide range of commercial disputes. This trend has implications for everyone involved in the dispute resolution process: legal practitioners, courts and arbitral tribunals, and parties.

Introduction

In commercial disputes, parties, courts and arbitral tribunals typically seek evidence in relation to the quantification of compensatory damages from experts in finance, accounting and economics. These damages experts take different approaches to quantifying damages, depending on the circumstances of the dispute and the issues on which they are asked to opine. However, in general, their approaches seek to translate a legal theory of the defendant’s alleged wrong into a quantitative analysis of the economic impact of that alleged wrong on the claimant. In performing that calculation, damages experts typically make (or are instructed to make) assumptions about particular issues relevant to quantum. In this chapter, we focus on one of the most fundamental assumptions underlying an assessment of damages: that the defendant’s alleged wrong has in fact caused the claimant harm.

While liability is of course a fundamental legal concept, there are well-established analytical techniques from the fields of economics and statistics (and, more recently, data science) that can assist parties, courts and arbitral tribunals in determining whether and to what extent there is a causal relationship between the defendant’s alleged wrong and the claimed harm.[1] In this chapter, we illustrate how these techniques can work in practice, using a non-contentious but topical example: the causal relationship between the proliferation of anti-vaccination misinformation on social media (analogous to an alleged wrong in a commercial dispute), the decrease in population coverage of the measles, mumps and rubella (MMR) vaccine in England and Wales, and the increase in disease incidence (analogous to a claimed harm). This example has immediate parallels to commercial disputes concerning defamation, for example, but similar techniques from economics, statistics and data science are applicable to an increasingly wide range of disputes and issues.

Contents of the rest of this chapter

In the rest of this chapter, we:

  • explain what economics, statistics and data science have to do with legal liability, focusing on what these disciplines have to say about factual causation;
  • describe how the core techniques from economics and statistics (and in particular, the overlapping field of econometrics) can be used to assess and measure causal relationships; and
  • explain our approach to applying these techniques in the context of commercial disputes, using the illustrative example referred to above.

Causation, economics and statistics

In general, legal liability requires a finding of both ‘cause in fact’ and ‘cause in law’.[2] A defendant’s action is, in fact, a cause of a claimant’s harm if such harm would not have occurred without that action. This is the well-known and universally accepted ‘but-for’ test for factual causation.

Factual causation is a necessary condition for the imposition of legal liability, but it is not sufficient: legal causation (also referred to as proximate causation) must also be established.[3] These legal limits on factual causation are a matter of law, and outside the scope of this chapter.[4] However, factual causation is an issue that economists and statisticians are well positioned to opine upon, having grappled with the issue from as early as the 18th century, when philosopher and economist David Hume first alluded to the but-for test for factual causation:

… we may define a cause to be an object, followed by another, and where all objects, similar to the first, are followed by objects similar to the second. Or in other words, where, if the first object had not been, the second never had existed…[5]

The idea of a hypothetical, but-for scenario (sometimes referred to as the ‘counterfactual’ or ‘potential’ scenario) permeates the empirical work of modern-day economists and statisticians, who tend to use an intellectual framework known as the ‘potential outcomes framework’, often attributed to statisticians Jerzy Neyman[6] and Donald Rubin.[7] As the potential outcomes framework arose in the context of experiments, the event or action whose causal effect is in question is called the ‘treatment’.

Within this framework, the causal effect of the treatment (eg, of taking an aspirin, of obtaining a university degree, or of a defendant’s alleged wrong) is defined as the difference between:

  • the actual outcome experienced (eg, the easing of your headache over time, the increase in your income after obtaining the university degree, or the fall in the claimant’s profits after the defendant’s alleged wrong occurred); and
  • the but-for outcome, namely, that which would have been experienced if the treatment had not been applied (eg, the easing of the headache if you had not taken the aspirin, the change in your income if you had not obtained the university degree, or the change in the claimant’s profits if the defendant’s alleged wrong had not occurred).
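In the notation commonly used in the econometrics literature (the symbols are our own choice, not taken from this chapter), let $Y_i(1)$ be unit $i$'s outcome with the treatment, $Y_i(0)$ its but-for outcome without it, and $D_i \in \{0, 1\}$ an indicator of whether the treatment was in fact applied. The definition above can then be written as:

```latex
\begin{aligned}
\tau_i &= Y_i(1) - Y_i(0) && \text{(causal effect for unit } i\text{)} \\
Y_i^{\text{obs}} &= D_i\,Y_i(1) + (1 - D_i)\,Y_i(0) && \text{(the outcome actually observed)} \\
\text{ATE} &= \mathbb{E}\big[\,Y_i(1) - Y_i(0)\,\big] && \text{(average effect across units)}
\end{aligned}
```

Because $Y_i^{\text{obs}}$ reveals only one of the two potential outcomes for each unit, $\tau_i$ itself can never be computed directly from data.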

The fundamental problem of causal inference is that the but-for outcome can never be observed: it is not possible to turn back time, opt out of the treatment, and then observe what would have happened in its absence. The solution proposed by Neyman and Rubin is to use a randomised controlled trial – in other words, to randomly allocate a treatment to some individuals but not to a comparable control group, and to then observe the difference in the groups’ average outcomes. This solution is routinely used in medicine and pharmacology to test the efficacy of new medical interventions, and it has also been hugely influential in social science research and policy. For example, the UK government’s Behavioural Insights Team (colloquially referred to as the ‘Nudge Unit’) recommends that randomised controlled trials are used in testing and developing policy across areas of government,[8] and these same experimental techniques have revolutionised the field of economics, with Abhijit Banerjee, Esther Duflo and Michael Kremer being awarded the 2019 Nobel Prize in Economic Sciences for their ‘experimental approach to alleviating global poverty’.[9]
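The following minimal Python sketch, using entirely simulated data, illustrates why randomisation addresses this problem. An assumed true effect of 2.0 is built into every unit, only one potential outcome per unit is "observed", and the simple difference in group means recovers the effect. The function name `simulate_rct` and all parameter values are illustrative assumptions, not drawn from any trial.

```python
import random
import statistics


def simulate_rct(n=10_000, true_effect=2.0, seed=42):
    """Simulate a randomised controlled trial within the potential outcomes framework.

    Hypothetical set-up: every unit has an untreated outcome Y(0) and a treated
    outcome Y(1) = Y(0) + true_effect, but only one of the two is observed.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        y0 = rng.gauss(10.0, 3.0)   # but-for (untreated) outcome
        y1 = y0 + true_effect       # outcome if treated
        if rng.random() < 0.5:      # random assignment, like a coin toss
            treated.append(y1)      # for treated units only Y(1) is seen
        else:
            control.append(y0)      # for control units only Y(0) is seen
    # Randomisation makes the two groups comparable on average, so the
    # difference in mean observed outcomes estimates the average effect.
    return statistics.mean(treated) - statistics.mean(control)


estimate = simulate_rct()
print(f"estimated average treatment effect: {estimate:.2f}")
```

Setting `true_effect=0.0` gives an estimate close to zero, which is the comparison a trial relies on: any systematic gap between the groups can only come from the treatment.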

While randomised controlled trials do work, they can be quite expensive to carry out, and are often practically and commercially impossible – especially in the context of a commercial dispute. If an expert suggested to a court or arbitral tribunal that it should identify whether the defendant’s alleged wrong in fact caused the claimant harm by randomly subjecting comparable organisations to the same alleged wrong, he or she would not get very far.

Fortunately, economists and statisticians have developed a series of techniques that are well suited to estimating causal effects without needing to perform randomised controlled trials. These techniques use real-world, observational data (ie, not from randomised controlled trials), and are formally studied in a field that sits at the overlap of economics and statistics: ‘econometrics’. Econometrics is the application of statistical methods to test and quantify economic relationships, and its use is well established in regulatory proceedings,[10] disputes concerning infringements of competition law[11] and securities litigation.[12] These same techniques can provide powerful evidence on factual causation across a much broader range of disputes – and increasingly so as organisations accumulate ever larger, more expansive and more complex datasets.

Measuring cause and effect using data

There is a wide range of techniques that can be used to measure causal relationships, and there is a burgeoning academic literature on the subject. These techniques vary in their levels of sophistication, their data requirements, and the technical statistical assumptions that are required to be satisfied in order for them to provide valid and reliable results.[13] Despite these technical differences, there is one common intellectual thread running through them all, which arises from the potential outcomes framework discussed above: the need to isolate and quantify causal effects by either:

  • estimating the but-for scenario (and therefore implicitly estimating the causal effect, by comparison to the actual scenario); or
  • estimating the causal effect directly, by first taking into account the various other factors which might complicate the picture.

In practice, this is typically done using multivariate regression analysis, as depicted in Figure 1.

Figure 1: Illustration of multivariate regression analysis

In general, a multivariate regression has three parts:

  • the dependent variable, being the outcome of interest. In the illustrative example in this chapter, the dependent variable is the level of vaccination coverage. In a commercial dispute more generally, the dependent variable might be the claimant’s sales volumes, revenues, costs, or profits;
  • the main explanatory variable, being the factor whose causal effect we are seeking to assess. In the illustrative example in this chapter, the main explanatory variable is the extent of anti-vaccination misinformation on Twitter. In a commercial dispute more generally, the main explanatory variable would measure the extent or timing of the defendant’s alleged wrong; and
  • the confounding factors or control variables, being those factors (other than the main explanatory variable) that might also affect the dependent variable, and whose effects therefore need to be disentangled and removed from the picture before we can attribute any causal effect. In the illustrative example in this chapter, these include socio-economic factors such as levels of income, employment and education. In a commercial dispute more generally, these factors may relate to intervening economic or environmental conditions, for example the effect of the 2007–2009 global financial crisis, or the current economic downturn resulting from the covid-19 pandemic.

By measuring the relationship between the confounding factors and the dependent variable on one hand, separately from the relationship between the main explanatory variable and the dependent variable on the other, the regression model is able to isolate and quantify the causal effect of that main explanatory variable.
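As a rough illustration of the three parts described above, the following Python sketch fits a multivariate regression by ordinary least squares on simulated data. Everything in it is hypothetical: `sales` stands in for the dependent variable, `wrong` (a 0/1 indicator) for the main explanatory variable, and `economy` for a single confounding factor. The alleged wrong is deliberately made more likely to occur when economic conditions are poor, so the two are entangled. The small solver uses the normal equations for transparency; in practice an established statistics package would normally be used.

```python
import random


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one list of regressor values per observation; an intercept column
    is added automatically. Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: assumed true effect of the wrong = -5, of the economy = +3
rng = random.Random(0)
rows, sales = [], []
for _ in range(2_000):
    economy = rng.gauss(0.0, 1.0)                   # confounding factor
    p_wrong = 0.7 if economy < 0 else 0.3           # wrong is likelier in a downturn
    wrong = 1.0 if rng.random() < p_wrong else 0.0  # main explanatory variable
    sales.append(100.0 - 5.0 * wrong + 3.0 * economy + rng.gauss(0.0, 2.0))
    rows.append([wrong, economy])

intercept, effect_of_wrong, effect_of_economy = ols(rows, sales)
print(f"wrong: {effect_of_wrong:.2f}, economy: {effect_of_economy:.2f}")
```

Because `economy` is included as a regressor, the estimated coefficient on `wrong` should land close to the assumed value of -5, even though the wrong tends to coincide with a downturn that also depresses sales.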

It is difficult to overstate the importance of identifying the relevant confounding factors and accounting for them properly – especially when it comes to assessing factual causation in a commercial dispute. In particular:

  • if an expert simply compares the alleged wrong and the claimed harm without accounting for any confounding factors at all, he or she risks committing one of the cardinal sins of statistics: confusing correlation with causation. An example often given in introductory econometrics courses is that of ice cream sales and swimming pool accidents: there is a strong correlation between these two variables, but it is ridiculous to state that ice cream sales cause swimming pool accidents (or vice versa). Although this is an obvious point, we frequently encounter analyses that wholly attribute a claimant’s falling revenues to the effect of the defendant’s alleged wrong, without accounting for any confounding factors whatsoever; and
  • if, on the other hand, an expert takes into account some relevant and material confounding factors but fails to account for others, their analysis may still fail to identify the causal effect accurately, because of another common error known as ‘omitted variable bias’.[14] As the name suggests, omitting these variables results in a statistically biased (ie, systematically incorrect) assessment, in which harm is attributed to the defendant’s alleged wrong when it was in fact caused wholly or partly by other factors (for which the defendant is not responsible, let alone legally liable).
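The ice cream example can be reproduced with simulated data. The Python sketch below assumes (our assumption, not the chapter's) that hot weather drives both ice cream sales and swimming pool accidents, while ice cream has no effect on accidents at all. A naive regression that omits temperature finds a clearly positive "effect". Controlling for temperature, here by partialling it out of both variables (the Frisch-Waugh-Lovell approach), brings the estimate to approximately zero. The same mechanism produces omitted variable bias when a material confounder is left out of a damages analysis.

```python
import random
import statistics


def slope(x, y):
    """Slope from a least-squares regression of y on x (with an intercept)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its linear relationship with x."""
    b = slope(x, y)
    a = statistics.mean(y) - b * statistics.mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


rng = random.Random(1)
# Assumed data-generating process: temperature drives both series,
# and ice cream sales have NO effect on accidents.
temperature = [rng.gauss(20.0, 5.0) for _ in range(5_000)]
ice_cream = [10.0 * t + rng.gauss(0.0, 20.0) for t in temperature]
accidents = [0.5 * t + rng.gauss(0.0, 1.0) for t in temperature]

naive = slope(ice_cream, accidents)  # omits the confounder, so it is biased
# Frisch-Waugh-Lovell: partial temperature out of both variables, then regress
controlled = slope(residuals(temperature, ice_cream), residuals(temperature, accidents))
print(f"naive slope: {naive:.4f}; controlling for temperature: {controlled:.4f}")
```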

Courts are aware of the challenge of properly measuring causal relationships, across a broad range of disputes and issues. For example, the authors of this chapter were recently involved in a dispute in the UK insurance industry, in which the English High Court gave permission for the parties to adduce expert evidence in the field of econometric analysis in the initial liability phase of the split trial. The alleged wrong and claimed harms occurred during a period of time that covered the global financial crisis, so the concepts and techniques discussed in this chapter were important to the determination of factual causation, before the question of damages could even arise. The authors were also involved in an arbitration in the US electrical components industry, in which the claimant’s expert submitted an assessment of damages arising from an alleged breach of warranty, based on the assumption that components supplied by the defendant had caused the claimant’s products to fail. The arbitral tribunal rejected the claimant’s expert’s assessment of damages in part because it failed to account for relevant confounding factors (such as the operating environment, or differences in installation and usage patterns), resulting in a statistically biased (and in this case, grossly overstated) assessment of causation and damages.

Although the details of the above matters are not publicly available, there are examples of similar decisions in the public domain. For instance, in a recent dispute concerning alleged securities fraud in the United States, the New York Supreme Court Commercial Division granted summary judgment dismissing a plaintiff’s claim for failure to establish loss causation: the plaintiff had not proved that its losses were due to the alleged misrepresentations rather than to the broader 2007–2009 global financial crisis. The court quoted prior decisions, noting that:

. . . when the plaintiff’s loss coincides with a marketwide phenomenon causing comparable losses to other investors, the prospect that the plaintiff’s loss was caused by the fraud decreases, and a plaintiff’s claim fails when it has not...proven...that its loss was caused by the alleged misstatements as opposed to intervening events.[15]

In the next section, we illustrate how the techniques from economics and statistics can be used to measure causation.

An illustration: the causal effect of anti-vaccine misinformation on social media

At the time of writing, social media networks are rife with conspiracy theories and misinformation around a wide range of issues – from the origins of the covid-19 virus and treatments for the disease, to the roots and motivations of the Black Lives Matter movement, to the efficacy of postal ballots in the United States, to the effect of 5G radio waves on the human brain.

However, misinformation is not a new phenomenon – especially when it comes to infectious disease, vaccination and public health.[16] The roots of the anti-vaccination movement can be traced back to the 18th century, when smallpox inoculations were outlawed after being blamed for a severe outbreak of the disease in Paris.[17] Thankfully, vaccination techniques have since improved and the public health case for widespread vaccination has become stronger, better understood and more widely accepted – resulting in global vaccination programmes and the effective eradication of certain infectious diseases. Unfortunately, a long-disproven and discredited link between the MMR vaccine and the onset of symptoms of autism[18] has found a new lease of life through social media, fuelled by content created and shared by the general public, celebrities and politicians, and organisations interested in propagating the myth.

In this context, we use the techniques discussed earlier in this chapter to examine a fundamental question of factual causation: what is the effect on public health of the spread of anti-vaccination misinformation on social media? This question is analogous to those that often arise in commercial and regulatory disputes as part of an assessment of legal liability and damages, relating to, for example:

  • the accuracy and effect of representations made to consumers about the operation of algorithms;[19]
  • the causal effect of an alleged accounting misrepresentation on the share price of a listed company;
  • the causal effect of an alleged insurance fraud on the volume and value of insurance claims;
  • the causal effect of adverse publicity on brand or personal reputation; or
  • the causal effect of a faulty component on the lifetime failure rate of a product.

Our approach to measuring cause and effect

Our approach follows a classic method of scientific inquiry: we formulate a hypothesis, derive predictions from that hypothesis, and then test those predictions using data. In practice, we follow the six steps depicted in Figure 2. We explain these steps in the rest of the section, both by reference to the illustrative example of assessing the causal effect of anti-vaccination misinformation, and also to commercial disputes more generally.

Figure 2: Our approach to measuring causal effects in general

Step 1: Define testable hypothesis

In this example, we have two testable hypotheses: first, that the proliferation of anti-vaccination misinformation causes MMR vaccination rates to fall, and second, that falling MMR vaccination rates cause an increase in instances of measles. Together, these hypotheses identify a causal relationship between misinformation and public health, illustrated in Figure 3.

Figure 3: Testable hypotheses in this example

In the context of a commercial dispute, the hypothesis is often defined quite generally by the claimant’s allegations, and may need to be formulated more precisely for the purpose of statistical analysis.

Step 2: Review relevant literature and documents

Having defined our hypotheses, we consider the existing evidence to understand the strength of support for the hypotheses, to understand what the most material and relevant confounding factors might be, and to identify appropriate sources of data.

In this example, we call upon an extensive public health economics literature on the determinants of higher or lower rates of vaccination coverage, and the epidemiology of measles. This literature highlights certain socio-economic factors that determine acceptance of or ‘demand’ for vaccines (such as levels of income, employment, education, and ethnicity), and other policy and logistical factors that affect access to or ‘supply’ of vaccines.[20] It also highlights that populations with lower rates of vaccination coverage are likely to experience more frequent and more widespread outbreaks of infectious disease.[21]

In the context of a commercial dispute, there may be less of an emphasis on academic literature (unless it is directly relevant), and more of a focus on precedent from previous judgments, existing pleadings, witness statements and expert reports in the current matter, and other relevant documents in disclosure or in the public domain.

Step 3: Collect data

Having reviewed the literature, we collect the raw data needed for our analysis from both public and private sources, in relation to:

  • the dependent variable, which in this example is a measure of MMR vaccination coverage provided by Public Health England and Public Health Wales;
  • the main explanatory variable, which in this example is based on hundreds of thousands of tweets that relate to the MMR vaccine; and
  • confounding factors, which in this example are derived from official data published by the UK Office for National Statistics.

In the context of a commercial dispute, this step may focus less on publicly available data, and more on private data held by the parties. To the extent that such data has not already been disclosed, we commonly assist in preparing formal disclosure and information requests, and directly interrogating the parties’ databases to extract the relevant data.

Step 4: Prepare dataset

Having collected the relevant raw data, the next step is to process and prepare it for analysis. This involves fully understanding what the data means, ‘cleaning’ the data to identify and fix errors and inconsistencies, merging and preparing a high-quality dataset suitable for analysis, and recording each step.
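The cleaning step is easiest to defend when every change is recorded. The Python sketch below, using a hypothetical three-column extract and hypothetical area codes, harmonises inconsistent identifiers, drops duplicate area-quarter rows and sets aside rows with missing values, returning a log of each step alongside the cleaned data. The column names and cleaning rules are assumptions for illustration only.

```python
import csv
import io

# Hypothetical raw extract with typical problems: inconsistent area codes,
# a duplicated area-quarter row and a missing value.
RAW = """area_code,quarter,coverage
E06000001,2016Q1,91.2
e06000001 ,2016Q2,90.8
E06000001,2016Q2,90.8
E06000002,2016Q1,
"""


def clean(raw_text):
    """Clean a raw CSV extract; return (clean rows, log of every step taken)."""
    log = []
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    log.append(f"read {len(rows)} rows")

    for r in rows:  # harmonise identifiers (stray case and whitespace)
        r["area_code"] = r["area_code"].strip().upper()
    log.append("normalised area codes (upper case, whitespace stripped)")

    # Keep the first row per (area, quarter); conflicting duplicates
    # would need investigating rather than silently dropping.
    seen, deduped = set(), []
    for r in rows:
        key = (r["area_code"], r["quarter"])
        if key not in seen:
            seen.add(key)
            deduped.append(r)
    log.append(f"dropped {len(rows) - len(deduped)} duplicate row(s)")

    complete = [r for r in deduped if r["coverage"].strip()]
    log.append(f"set aside {len(deduped) - len(complete)} row(s) with missing coverage")
    for r in complete:
        r["coverage"] = float(r["coverage"])
    return complete, log


rows, log = clean(RAW)
for step in log:
    print(step)
```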


In this example, we first process the tweets to develop a measure of misinformation, using a data science technique called ‘supervised machine learning’. We explain this technique in detail in an upcoming chapter, but in brief, it entails: (i) performing a human review of a sample of tweets to identify instances of misinformation; (ii) training an algorithm to identify such tweets in the broader population; (iii) using this algorithm to classify them all; and (iv) using the classification to construct a statistical index that tracks both the amount of misinformation and its exposure over time, shown in Figure 4. We find that there was a general increase in misinformation over time, but with ‘surges’ around particular events, such as Donald Trump (then a presidential candidate) linking vaccination and autism during a Republican candidate debate in late 2015, and Robert De Niro appearing on television to debate the vaccine-autism link in early 2016.

Figure 4: Misinformation (index value, 2017 Q3 = 1)[31]
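The four steps of supervised machine learning described above can be sketched in Python as follows. The chapter does not specify which algorithm was used; this sketch assumes a simple multinomial naive Bayes text classifier, trained on a handful of invented, hand-labelled tweets, and builds an index that counts only the number of tweets classified as misinformation (the chapter's index also reflects exposure, which is omitted here). The `base` argument mirrors the rebasing of the index to a single quarter. A real application would need a far larger labelled sample and validation on held-out data.

```python
import math
from collections import Counter, defaultdict

# Step (i): a hand-labelled sample (invented tweets; 1 = misinformation)
LABELLED = [
    ("mmr vaccine causes autism do not vaccinate", 1),
    ("the truth they hide vaccine injury autism", 1),
    ("mmr jab is poison for children", 1),
    ("book your child's mmr vaccine at the gp", 0),
    ("measles outbreak reminder to get vaccinated", 0),
    ("nhs advice on the mmr vaccine schedule", 0),
]


def train(samples):
    """Step (ii): fit a multinomial naive Bayes model (word counts per class)."""
    word_counts, class_counts = defaultdict(Counter), Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def classify(model, text):
    """Step (iii): return the label with the highest log posterior score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for w in text.split():
            if w in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothing avoids log(0)
                score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


def misinformation_index(model, tweets_by_quarter, base):
    """Step (iv): count tweets classified as misinformation per quarter,
    rebased so that the base quarter equals 1."""
    counts = {q: sum(classify(model, t) for t in tweets)
              for q, tweets in tweets_by_quarter.items()}
    return {q: c / counts[base] for q, c in counts.items()}


model = train(LABELLED)
tweets = {
    "2016Q1": ["vaccine causes autism", "get your mmr vaccine"],
    "2016Q2": ["vaccine injury autism truth", "mmr jab is poison",
               "autism vaccine do not vaccinate"],
}
print(misinformation_index(model, tweets, base="2016Q1"))
# → {'2016Q1': 1.0, '2016Q2': 3.0}
```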

Next, we process and match all the other data into a single dataset that covers approximately 160 separate geographical areas across England and Wales, with quarterly observations over a seven-year period from 2012 to 2018. The data reveal complex relationships between vaccination coverage and the various factors, which need to be considered properly in order to assess the causal effect.[22]
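Matching the sources into a single area-by-quarter dataset amounts to a join on those two keys. The Python sketch below, with hypothetical data and field names, performs an inner join and reports which keys were dropped because at least one source lacked them, so that gaps can be investigated rather than silently lost. Whether to drop, impute or query such gaps is a judgement for the expert, not something the code decides.

```python
# Hypothetical cleaned inputs, each keyed by (area_code, quarter)
coverage = {("E06000001", "2016Q1"): 91.2, ("E06000001", "2016Q2"): 90.8}
misinfo = {("E06000001", "2016Q1"): 1.0, ("E06000001", "2016Q2"): 3.0}
income = {("E06000001", "2016Q1"): 512.0}  # Q2 missing from this source


def build_panel(sources, names):
    """Inner-join (area, quarter)-keyed sources into one panel dataset.

    Returns the panel rows and the keys dropped because at least one
    source lacks them, so that gaps are reported rather than lost silently.
    """
    keysets = [set(s) for s in sources]
    common = set.intersection(*keysets)
    dropped = set.union(*keysets) - common
    panel = [
        {"area": area, "quarter": quarter,
         **{name: src[(area, quarter)] for name, src in zip(names, sources)}}
        for area, quarter in sorted(common)
    ]
    return panel, sorted(dropped)


panel, dropped = build_panel([coverage, misinfo, income],
                             names=["coverage", "misinfo", "income"])
print(panel)
print("dropped:", dropped)
```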

In the context of a commercial dispute, we would seek to follow a similar process. This dataset preparation step can be time consuming and thankless, but it is essential: the completeness and integrity of the dataset will influence the quality and reliability of any final assessment of factual causation.

Step 5: Perform economic and statistical analysis

Having prepared a dataset, we apply the techniques discussed earlier in this chapter.

In this example, we develop a multivariate regression model that accounts for the relevant confounding factors, to isolate and quantify the causal effect of anti-vaccination misinformation. We illustrate this in Figure 5. We find that the proliferation of anti-vaccination misinformation on social media has a ‘statistically significant’[23] relationship with vaccination coverage (that is, with parents’ decisions to vaccinate their children), and that this relationship is not explained by the various confounding factors. In other words, this is not just a correlation: it is a causal relationship.

Figure 5: Illustration of multivariate regression in this example
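The chapter does not publish its model specification. One common way to account for time-invariant differences between the roughly 160 areas is to include area fixed effects; the Python sketch below assumes that design and estimates it by demeaning every variable within each area (the "within" transformation) before regressing coverage on misinformation and one control. The data are simulated, with an assumed true misinformation coefficient of -0.5 and an unobserved area trait that affects both coverage and misinformation, so a regression without fixed effects would be biased here. None of these numbers comes from the chapter's analysis.

```python
import random
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean: the 'within' transformation for fixed effects."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def two_regressor_ols(y, x1, x2):
    """OLS slopes of y on x1 and x2 for demeaned data (so no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(3)
areas, coverage, misinfo, income = [], [], [], []
for area in range(160):                    # about 160 areas
    area_trait = rng.gauss(0.0, 2.0)       # unobserved, fixed over time
    for quarter in range(28):              # 28 quarters, i.e. 2012 to 2018
        m = 0.1 * quarter + 0.5 * area_trait + rng.gauss(0.0, 1.0)
        inc = rng.gauss(0.0, 1.0)
        areas.append(area)
        misinfo.append(m)
        income.append(inc)
        coverage.append(92.0 + area_trait - 0.5 * m + 0.3 * inc + rng.gauss(0.0, 0.5))

y_w, m_w, inc_w = (demean_by_group(v, areas) for v in (coverage, misinfo, income))
beta_misinfo, beta_income = two_regressor_ols(y_w, m_w, inc_w)
print(f"misinformation: {beta_misinfo:.3f}, income: {beta_income:.3f}")
```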

We can quantify the extent of this causal relationship. For example, our analysis suggests that over the five-year period from 2014 to 2018, misinformation increased by approximately 800 per cent,[24] that vaccination coverage fell by approximately 3 percentage points,[25] and that over half of this fall was due to misinformation.[26]
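The arithmetic behind statements of this kind can be written out explicitly. Assuming, for illustration only, a specification in which coverage $C$ responds linearly to the misinformation index $M$ with estimated coefficient $\hat{\beta}$ (the chapter does not state its functional form), the figures above combine as follows:

```latex
\begin{aligned}
M_{\text{end}} &\approx (1 + 8)\,M_{\text{start}} = 9\,M_{\text{start}}
  && \text{(an 800 per cent increase)} \\
\Delta C^{\text{misinfo}} &= \hat{\beta}\,\Delta M
  && \text{(fall attributed to misinformation)} \\
\text{share} &= \frac{\Delta C^{\text{misinfo}}}{\Delta C^{\text{actual}}},
  \qquad \Delta C^{\text{actual}} \approx -3 \text{ pp} \\
\text{share} > \tfrac{1}{2} &\iff \Delta C^{\text{misinfo}} < -1.5 \text{ pp}
\end{aligned}
```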

The link between this fall in vaccination coverage and outbreaks of measles is a matter of medicine and epidemiology. Our data suggest that, on average, a 1 per cent decrease in vaccination coverage is associated with a 2 per cent increase in the measles incidence rate[27] (a finding consistent with estimates in the epidemiology literature considered in Step 2, above).[28]
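A statement of the form "a 1 per cent decrease in X is associated with a 2 per cent increase in Y" describes an elasticity of about -2, which is conventionally estimated as the slope of a regression of log(Y) on log(X). The Python sketch below checks that a log-log regression recovers an elasticity built into simulated data; the coverage range, noise level and scale constant are assumptions. Note that at coverage of around 90 per cent, a 1 per cent decrease corresponds to roughly 0.9 percentage points, so the two units should not be conflated.

```python
import math
import random
import statistics


def elasticity(x, y):
    """Slope of log(y) on log(x): approximate % change in y per 1% change in x."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = statistics.mean(lx), statistics.mean(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


rng = random.Random(5)
coverage = [rng.uniform(85.0, 95.0) for _ in range(5_000)]
# Simulated incidence with an assumed elasticity of -2 and multiplicative noise
incidence = [1e6 * c ** -2.0 * math.exp(rng.gauss(0.0, 0.1)) for c in coverage]
print(f"estimated elasticity: {elasticity(coverage, incidence):.2f}")
```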

In the context of a commercial dispute, we would present and explain the results of the regression model in full in a technical appendix,[29] and disclose all of the underlying data and calculations so that the results can be replicated, fully understood and tested by an opposing expert.

Step 6: Communicate conclusions

Having identified and quantified an overall causal relationship, the next step is to explain it in non-technical terms, and to draw out its implications.

In this example, we can explain the causal relationship in qualitative terms and in aggregate quantitative terms (as in Step 5 above). However, it can sometimes help to make the results more relatable and compelling by applying them to particular events. In this example, we could use the results of our statistical analysis to quantify the causal effect of particular surges of misinformation that followed particular events. For instance, we show in Figure 4 that there was an almost 200 per cent surge in anti-vaccination misinformation on Twitter in Q2 2016 (ie, almost a tripling), following Robert De Niro’s appearance on television to debate the vaccine-autism link. Our analysis suggests that this surge caused a drop in MMR vaccine coverage of 0.40 percentage points in that quarter alone (which in practice means that approximately 700 children across England and Wales were not vaccinated). Such findings raise several important questions for social media users, social media companies, and policy makers.[30]
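The two conversions in this paragraph can be checked by hand. A 200 per cent increase multiplies the index by three, and dividing the number of unvaccinated children by the percentage-point drop gives the size of the quarterly cohort implied by the two figures (our inference, not a figure stated in the chapter):

```latex
\begin{aligned}
M_{\text{Q2 2016}} &\approx (1 + 2)\,M_{\text{before}} = 3\,M_{\text{before}} \\
0.40\ \text{pp} \times N &\approx 700
  \;\Longrightarrow\; N \approx \frac{700}{0.0040} = 175{,}000 \text{ children}
\end{aligned}
```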

In the context of a commercial dispute, we would typically communicate our results in non-technical terms, in a formal expert report. The results would inform our expert opinion on whether and to what extent the defendant’s alleged wrong is in fact a cause of the claimed harm – and these opinions would have implications for the court or arbitral tribunal’s determination of factual causation, and any subsequent assessment of damages. One of the most powerful and potentially helpful features of this approach to assessing causation is that it provides not only an explicit estimate of the existence and size of the causal relationship, but also the precision and level of statistical confidence around that estimate. More precise and statistically confident estimates make for more compelling evidence.
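To show what "precision and level of statistical confidence" can mean in practice, the Python sketch below computes an ordinary least squares slope together with its standard error and a 95 per cent confidence interval, using a large-sample normal approximation (a t distribution would give a marginally wider interval in small samples). The data are simulated with an assumed true slope of -0.3. In a multivariate regression the same idea applies, with standard errors taken from the full coefficient covariance matrix.

```python
import math
import random
import statistics


def slope_with_ci(x, y, level=0.95):
    """Simple OLS slope with a large-sample (normal-approximation) confidence interval."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    a0 = my - b * mx
    # Residual variance with n - 2 degrees of freedom (intercept and slope)
    resid_var = sum((c - a0 - b * a) ** 2 for a, c in zip(x, y)) / (n - 2)
    se = math.sqrt(resid_var / sxx)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return b, se, (b - z * se, b + z * se)


rng = random.Random(11)
x = [rng.uniform(0.0, 10.0) for _ in range(500)]
y = [5.0 - 0.3 * v + rng.gauss(0.0, 1.0) for v in x]
b, se, (lo, hi) = slope_with_ci(x, y)
print(f"estimate {b:.3f}, standard error {se:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero, as it does here, is what is usually meant by a "statistically significant" estimate at the 5 per cent level; a narrower interval reflects a more precise estimate.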

Conclusions and implications for those involved in dispute resolution

Legal liability is (of course) a legal concept, but it still requires establishing factual causation. While it is always possible that the ‘smoking gun’ will be found buried away in factual evidence (in an incriminating email, for example), it is increasingly the case that a more data-based approach is possible, and in some cases, required.

Finding the statistical ‘smoking gun’ in data requires end-to-end expertise and experience in extracting the data from wherever it resides; validating, reviewing and readying it for inspection; conducting powerful analysis using the appropriate techniques drawn from the fields of economics, statistics and data science; and communicating the conclusions in a comprehensible and compelling way. These techniques are already well-established in social science, public policy, and certain types of disputes. When properly applied, they can also lead to persuasive, defensible and cost-effective expert evidence in commercial disputes, in relation to both liability and damages.

As organisations accumulate ever larger, more expansive and more complex datasets, these techniques are becoming increasingly applicable to an even wider range of commercial disputes. We believe that everyone involved in the dispute resolution process should understand (at least at a high level) how these techniques work, and what implications they may have. In particular:

  • legal practitioners instructed by claimants should consult experts in economics, statistics and data science to consider whether and to what extent a data-based approach might provide compelling evidence of factual causation (and ultimately, a more robust assessment of damages). Where instructed by defendants, legal practitioners should consider whether the claimants’ case relies on an implicit or explicit assumption of factual causation, and if so whether this assumption can be tested (and potentially, shown to be incorrect, or otherwise not to meet the required standard of proof) using the methods described in this chapter;
  • courts and arbitral tribunals should expect parties to seek to adduce expert evidence built around more sophisticated data-based techniques, not only in relation to damages, but increasingly in relation to factual causation and liability. Where such evidence is not proactively offered by the parties, courts and arbitral tribunals may consider asking for it directly (as the English High Court has done recently, in the matter discussed earlier in this chapter);
  • parties to disputes should identify as early as possible in the dispute process what relevant data is available within their organisations, and discuss with legal counsel and expert advisors whether to commission an initial statistical analysis of this data to provide an early, objective assessment of factual causation and potential damages. By seeking such expert advice early in the dispute process, parties can better understand the merits of their case, and make more informed and cost-effective business decisions about whether to enter into a formal dispute process; and
  • expert witnesses in the assessment of damages should consider whether the assumption of factual causation that underlies their assessment of damages is supported by the underlying data, and if not, how their assessment of damages should be adjusted to take this into account.

‘Big data’ and its relevance to this chapter

‘Big data’ is the now familiar term used to describe the largest and most complex of these datasets, namely those characterised by:

  • volume: the greatest quantities of data (such as might be obtained from Twitter data feeds or website click data);
  • velocity: data generated and collected at the highest speed (such as continuous location tracking data from GPS-enabled smartphones); and
  • variety: data in many different formats (not only numerical and textual, but also audio, image and video).32

These datasets are often too large and complex to be analysed effectively and efficiently with simple, traditional tools and techniques. However, advances in computing power mean that new tools and techniques from the field of data science (in particular, artificial intelligence (AI) and machine learning (ML) methods) can be used to help analyse the data with a significant degree of automation.

AI and ML techniques excel at identifying patterns in data and producing predictions. This makes the techniques incredibly useful (and in some cases, revolutionary) in a wide range of industries and contexts – including:

  • retail, where online retailers are able to produce eerily accurate personalised recommendations for shoppers, based on ML analysis of shoppers’ previous purchases and activity;
  • insurance, where insurance companies are able to predict more accurately the probability of claims being made, and therefore price insurance products more appropriately to the risk profile of applicants; and
  • dispute resolution, where large volumes of documents can be reviewed much more quickly and efficiently to identify ‘the smoking gun’,33 or data on previous court rulings can be used to predict case outcomes.

However, none of these applications is directly relevant to assessing factual causation. In general, ML algorithms are not designed to identify whether, and to what extent, one factor (such as an alleged wrong) has caused another (such as harm to a claimant). To isolate and measure causal effects using data, the well-established techniques that are the focus of this chapter are still required. That said, the latest academic research at the forefront of economics, statistics and data science is exploring how AI and ML techniques can complement these established causal methods. We illustrate one such complementarity in the example given in this chapter.
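The difference between finding patterns and measuring causal effects can be illustrated with the ice cream and swimming pool example discussed in note 14. The minimal Python sketch below uses simulated data only: the variable names, the numbers and the assumption that temperature is the sole confounder are all hypothetical, chosen purely for illustration. A purely pattern-seeking analysis would find that ice cream sales "predict" swimming pool accidents; a multivariate regression that controls for temperature (estimated here by ordinary least squares using NumPy) shows that, by construction, ice cream sales have no causal effect on accidents at all.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares with an intercept.

    Returns the estimated coefficients (intercept first) and their
    classical (homoskedastic) standard errors.
    """
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(seed=0)
n = 2_000

# Hypothetical data-generating process: temperature (the confounder) drives
# both ice cream sales and swimming pool accidents. Ice cream sales have
# NO effect on accidents in this simulation.
temperature = rng.normal(20, 5, n)
ice_cream = 50 + 3.0 * temperature + rng.normal(0, 5, n)
accidents = 2 + 0.4 * temperature + rng.normal(0, 1, n)

# Naive regression: accidents on ice cream sales alone (picks up the pattern).
beta_naive, se_naive = ols(accidents, ice_cream)

# Multivariate regression: accidents on ice cream sales AND temperature.
beta_ctrl, se_ctrl = ols(accidents, np.column_stack([ice_cream, temperature]))

print(f"naive ice cream coefficient:      {beta_naive[1]:.3f} "
      f"(t = {beta_naive[1] / se_naive[1]:.1f})")
print(f"controlled ice cream coefficient: {beta_ctrl[1]:.3f} "
      f"(t = {beta_ctrl[1] / se_ctrl[1]:.1f})")
```

In this simulated setting the naive coefficient is positive with a very large t-statistic, while the coefficient from the controlled regression is close to zero with a small t-statistic. In a real dispute the relevant confounders are not known in advance; identifying and testing them is a central part of the expert's task.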


Notes

[1] Some of these techniques are already used extensively in disputes concerning competition law infringements and securities fraud. However, as we discuss later in this chapter, these techniques are increasingly being used in other types of disputes and we expect this trend to continue.

[2] McGregor on Damages 20th edition, 2019, Sweet & Maxwell Ltd, Chapter 8 Part I (A), section 1.

[3] Legal causation is a matter of law, and outside the scope of this chapter. However, we understand that legal causation examines whether the defendant’s action is sufficiently close to the claimant’s harm for the defendant to be held legally liable. These legal limits can differ between jurisdictions and areas of the law, but generally concern issues such as remoteness or foreseeability (ie, whether it was foreseeable to the defendant at the time of her action that it would cause the harm that it in fact caused) and intervening acts (ie, whether another event broke the chain of causation between the defendant’s action and the claimant’s harm). There is an extensive discussion of legal causation in McGregor on Damages 20th edition, 2019, Sweet & Maxwell Ltd, Chapter 8 Part I (A), section 1.

[4] Statistical techniques can be used to assist in accounting for legal or contractual limitations (such as limited period warranties, in breach of warranty disputes) in the assessment of damages – although we focus on factual causation in this chapter.

[5] An Enquiry Concerning Human Understanding, David Hume, 2007, Oxford University Press, p. 56.

[6] Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. Master’s Thesis, 1923, Jerzy Neyman. Excerpts reprinted in English, Statistical Science, 1990, Volume 5, Institute of Mathematical Statistics, pp. 463–472.

[7] Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies, Journal of Educational Psychology, 1974, Donald B Rubin, 66 (5): 688–701, p. 689.

[8] Developing Public Policy with Randomised Controlled Trials, 2012, Cabinet Office: Behavioural Insights Team.

[9] The Prize in Economic Sciences, 14 October 2019, The Royal Swedish Academy of Sciences.

[10] For example, UK regulators routinely assess whether regulated companies’ operating costs are efficient, by using econometric models to benchmark costs against those of other regulated companies.

[11] For example, econometric models are routinely used to quantify harm in actions for damages based on breaches of Article 101 or 102 TFEU; the European Commission’s Practical Guide, 2013, describes how such models can be applied.

[12] For example, event studies are often used to assess the impact of alleged accounting frauds on shareholders.

[13] Explaining all of these techniques is outside the scope of this chapter. However, interested readers might consult introductory econometrics textbooks. The authors recommend two in particular: (1) A Guide to Econometrics, 2008, Peter Kennedy, Wiley-Blackwell, which provides an intuitive overview of the subject as a whole without excessive technical detail; and (2) Mostly Harmless Econometrics, 2009, Joshua D. Angrist and Jörn-Steffen Pischke, Princeton University Press, which focuses on the use of econometrics to assess causal effects, especially in the context of natural experiments and policy changes.

[14] To continue the example above, the correlation between ice cream sales and swimming pool accidents is likely driven by a confounding factor: the weather. In the summer months, warmer weather increases both the demand for ice cream and the use of outdoor swimming pools, pushing ice cream sales and swimming pool accidents up together. If this confounding factor were included in a multivariate regression, it would account for the previously observed relationship.

[15] Basis PAC-Rim Opportunity Fund (Master) v TCW Asset Mgt. Co., 2 March 2017, New York State Law Reporting Bureau.

[16] The Real-World Effects of ‘Fake News’, 11 June 2020, FTI Consulting.

[17] The anti-vaccination movement, Measles & Rubella Initiative; History of Anti-vaccination Movements, 10 January 2018, The College of Physicians of Philadelphia.

[18] Measles cases spike globally due to gaps in vaccination coverage, 29 November 2018, World Health Organization.

[19] Australian Competition and Consumer Commission v Trivago NV [2020] FCA 16.

[20] Mapping information exposure on social media to explain differences in HPV vaccination coverage in the United States, May 2017, Dunn, AG, Surian, D, Leask, J, Dey, A, Mandl, KD and Coiera, E, Vaccine, Volume 35, Issue 23, pages 3033–3040.

[21] For example, International Measles Incidence and Immunization Coverage, July 2011, Hall, D. and Jolley, D., The Journal of Infectious Diseases, Volume 204, Supplemental Issue 1, page S161.

[22] For example, the data shows that vaccination coverage varies significantly across England and Wales, that areas with greater proportions of the population from minority ethnic groups tend to have lower vaccination coverage rates, and that areas with more highly educated populations tend to have lower vaccination coverage rates (perhaps somewhat counterintuitively). We illustrate these and other findings in a separate article, The Real-World Effects of ‘Fake News’, 11 June 2020, FTI Consulting, pages 3-4, Figures 4 and 5.

[23] It is common practice in regression analysis to perform a formal statistical test of whether the effect estimated by the regression model is (1) a genuine effect or (2) merely the product of chance (ie, there is in fact no effect at all).

[24] FTI Consulting machine learning analysis of Twitter data.

[25] The proportion of children receiving their first dose by age two fell from 93.4 per cent in Q1 2014 to 90.4 per cent in Q4 2018.

[26] Our model suggests that for every 100 per cent increase in misinformation, there is on average a 0.205 percentage point drop in vaccination coverage. Assuming this effect remains constant, an 800 per cent increase in misinformation therefore results in a reduction in vaccination coverage of (800/100) × 0.205 = 1.64 percentage points.

[27] Measles incidence refers to the number of reported measles cases divided by the total population in any given time period. Our data suggests that for every one per cent reduction in the two-year MMR1 vaccination coverage percentage, there is on average a two per cent increase in the measles incidence rate.

[28] For example, International Measles Incidence and Immunization Coverage, July 2011, Hall, D. and Jolley, D., The Journal of Infectious Diseases, Volume 204, Supplemental Issue 1, page S161.

[29] This technical appendix would also include the results of various technical statistical tests for the validity of the model.

[30] The Real-World Effects of ‘Fake News’, 11 June 2020, FTI Consulting, page 5.

[31] We present this misinformation measure as a simplified index, in which the level of misinformation in the quarter with the highest level (ie, 2007 Q3) is set to 1.

[32] See for example the definition provided in Gartner’s Glossary.

[33] See for example FTI Consulting’s expert e-discovery services.
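Some of the arithmetic described in notes 26, 27 and 31 can be expressed in a few lines of code. The Python sketch below is illustrative only: the function names are hypothetical, the default effect size of -0.205 percentage points per 100 per cent increase in misinformation is taken from note 26 (with a negative sign denoting a drop in coverage), and the case counts and quarterly values in the usage lines are invented for the example; they are not the data used in this chapter.

```python
def coverage_change_pp(misinfo_increase_pct: float,
                       effect_per_100pct: float = -0.205) -> float:
    """Change in vaccination coverage (percentage points) implied by a given
    percentage increase in misinformation, assuming, as in note 26, that the
    estimated effect per 100 per cent increase stays constant."""
    return (misinfo_increase_pct / 100) * effect_per_100pct


def incidence_rate(cases: int, population: int) -> float:
    """Reported cases divided by total population for one period (note 27)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


def normalise_to_peak(series: dict[str, float]) -> dict[str, float]:
    """Index a quarterly series so that its highest value equals 1 (note 31)."""
    peak = max(series.values())
    if peak <= 0:
        raise ValueError("series must contain a positive peak value")
    return {quarter: value / peak for quarter, value in series.items()}


print(round(coverage_change_pp(800), 2))  # -1.64
print(incidence_rate(cases=50, population=1_000_000))  # hypothetical figures
print(normalise_to_peak({"Q1": 2.0, "Q2": 5.0, "Q3": 4.0}))
# {'Q1': 0.4, 'Q2': 1.0, 'Q3': 0.8}
```

The first function makes the linear-extrapolation assumption in note 26 explicit: if the effect of misinformation were not constant across the range, the scaling step would no longer hold.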
