Deborah Mayo (Virginia Tech): Publications

Virginia Tech
Department of Philosophy

Professor Emeritus

Blacksburg, Virginia, United States of America

New Perspectives on (Some Old) Problems of Frequentist Statistics
with David Cox

In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press. pp. 247. 2009.

Bayesian Reasoning, Misc

Frequentist statistics as a theory of inductive inference
with David Cox

In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press. 2009.

After some general remarks about the interrelation between philosophical and statistical thinking, the discussion centres largely on significance tests. These are defined as the calculation of p-values rather than as formal procedures for ‘acceptance’ and ‘rejection’. A number of types of null hypothesis are described and a principle for evidential interpretation set out governing the implications of p-values in the specific circumstances of each application, as contrasted with a long-run interpretation. A number of more complicated situations are discussed in which modification of the simple p-value may be essential.

Bayesian Reasoning Evolutionary Biology Frequentism Philosophy of Statistics Inductive Reasoning Hypothetico-Deductive Method Varieties of Confirmation

Objectivity and conditionality in frequentist inference
with David Cox

In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press. pp. 276. 2009.

Bayesian Reasoning

Ontology & Methodology
with Benjamin C. Jantzen and Lydia Patton

Synthese 192 (11): 3413-3423. 2015.

Philosophers of science have long been concerned with the question of what a given scientific theory tells us about the contents of the world, but relatively little attention has been paid to how we set out to build theories and to the relevance of pre-theoretical methodology on a theory’s interpretation. In the traditional view, the form and content of a mature theory can be separated from any tentative ontological assumptions that went into its development. For this reason, the target of interpretation is taken to be the mature theory and nothing more. On this view, positions on ontology emerge only once a theory is to hand, not as part of the process of theory building. Serious attention to theory creation suggests this is too simple. In particular, data collection and experimentation are influenced both by theory and by assumptions about the entities thought to be the target of study. Initial reasoning about possible ontologies has an influence on the choice of theoretical variables as well as on the judgments of the methodology appropriate to investigate them.

Convergent Realism

Severe Testing: Error Statistics versus Bayes Factor Tests
British Journal for the Philosophy of Science. forthcoming.

Science, Logic, and Mathematics

Duhem's problem, the Bayesian way, and error statistics, or "what's belief got to do with it?"
Philosophy of Science 64 (2): 222-244. 1997.

I argue that the Bayesian Way of reconstructing Duhem's problem fails to advance a solution to the problem of which of a group of hypotheses ought to be rejected or "blamed" when experiment disagrees with prediction. But scientists do regularly tackle and often enough solve Duhemian problems. When they do, they employ a logic and methodology which may be called error statistics. I discuss the key properties of this approach which enable it to split off the task of testing auxiliary hypotheses from that of appraising a primary hypothesis. By discriminating patterns of error, this approach can at least block, if not also severely test, attempted explanations of an anomaly. I illustrate how this approach directs progress with Duhemian problems and explains how scientists actually grapple with them.

Bayesian Reasoning, Misc

'Peirce-pectives' on Metaphysics and the Sciences
with Susan Haack, Rosa Mayorga, Jaime Nubiola, Cornelis de Waal, Robert G. Meyers, Joseph C. Pitt, and Nicholas Rescher

Transactions of the Charles S. Peirce Society 41 (2): 237-365. 2005.

Charles Sanders Peirce

Science, Error Statistics, and Arguing from Error
Poznan Studies in the Philosophy of the Sciences and the Humanities 71: 95-111. 2000.

Science, Logic, and Mathematics Bayesian Reasoning

Toward a More Objective Understanding of the Evidence of Carcinogenic Risk
PSA Proceedings of the Biennial Meeting of the Philosophy of Science Association 1988 (2): 489-503. 1988.

The field of quantified risk assessment is a new field, only about 20 years old, and already it is considered to be in a crisis. As Funtowicz and J.R. Ravetz (1985) put it: The concept of risk in terms of probability has proved to be so elusive, and statistical inference so problematic, that many experts in the field have recently either lost hope of finding a scientific solution or lost faith in Risk Analysis as a tool for decisionmaking. (p. 219) Thus the ‘art’ of the assessment of risks… is at an impasse. The early hopes that it could be reduced to a science are frustrated. …[O]thers are tending to introduce the ‘human’ and ‘cultural’ factors. The question now becomes, to what extent should these predominate? Would it be to the reduction or exclusion of the ‘scientific’ aspects? For, …if the perceived phenomena of ‘risks’ are interpreted as lacking all objective content or being merely a small part of some total cultural configuration, then there is no basis for dialogue between opposed positions on such problems, (pp. 220-221)

Error and the Growth of Experimental Knowledge
University of Chicago. 1996.

This text provides a critique of the subjective Bayesian view of statistical inference, and proposes the author's own error-statistical approach as an alternative framework for the epistemology of experiment. It seeks to address the needs of researchers who work with statistical analysis.

Cartwright, Causality, and Coincidence
PSA Proceedings of the Biennial Meeting of the Philosophy of Science Association 1986 (1): 42-58. 1986.

In How the Laws of Physics Lie (1983) Cartwright argues for being a realist about theoretical entities but non-realist about theoretical laws. Her reason for this distinction is that only the former involves causal explanation, and accepting causal explanations commits us to the existence of the causal entity invoked. “What is special about explanation by theoretical entity is that it is causal explanation, and existence is an internal characteristic of causal claims. There is nothing similar for theoretical laws.” (p. 93). For, according to Cartwright, the acceptability of a theoretical explanation is a matter of its ability to satisfy such criteria as organizing and simplifying, and in her view, “success at organizing, predicting, and classifying is never an argument for truth.” (p. 91). In contrast, Cartwright claims, “When I infer from an effect to a cause, I am asking what made the effect occur, what brought it about.

Error, tests and theory confirmation
with Aris Spanos

In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press. pp. 125-154. 2009.

Confirmation

Some surprising facts about surprising facts
Studies in History and Philosophy of Science Part A 45: 79-86. 2014.

A common intuition about evidence is that if data x have been used to construct a hypothesis H, then x should not be used again in support of H. It is no surprise that x fits H, if H was deliberately constructed to accord with x. The question of when and why we should avoid such “double-counting” continues to be debated in philosophy and statistics. It arises as a prohibition against data mining, hunting for significance, tuning on the signal, and ad hoc hypotheses, and as a preference for predesignated hypotheses and “surprising” predictions. I have argued that it is the severity or probativeness of the test—or lack of it—that should determine whether a double-use of data is admissible. I examine a number of surprising ambiguities and unexpected facts that continue to bedevil this debate.

Confirmation

Statistical significance and its critics: practicing damaging science, or damaging scientific practice?
with David Hand

Synthese 200 (3): 1-33. 2022.

While the common procedure of statistical significance testing and its accompanying concept of p-values have long been surrounded by controversy, renewed concern has been triggered by the replication crisis in science. Many blame statistical significance tests themselves, and some regard them as sufficiently damaging to scientific practice as to warrant being abandoned. We take a contrary position, arguing that the central criticisms arise from misunderstanding and misusing the statistical tools, and that in fact the purported remedies themselves risk damaging science. We argue that banning the use of p-value thresholds in interpreting data does not diminish but rather exacerbates data-dredging and biasing selection effects. If an account cannot specify outcomes that will not be allowed to count as evidence for a claim—if all thresholds are abandoned—then there is no test of that claim. The contributions of this paper are: to explain the rival statistical philosophies underlying the ongoing controversy; to elucidate and reinterpret statistical significance tests, and explain how this reinterpretation ameliorates common misuses and misinterpretations; and to argue why recent recommendations to replace, abandon, or retire statistical significance undermine a central function of statistics in science: to test whether observed patterns in the data are genuine or due to background variability.

Experimentation in Science Measurement in Science Scientific Instruments

How to discount double-counting when it counts: Some clarifications
British Journal for the Philosophy of Science 59 (4): 857-879. 2008.

The issues of double-counting, use-constructing, and selection effects have long been the subject of debate in the philosophical as well as statistical literature. I have argued that it is the severity, stringency, or probativeness of the test—or lack of it—that should determine if a double-use of data is admissible. Hitchcock and Sober ([2004]) question whether this 'severity criterion' can perform its intended job. I argue that their criticisms stem from a flawed interpretation of the severity criterion. Taking their criticism as a springboard, I elucidate some of the central examples that have long been controversial, and clarify how the severity criterion is properly applied to them.

1 Severity and Use-Constructing: Four Points (and Some Clarificatory Notes)
  1.1 Point 1: Getting beyond all or nothing standpoints
  1.2 Point 2: The rationale for prohibiting double-counting is the requirement that tests be severe
  1.3 Point 3: Evaluate severity of a test T by its associated construction rule R
  1.4 Point 4: The ease of passing vs. ease of erroneous passing: Statistical vs. Definitional probability
2 The False Dilemma: Hitchcock and Sober
  2.1 Marsha measures her desk reliably
  2.2 A false dilemma
3 Canonical Errors of Inference
  3.1 How construction rules may alter the error-probing performance of tests
  3.2 Rules for accounting for anomalies
  3.3 Hunting for statistically significant differences
4 Concluding Remarks

Falsification Philosophy of Statistics Decision Theory and Hypothesis Testing Evidence, Misc Confirmation, Misc

Introduction to recent issues in philosophy of statistics: evidence, testing, and applications
with Molly Kao and Elay Shech

Synthese 201 (4): 1-5. 2023.

Increasing Public Participation in Controversies Involving Hazards: The Value of Metastatistical Rules
Science, Technology and Human Values 10 (4): 55-65. 1985.

Causal Modeling, Explanation and Severe Testing
with Clark Glymour and Aris Spanos

In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press. pp. 331-375. 2009.

Causal Modeling Explanatory Value Theories of Explanation, Misc Philosophy of Statistics Statistical Explanation Explanation in Neuroscience

Significance Tests: Vitiated or Vindicated by the Replication Crisis in Psychology?
Review of Philosophy and Psychology 12 (1): 101-120. 2020.

The crisis of replication has led many to blame statistical significance tests for making it too easy to find impressive looking effects that do not replicate. However, the very fact it becomes difficult to replicate effects when features of the tests are tied down actually serves to vindicate statistical significance tests. While statistical significance tests, used correctly, serve to bound the probabilities of erroneous interpretations of data, this error control is nullified by data-dredging, multiple testing, and other biasing selection effects. Arguments claiming to vitiate statistical significance tests attack straw person variants of tests that commit well-known fallacies and misinterpretations. There is a tension between popular calls for preregistration – arguably, one of the most promising ways to boost replication – and accounts that downplay error probabilities: Bayes Factors, Bayesian posteriors, likelihood ratios. By underscoring the importance of error control for well testedness, the replication crisis points to reformulating tests so as to avoid fallacies and report the extent of discrepancies that are and are not indicated with severity.

Philosophy of Psychology

Methodology in Practice: Statistical Misspecification Testing
with Aris Spanos

Philosophy of Science 71 (5): 1007-1025. 2004.

The growing availability of computer power and statistical software has greatly increased the ease with which practitioners apply statistical methods, but this has not been accompanied by attention to checking the assumptions on which these methods are based. At the same time, disagreements about inferences based on statistical research frequently revolve around whether the assumptions are actually met in the studies available, e.g., in psychology, ecology, biology, risk assessment. Philosophical scrutiny can help disentangle 'practical' problems of model validation, and conversely, a methodology of statistical model validation can shed light on a number of issues of interest to philosophers of science.

Scientific Practice Philosophy of Statistics Modeling Practices Varieties of Confirmation Probabilistic Reasoning

Acceptable Evidence (edited book)
with Rachelle D. Hollander

Oxford University Press USA. 1994.

Discussions of science and values in risk management have largely focused on how values enter into arguments about risks, that is, issues of acceptable risk. Instead this volume concentrates on how values enter into collecting, interpreting, communicating, and evaluating the evidence of risks, that is, issues of the acceptability of evidence of risk. By focusing on acceptable evidence, this volume avoids two barriers to progress. One barrier assumes that evidence of risk is largely a matter of objective scientific data and therefore uncontroversial. The other assumes that evidence of risk, being "just" a matter of values, is not amenable to reasoned critique. Denying both extremes, this volume argues for a more constructive conclusion: understanding the interrelations of scientific and value issues enables a critical scrutiny of risk assessments and better public deliberation about social choices. The contributors, distinguished philosophers, policy analysts, and natural and social scientists, analyze environmental and medical controversies, and assumptions underlying views about risk assessment and the scientific and statistical models used in risk management.

Science and Values

About Thinking (review)
Teaching Philosophy 5 (1): 80-83. 1982.

Philosophy of Education

Some methodological issues in experimental economics
Philosophy of Science 75 (5): 633-645. 2008.

The growing acceptance and success of experimental economics has increased the interest of researchers in tackling philosophical and methodological challenges to which their work increasingly gives rise. I sketch some general issues that call for the combined expertise of experimental economists and philosophers of science, of experiment, and of inductive‐statistical inference and modeling. †To contact the author, please write to: 235 Major Williams, Virginia Tech, Blacksburg, VA 24061‐0126.

Experimental Economics

Response to Howson and Laudan
Philosophy of Science 64 (2): 323-333. 1997.

A toast is due to one who slays
Misguided followers of Bayes,
And in their heart strikes fear and terror
With probabilities of error!
(E.L. Lehmann)

Bayesian Reasoning, Misc

Novel evidence and severe tests
Philosophy of Science 58 (4): 523-552. 1991.

While many philosophers of science have accorded special evidential significance to tests whose results are "novel facts", there continues to be disagreement over both the definition of novelty and why it should matter. The view of novelty favored by Giere, Lakatos, Worrall and many others is that of use-novelty: An accordance between evidence e and hypothesis h provides a genuine test of h only if e is not used in h's construction. I argue that what lies behind the intuition that novelty matters is the deeper intuition that severe tests matter. I set out a criterion of severity akin to the notion of a test's power in Neyman-Pearson statistics. I argue that tests which are use-novel may fail to be severe, and tests that are severe may fail to be use-novel. I discuss the 1919 eclipse data as a severe test of Einstein's law of gravity.

Evidence, Misc Imre Lakatos

Experimental practice and an error statistical account of evidence
Philosophy of Science 67 (3): 207. 2000.

In seeking general accounts of evidence, confirmation, or inference, philosophers have looked to logical relationships between evidence and hypotheses. Such logics of evidential relationship, whether hypothetico-deductive, Bayesian, or instantiationist, fail to capture or be relevant to scientific practice. They require information that scientists do not generally have (e.g., an exhaustive set of hypotheses), while lacking slots within which to include considerations to which scientists regularly appeal (e.g., error probabilities). Building on my co-symposiasts' contributions, I suggest some directions in which a new and more adequate philosophy of evidence can move.

Experimentation in Science Philosophy of Statistics Falsification Evidence, Misc Confirmation, Misc Induction, Misc Decision Theory and Hypothesis Testing General Relativity

Did Pearson reject the Neyman-Pearson philosophy of statistics?
Synthese 90 (2). 1992.

I document some of the main evidence showing that E. S. Pearson rejected the key features of the behavioral-decision philosophy that became associated with the Neyman-Pearson Theory of statistics (NPT). I argue that NPT principles arose not out of behavioral aims, where the concern is solely with behaving correctly sufficiently often in some long run, but out of the epistemological aim of learning about causes of experimental results (e.g., distinguishing genuine from spurious effects). The view Pearson did hold gives a deeper understanding of NPT tests than their typical formulation as accept-reject routines, against which criticisms of NPT are really directed. The Pearsonian view that emerges suggests how NPT tests may avoid these criticisms while still retaining what is central to these methods: the control of error probabilities.

Philosophy of Statistics

The Philosophical Relevance of Statistics
PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1980. 1980.

While philosophers have studied probability and induction, statistics has not received the kind of philosophical attention mathematics and physics have. Despite increasing use of statistics in science, statistical advances have been little noted in the philosophy of science literature. This paper shows the relevance of statistics to both theoretical and applied problems of philosophy. It begins by discussing the relevance of statistics to the problem of induction and then discusses the reasoning that leads to causal generalizations and how statistics elucidates the structure of science as it is actually practiced. In addition to being relevant for building an adequate theory of scientific inference, it is argued that statistics provides a link between philosophy, science and public policy.

Bayesian Reasoning

Error and the Growth of Experimental Knowledge
with Michael Kruse

Philosophical Review 107 (2): 324. 1998.

Once upon a time, logic was the philosopher’s tool for analyzing scientific reasoning. Nowadays, probability and statistics have largely replaced logic, and their most popular application—Bayesianism—has replaced the qualitative deductive relationship between a hypothesis h and evidence e with a quantitative measure of h’s probability in light of e.

Varieties of Confirmation

Severe testing as a basic concept in a Neyman–Pearson philosophy of induction
with Aris Spanos

British Journal for the Philosophy of Science 57 (2): 323-357. 2006.

Despite the widespread use of key concepts of the Neyman–Pearson (N–P) statistical paradigm—type I and II errors, significance levels, power, confidence levels—they have been the subject of philosophical controversy and debate for over 60 years. Both current and long-standing problems of N–P tests stem from unclarity and confusion, even among N–P adherents, as to how a test's (pre-data) error probabilities are to be used for (post-data) inductive inference as opposed to inductive behavior. We argue that the relevance of error probabilities is to ensure that only statistical hypotheses that have passed severe or probative tests are inferred from the data. The severity criterion supplies a meta-statistical principle for evaluating proposed statistical inferences, avoiding classic fallacies from tests that are overly sensitive, as well as those not sensitive enough to particular errors and discrepancies.

1 Introduction and overview
  1.1 Behavioristic and inferential rationales for Neyman–Pearson (N–P) tests
  1.2 Severity rationale: induction as severe testing
  1.3 Severity as a meta-statistical concept: three required restrictions on the N–P paradigm
2 Error statistical tests from the severity perspective
  2.1 N–P test T(α): type I, II error probabilities and power
  2.2 Specifying test T(α) using p-values
3 Neyman's post-data use of power
  3.1 Neyman: does failure to reject H warrant confirming H?
4 Severe testing as a basic concept for an adequate post-data inference
  4.1 The severity interpretation of acceptance (SIA) for test T(α)
  4.2 The fallacy of acceptance (i.e., an insignificant difference): Ms Rosy
  4.3 Severity and power
5 Fallacy of rejection: statistical vs. substantive significance
  5.1 Taking a rejection of H0 as evidence for a substantive claim or theory
  5.2 A statistically significant difference from H0 may fail to indicate a substantively important magnitude
  5.3 Principle for the severity interpretation of a rejection (SIR)
  5.4 Comparing significant results with different sample sizes in T(α): large n problem
  5.5 General testing rules for T(α), using the severe testing concept
6 The severe testing concept and confidence intervals
  6.1 Dualities between one and two-sided intervals and tests
  6.2 Avoiding shortcomings of confidence intervals
7 Beyond the N–P paradigm: pure significance, and misspecification tests
8 Concluding comments: have we shown severity to be a basic concept in a N–P philosophy of induction?

Decision Theory and Hypothesis Testing Philosophy of Statistics Induction, Misc Confirmation, Misc Statistics Interpretation of Probability


Deborah Mayo

New Perspectives on (Some Old) Problems of Frequentist Statistics
with David Cox

In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press. pp. 247. 2009.

Frequentist statistics as a theory of inductive inference
with David Cox

In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press. 2009.

Objectivity and conditionality in frequentist inference
with David Cox

In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press. pp. 276. 2009.

Ontology & Methodology
with Benjamin C. Jantzen and Lydia Patton

Synthese 192 (11): 3413-3423. 2015.

Severe Testing: Error Statistics versus Bayes Factor Tests
British Journal for the Philosophy of Science. Forthcoming.

Duhem's problem, the Bayesian way, and error statistics, or "what's belief got to do with it?"
Philosophy of Science 64 (2): 222-244. 1997.

'Peirce-pectives' on Metaphysics and the Sciences
with Susan Haack, Rosa Mayorga, Jaime Nubiola, Cornelis de Waal, Robert G. Meyers, Joseph C. Pitt, and Nicholas Rescher

Transactions of the Charles S. Peirce Society 41 (2): 237-365. 2005.

Science, Error Statistics, and Arguing from Error
Poznań Studies in the Philosophy of the Sciences and the Humanities 71: 95-111. 2000.

Toward a More Objective Understanding of the Evidence of Carcinogenic Risk
PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1988 (2): 489-503. 1988.

Error and the Growth of Experimental Knowledge
University of Chicago Press. 1996.

Cartwright, Causality, and Coincidence
PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1986 (1): 42-58. 1986.

Error, tests and theory confirmation
with Aris Spanos

In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press. pp. 125-154. 2009.

Some surprising facts about surprising facts
Studies in History and Philosophy of Science Part A 45: 79-86. 2014.

Statistical significance and its critics: practicing damaging science, or damaging scientific practice?
with David Hand

Synthese 200 (3): 1-33. 2022.

How to discount double-counting when it counts: Some clarifications
British Journal for the Philosophy of Science 59 (4): 857-879. 2008.

Introduction to recent issues in philosophy of statistics: evidence, testing, and applications
with Molly Kao and Elay Shech

Synthese 201 (4): 1-5. 2023.

Increasing Public Participation in Controversies Involving Hazards: The Value of Metastatistical Rules
Science, Technology and Human Values 10 (4): 55-65. 1985.

Causal Modeling, Explanation and Severe Testing
with Clark Glymour and Aris Spanos

In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press. pp. 331-375. 2009.

Significance Tests: Vitiated or Vindicated by the Replication Crisis in Psychology?
Review of Philosophy and Psychology 12 (1): 101-120. 2020.

Methodology in Practice: Statistical Misspecification Testing
with Aris Spanos

Philosophy of Science 71 (5): 1007-1025. 2004.

Acceptable Evidence (edited book)
with Rachelle D. Hollander

Oxford University Press USA. 1994.

About Thinking (review)
Teaching Philosophy 5 (1): 80-83. 1982.

Some methodological issues in experimental economics
Philosophy of Science 75 (5): 633-645. 2008.

Response to Howson and Laudan
Philosophy of Science 64 (2): 323-333. 1997.

Novel evidence and severe tests
Philosophy of Science 58 (4): 523-552. 1991.

Experimental practice and an error statistical account of evidence
Philosophy of Science 67 (3): 207. 2000.

Did Pearson reject the Neyman-Pearson philosophy of statistics?
Synthese 90 (2). 1992.

The Philosophical Relevance of Statistics
PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1980. 1980.

Error and the Growth of Experimental Knowledge
with Michael Kruse

Philosophical Review 107 (2): 324. 1998.

Severe testing as a basic concept in a Neyman–Pearson philosophy of induction
with Aris Spanos

British Journal for the Philosophy of Science 57 (2): 323-357. 2006.
