- Severe testing as a basic concept in a Neyman–Pearson philosophy of induction. British Journal for the Philosophy of Science 57 (2): 323-357, 2006. Despite the widespread use of key concepts of the Neyman–Pearson (N–P) statistical paradigm—type I and II errors, significance levels, power, confidence levels—they have been the subject of philosophical controversy and debate for over 60 years. Both current and long-standing problems of N–P tests stem from unclarity and confusion, even among N–P adherents, as to how a test's (pre-data) error probabilities are to be used for (post-data) inductive inference as opposed to inductive behavior. We ar…
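The pre-data N–P quantities this abstract refers to (Type I error rate, power against an alternative) can be illustrated with a minimal sketch. This is not from the paper; the numbers `sigma`, `n`, `alpha`, and `mu_1` are hypothetical choices for illustration only.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# One-sided test of H0: mu = 0 vs H1: mu > 0, known sigma, sample size n.
# All numbers below are hypothetical, chosen only to illustrate the concepts.
sigma, n, alpha = 1.0, 25, 0.05

# Pre-data: the cutoff is fixed so that the Type I error rate is alpha.
# Reject H0 when the sample mean exceeds c = z_alpha * sigma / sqrt(n).
z_alpha = 1.645  # approximate upper 5% point of N(0, 1)
c = z_alpha * sigma / math.sqrt(n)

# Power against a specific alternative mu_1: P(reject H0 | mu = mu_1).
mu_1 = 0.5
power = 1.0 - norm_cdf((c - mu_1) / (sigma / math.sqrt(n)))

print(round(c, 3))      # rejection cutoff for the sample mean
print(round(power, 3))  # probability the test detects mu = 0.5
```

Both quantities are fixed before the data arrive; the philosophical dispute the abstract describes concerns how such pre-data error rates bear on inference once a particular result is in hand.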
- Experimental practice and an error statistical account of evidence. Philosophy of Science 67 (3): 207, 2000. In seeking general accounts of evidence, confirmation, or inference, philosophers have looked to logical relationships between evidence and hypotheses. Such logics of evidential relationship, whether hypothetico-deductive, Bayesian, or instantiationist, fail to capture or be relevant to scientific practice. They require information that scientists do not generally have (e.g., an exhaustive set of hypotheses), while lacking slots within which to include considerations to which scientists regularly…
- Ducks, Rabbits, and Normal Science: Recasting the Kuhn’s-Eye View of Popper’s Demarcation of Science. British Journal for the Philosophy of Science 47 (2): 271-290, 1996. Kuhn maintains that what marks the transition to a science is the ability to carry out ‘normal’ science—a practice he characterizes as abandoning the kind of testing that Popper lauds as the hallmark of science. Examining Kuhn's own contrast with Popper, I propose to recast Kuhnian normal science. Thus recast, it is seen to consist of severe and reliable tests of low-level experimental hypotheses (normal tests) and is, indeed, the place to look to demarcate science. While thereby vindicating Kuh…
- Behavioristic, evidentialist, and learning models of statistical testing. Philosophy of Science 52 (4): 493-516, 1985. While orthodox (Neyman-Pearson) statistical tests enjoy widespread use in science, the philosophical controversy over their appropriateness for obtaining scientific knowledge remains unresolved. I shall suggest an explanation and a resolution of this controversy. The source of the controversy, I argue, is that orthodox tests are typically interpreted as rules for making optimal decisions as to how to behave--where optimality is measured by the frequency of errors the test would commit in a long…
- How everyone can have a rare property: Response to Sober on frequency-dependent causation. Philosophy of Science 54 (2): 266-276, 1987. In a recent discussion note Sober (1985) elaborates on the argument given in Sober (1982) to show the inadequacy of Ronald Giere's (1979, 1980) causal model for cases of frequency-dependent causation, and denies that Giere's (1984) response avoids the problem he raises. I argue that frequency-dependent effects do not pose a problem for Giere's original causal model, and that all parties in this dispute have been guilty of misinterpreting the counterfactual populations involved in applying Giere's…
- Did Pearson reject the Neyman-Pearson philosophy of statistics? Synthese 90 (2), 1992. I document some of the main evidence showing that E. S. Pearson rejected the key features of the behavioral-decision philosophy that became associated with the Neyman-Pearson Theory of statistics (NPT). I argue that NPT principles arose not out of behavioral aims, where the concern is solely with behaving correctly sufficiently often in some long run, but out of the epistemological aim of learning about causes of experimental results (e.g., distinguishing genuine from spurious effects). The view…
- Methodology in Practice: Statistical Misspecification Testing. Philosophy of Science 71 (5): 1007-1025, 2004. The growing availability of computer power and statistical software has greatly increased the ease with which practitioners apply statistical methods, but this has not been accompanied by attention to checking the assumptions on which these methods are based. At the same time, disagreements about inferences based on statistical research frequently revolve around whether the assumptions are actually met in the studies available, e.g., in psychology, ecology, biology, risk assessment. Philosophica…
- Models of group selection. Philosophy of Science 54 (4): 515-538, 1987. The key problem in the controversy over group selection is that of defining a criterion of group selection that identifies a distinct causal process that is irreducible to the causal process of individual selection. We aim to clarify this problem and to formulate an adequate model of irreducible group selection. We distinguish two types of group selection models, labeling them type I and type II models. Type I models are invoked to explain differences among groups in their respective rates of pr…
- Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science (edited book). Cambridge University Press, 2009. Although both philosophers and scientists are interested in how to obtain reliable knowledge in the face of error, there is a gap between their perspectives that has been an obstacle to progress. By means of a series of exchanges between the editors and leaders from the philosophy of science, statistics and economics, this volume offers a cumulative introduction connecting problems of traditional philosophy of science to problems of inference in statistical and empirical modelling practice. Phil…
- Ontology & Methodology. Synthese 192 (11): 3413-3423, 2015. Philosophers of science have long been concerned with the question of what a given scientific theory tells us about the contents of the world, but relatively little attention has been paid to how we set out to build theories and to the relevance of pre-theoretical methodology to a theory’s interpretation. In the traditional view, the form and content of a mature theory can be separated from any tentative ontological assumptions that went into its development. For this reason, the target of inter…
- Novel evidence and severe tests. Philosophy of Science 58 (4): 523-552, 1991. While many philosophers of science have accorded special evidential significance to tests whose results are "novel facts", there continues to be disagreement over both the definition of novelty and why it should matter. The view of novelty favored by Giere, Lakatos, Worrall and many others is that of use-novelty: An accordance between evidence e and hypothesis h provides a genuine test of h only if e is not used in h's construction. I argue that what lies behind the intuition that novelty matter…
- The New Experimentalism, Topical Hypotheses, and Learning from Error. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1994: 270-279, 1994. An important theme to have emerged from the new experimentalist movement is that much of actual scientific practice deals not with appraising full-blown theories but with the manifold local tasks required to arrive at data, distinguish fact from artifact, and estimate backgrounds. Still, no program for working out a philosophy of experiment based on this recognition has been demarcated. I suggest why the new experimentalism has come up short, and propose a remedy appealing to the practice of sta…
- The error statistical philosopher as normative naturalist. Synthese 163 (3), 2008. We argue for a naturalistic account for appraising scientific methods that carries non-trivial normative force. We develop our approach by comparison with Laudan’s (American Philosophical Quarterly 24:19–31, 1987, Philosophy of Science 57:20–33, 1990) “normative naturalism” based on correlating means (various scientific methods) with ends (e.g., reliability). We argue that such a meta-methodology based on means–ends correlations is unreliable and cannot achieve its normative goals. We suggest an…
- Error statistical modeling and inference: Where methodology meets ontology. Synthese 192 (11): 3533-3555, 2015. In empirical modeling, an important desideratum for deeming theoretical entities and processes real is that they be reproducible in a statistical sense. Current-day crises regarding replicability in science intertwine with the question of how statistical methods link data to statistical and substantive theories and models. Different answers to this question have important methodological consequences for inference, which are intertwined with a contrast between the ontological commitments o…
- Peircean Induction and the Error-Correcting Thesis. Transactions of the Charles S. Peirce Society 41 (2), 2005.
- Duhem's problem, the Bayesian way, and error statistics, or "what's belief got to do with it?" Philosophy of Science 64 (2): 222-244, 1997. I argue that the Bayesian Way of reconstructing Duhem's problem fails to advance a solution to the problem of which of a group of hypotheses ought to be rejected or "blamed" when experiment disagrees with prediction. But scientists do regularly tackle and often enough solve Duhemian problems. When they do, they employ a logic and methodology which may be called error statistics. I discuss the key properties of this approach which enable it to split off the task of testing auxiliary hypotheses fr…
- Philosophical Scrutiny of Evidence of Risks: From Bioethics to Bioevidence. Philosophy of Science 73 (5): 803-816, 2006. We argue that a responsible analysis of today's evidence-based risk assessments and risk debates in biology demands a critical or metascientific scrutiny of the uncertainties, assumptions, and threats of error along the manifold steps in risk analysis. Without an accompanying methodological critique, neither sensitivity to social and ethical values, nor conceptual clarification alone, suffices. In this view, restricting the invitation for philosophical involvement to those wearing a "bioethicist…
- Novel work on problems of novelty? Comments on Hudson. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics 34 (1): 131-134, 2003.
- An objective theory of statistical testing. Synthese 57 (3), 1983. Theories of statistical testing may be seen as attempts to provide systematic means for evaluating scientific conjectures on the basis of incomplete or inaccurate observational data. The Neyman-Pearson Theory of Testing (NPT) has purported to provide an objective means for testing statistical hypotheses corresponding to scientific claims. Despite their widespread use in science, methods of NPT have themselves been accused of failing to be objective; and the purported objectivity of scientific cl…
- How to discount double-counting when it counts: Some clarifications. British Journal for the Philosophy of Science 59 (4): 857-879, 2008. The issues of double-counting, use-constructing, and selection effects have long been the subject of debate in the philosophical as well as statistical literature. I have argued that it is the severity, stringency, or probativeness of the test—or lack of it—that should determine if a double-use of data is admissible. Hitchcock and Sober ([2004]) question whether this 'severity criterion' can perform its intended job. I argue that their criticisms stem from a flawed interpretation of the severity…
- In defense of the Neyman-Pearson theory of confidence intervals. Philosophy of Science 48 (2): 269-280, 1981. In Philosophical Problems of Statistical Inference, Seidenfeld argues that the Neyman-Pearson (NP) theory of confidence intervals is inadequate for a theory of inductive inference because, for a given situation, the 'best' NP confidence interval, [CIλ], sometimes yields intervals which are trivial (i.e., tautologous). I argue that (1) Seidenfeld's criticism of trivial intervals is based upon illegitimately interpreting confidence levels as measures of final precision; (2) for the situation which…
- Severe tests, arguing from error, and methodological underdetermination. Philosophical Studies 86 (3): 243-266, 1997.
- Error and the Growth of Experimental Knowledge. Philosophical Review 107 (2): 324, 1998. Once upon a time, logic was the philosopher’s tool for analyzing scientific reasoning. Nowadays, probability and statistics have largely replaced logic, and their most popular application—Bayesianism—has replaced the qualitative deductive relationship between a hypothesis h and evidence e with a quantitative measure of h’s probability in light of e.
- Some methodological issues in experimental economics. Philosophy of Science 75 (5): 633-645, 2008. The growing acceptance and success of experimental economics has increased the interest of researchers in tackling philosophical and methodological challenges to which their work increasingly gives rise. I sketch some general issues that call for the combined expertise of experimental economists and philosophers of science, of experiment, and of inductive-statistical inference and modeling.
- The Philosophical Relevance of Statistics. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1980, 1980. While philosophers have studied probability and induction, statistics has not received the kind of philosophical attention mathematics and physics have. Despite increasing use of statistics in science, statistical advances have been little noted in the philosophy of science literature. This paper shows the relevance of statistics to both theoretical and applied problems of philosophy. It begins by discussing the relevance of statistics to the problem of induction and then discusses the reasoning…
- Evidence as Passing Severe Tests: Highly Probable versus Highly Probed Hypotheses. In P. Achinstein (ed.), Scientific Evidence: Philosophical Theories & Applications, The Johns Hopkins University Press, pp. 95-128, 2005.
- Error and the growth of experimental knowledge. International Studies in the Philosophy of Science 15 (1): 455-459, 1996.
- Error statistics and learning from error: Making a virtue of necessity. Philosophy of Science 64 (4): 212, 1997. The error statistical account of testing uses statistical considerations, not to provide a measure of probability of hypotheses, but to model patterns of irregularity that are useful for controlling, distinguishing, and learning from errors. The aim of this paper is (1) to explain the main points of contrast between the error statistical and the subjective Bayesian approach and (2) to elucidate the key errors that underlie the central objection raised by Colin Howson at our PSA 96 Symposium.
- Objectivity and conditionality in frequentist inference. In Deborah G. Mayo & Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge University Press, p. 276, 2009.