Mayank Kejriwal (University of Southern California): Publications


University of Southern California
Industrial & Systems Engineering

Associate Professor

University of Texas at Austin

Computer Science

PhD, 2016


Los Angeles, California, United States of America

0000-0001-5988-8305

Areas of Specialization

Philosophy of Artificial Intelligence

Areas of Interest

Philosophy of Artificial Intelligence

201

Epistemic Misalignment in Human-AI Systems: A Four-Quadrant Taxonomy of Uncertainty

Despite significant progress, the AI alignment literature has largely overlooked a philosophical dilemma: humans and machines represent and communicate uncertainty in fundamentally incompatible ways. To formalize the arguments around this dilemma, we introduce a four-quadrant taxonomy that partitions uncertainty along two dimensions: whether it arises from a human or a machine agent, and whether it remains internal or is communicated externally. The taxonomy comprises human inherent uncertainty (Bayesian credence), human self-reported uncertainty (linguistic hedges), model inherent uncertainty (frequentist risk), and model self-reported uncertainty (preference-optimized language). We show that this taxonomy reveals an epistemological gap: humans assign uncertainty through grounded, causal understanding, while models assign it through risk minimization over training data. When models trained with RLHF generate hedged language, humans interpret it as an expression of Bayesian belief rather than as a frequentist risk penalty. We close with a practical case study and argue that bridging this epistemic gap, though neglected in current discourse, is essential for establishing trust in complex human-AI collaboration modes.

Philosophy of Language; Other Academic Areas, Misc; Philosophy, Miscellaneous
173

Reference extended: how abstract singular terms refer

How do abstract singular terms like "the Pythagoras Theorem" refer? While theories of reference have been extensively developed for concrete singular terms (proper names, definite descriptions), the question of reference to abstract objects remains under-explored. This essay argues that abstract singular terms do refer, obeying principles similar to those governing concrete singular terms while presenting distinct philosophical challenges. We examine three prominent theories of reference (Millianism, the Frege-Russell Theory, and Kripke's causal theory) and demonstrate that each faces serious difficulties when extended to abstract entities. Millianism requires commitment to mathematical Platonism; descriptivism struggles with reference-fixing and the problem of degraded understanding; pure causal theory fails to account for variation in speakers' conceptions of abstract terms. We propose that a causal-descriptive hybrid theory, wherein causation plays a constitutive role in the sense of abstract singular terms, best explains how speakers can meaningfully refer to abstract objects despite incomplete or evolving understanding. This account illuminates how ordinary language users refer to mathematical theorems without explicit knowledge of their formal content, suggesting that causation is not merely a supplementary feature of reference but essential to understanding how abstract singular terms function in discourse.

Theories of Reference, Misc
236

Beyond Catastrophe: Why AI Longtermism Must Account for Uber-Beneficence

In "Is Power-Seeking AI an Existential Risk?" Joe Carlsmith provides an intriguing analysis of the potential pathways from advanced artificial intelligence to existential catastrophe. This essay contends that while the report's framework offers a fundamental contribution to AI safety, its application of longtermist principles is incomplete. By focusing almost exclusively on catastrophic failure modes, it overlooks the symmetrical potential for "uber-beneficence", i.e., outcomes of profound and durable positive value. This critique deconstructs Carlsmith's model of the power-seeking AI, arguing that it relies on a monolithic conception of agency that does not sufficiently appreciate insights from recent research into diverse agent architectures and emergent goals. Ultimately, a more complete longtermist strategy must not only build guardrails against catastrophe but also actively chart a course toward a flourishing, AI-assisted future. An overriding focus on downside risk alone may inadvertently preclude the most positive outcomes humanity could achieve.
