Bradley Allen (University of Amsterdam): Publications

University of Amsterdam
Informatics Institute

Other

University of Amsterdam

PhD, 2026

Amsterdam, North Holland, Netherlands

0000-0003-0216-3930

Areas of Specialization

Philosophy of Artificial Intelligence

Areas of Interest

Philosophy of Artificial Intelligence

Conceptual Engineering

Foundations of Experimental Philosophy

Knowledge

Social Epistemology

B_BL: A Bilateral Modal Logic for LLM Factuality Evaluation

We present B_BL, a nine-valued bilateral modal logic based on the bilattice NINE. We then construct a concrete computational model of B_BL for factuality evaluation of large language models (LLMs). The model defines an accessibility relation representing meaning-preserving syntactic variation of queries asking an LLM to provide verification and refutation for a given proposition, and a valuation function grounded in the responses to such queries. An experimental study of the valuation function across four benchmarks with six LLMs shows that bilateral valuation achieves higher macro F1 than binary, ternary, and confidence-based unilateral approaches to valuation, while also providing interpretable information about LLM doxastic states.

Large Language Models Modal Logic Formal Semantics
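As an illustration only, the Python sketch below shows one way the ingredients named in the B_BL abstract could fit together: a list of paraphrased query templates stands in for the accessible, meaning-preserving variants; each variant is queried separately for verification and for refutation; and each side is then read in Kripke style, "all variants" as a necessity (box) reading and "some variant" as a possibility (diamond) reading. Collapsing each side into three grades (none/some/all), which yields 3 × 3 = 9 combinations, is an assumption of this sketch and is not taken from the paper's definition of NINE. `ask_variant` and `fake_llm` are hypothetical stand-ins for LLM calls.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence

# Assumption for this sketch: each side is graded by how many accessible
# variants support it ("none", "some", or "all").
GRADES = ("none", "some", "all")


@dataclass(frozen=True)
class BilateralValue:
    verified: str  # grade of verification across accessible variants
    refuted: str   # grade of refutation across accessible variants


def grade(outcomes: Sequence[bool]) -> str:
    """Collapse per-variant outcomes into one grade.

    'all' corresponds to a box (necessity) reading over the accessible
    variants; 'some' or 'all' corresponds to a diamond (possibility) reading.
    """
    if not outcomes:
        raise ValueError("need at least one accessible variant")
    if all(outcomes):
        return "all"
    if any(outcomes):
        return "some"
    return "none"


def bilateral_value(
    proposition: str,
    variants: Sequence[str],
    ask_variant: Callable[[str, str, str], bool],
) -> BilateralValue:
    """Query every variant separately for verification and for refutation.

    `ask_variant(template, proposition, mode)` is a hypothetical LLM wrapper
    returning True if the model verifies (mode="verify") or refutes
    (mode="refute") the proposition under that query template.
    """
    verified = [ask_variant(v, proposition, "verify") for v in variants]
    refuted = [ask_variant(v, proposition, "refute") for v in variants]
    return BilateralValue(grade(verified), grade(refuted))


# Under the three-grade assumption there are exactly nine possible values.
ALL_VALUES: List[BilateralValue] = [
    BilateralValue(v, r) for v, r in product(GRADES, GRADES)
]

# Usage with a stubbed model (hypothetical templates, no real LLM involved).
templates = ["Is it true that {p}?", "Verify: {p}", "Can you confirm that {p}?"]


def fake_llm(template: str, prop: str, mode: str) -> bool:
    # Stub: verifies under every template except the second, never refutes.
    return mode == "verify" and template != templates[1]


print(bilateral_value("Amsterdam is in the Netherlands", templates, fake_llm))
# BilateralValue(verified='some', refuted='none')
```

Keeping verification and refutation as two independent queries is what makes the valuation bilateral: a model can support both sides (a glut) or neither (a gap), which a single true/false query cannot express.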

Neurosymbolic Knowledge Engineering with Natural Language
Dissertation, University of Amsterdam. 2026.

Since the nineteen-seventies, knowledge engineering as a discipline has struggled with the implementation problem: the difficulty of translating expert knowledge expressed in natural language into a formal knowledge representation to be adopted by organizations and communities for use in automated decision making. This knowledge acquisition bottleneck remains a fundamental barrier. We argue that large language models (LLMs) provide a means to address the implementation problem, by allowing knowledge expressed in natural language to be used directly in knowledge engineering tasks, rather than having to be first formalized into a knowledge representation language. We show how classifiers-as-intensions, based on the prompting of LLMs using natural language intensional definitions of concepts and relations, can provide support for the knowledge engineering task of classification. We show that by having classifiers-as-intensions provide rationales for their classifications, we can distinguish factual errors from disagreements about the meaning of concepts and relations, yielding actionable guidance for knowledge graph refinement. One objection to this approach is that LLMs exhibit hallucination in their output, bringing into question their factuality. To address this objection, we show that LLMs are capable of accurately detecting hallucination in language model output, and that bilateral factuality evaluation provides insight into the degree and scope of inconsistency and incompleteness in an LLM's parametric knowledge. We show how bilateral factuality evaluation can then be used in the formal semantics of a paraconsistent logic to allow sound and complete neurosymbolic reasoning using such knowledge. 
We conclude by arguing that the implementation problem in knowledge engineering is rooted in its adherence to representationalism, and that our findings suggest that inferentialism and social externalism provide a way to reconceptualize the practice of knowledge engineering and dissolve the implementation problem, not by making LLMs reason logically, but by using logics that allow reasoning with LLMs.

Philosophy of AI, General Works Computer Science Formal Semantics Large Language Models

Elenchus: Generating Knowledge Bases from Prover-Skeptic Dialogues

We present Elenchus, a dialogue system for knowledge base construction grounded in inferentialist semantics, where knowledge engineering is re-conceived as explicitation rather than extraction from expert testimony or textual content. A human expert develops a bilateral position (commitments and denials) about a topic through prover-skeptic dialogue with a large language model (LLM) opponent. The LLM proposes tensions (claims that parts of the position are jointly incoherent), which the expert resolves by retraction, refinement, or contestation. The LLM thus serves as a defeasible derivability oracle whose unreliability is structurally contained by the expert's authority. Our main technical contribution is a mapping from Elenchus dialectical states to material bases in Hlobil and Brandom's NonMonotonic MultiSuccedent (NMMS) logic, satisfying Containment and enabling the elaboration of logical vocabulary that makes explicit the inferential relationships negotiated in the dialectic. We demonstrate the approach on the W3C PROV-O provenance ontology, where a single dialogue session elicits and structures design tensions that a domain expert can articulate, corresponding to decisions documented in a retrospective analysis of the ontology's design. Using pyNMMS, an automated NMMS reasoner, we verify that the structural properties of the resulting material base (nontransitivity, nonmonotonicity, and independence) correspond to specific PROV design rationales, demonstrating end-to-end integration from dialogue through formal reasoning.

Dialogue Inferentialist Accounts of Meaning and Content Logical Expressivism Large Language Models
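In Hlobil and Brandom's NMMS, a material base is a set of sequents Γ |~ Δ over atomic sentences, and Containment requires every sequent whose antecedent and succedent share a sentence to belong to the base. The minimal Python sketch below illustrates that condition on a toy finite base, together with two simple symptoms of the structural properties mentioned in the Elenchus abstract: a sequent that fails once a premise is added (nonmonotonicity) and a chain a |~ b, b |~ c without a |~ c (one simple form of nontransitivity). It is not the pyNMMS API and not the paper's mapping from dialectical states; the function names and the atoms `a`, `b`, `c` are hypothetical placeholders rather than PROV-O terms.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

Sequent = Tuple[FrozenSet[str], FrozenSet[str]]


def seq(antecedent: Iterable[str], succedent: Iterable[str]) -> Sequent:
    """Represent Gamma |~ Delta as a pair of frozensets of atoms."""
    return frozenset(antecedent), frozenset(succedent)


def powerset(atoms: Iterable[str]) -> List[FrozenSet[str]]:
    items = sorted(atoms)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]


def build_base(atoms: Iterable[str], material: Iterable[Tuple[Set[str], Set[str]]]) -> Set[Sequent]:
    """The given material sequents plus every sequent Containment requires."""
    subsets = powerset(atoms)
    contained = {(g, d) for g in subsets for d in subsets if g & d}
    return contained | {seq(g, d) for g, d in material}


def satisfies_containment(base: Set[Sequent], atoms: Iterable[str]) -> bool:
    """True if every sequent whose two sides overlap is in the base."""
    subsets = powerset(atoms)
    return all((g, d) in base for g in subsets for d in subsets if g & d)


def holds(base: Set[Sequent], gamma: Iterable[str], delta: Iterable[str]) -> bool:
    return seq(gamma, delta) in base


def nonmonotonic_witness(base, gamma, delta, extra) -> bool:
    """True if gamma |~ delta holds but gamma plus extra |~ delta does not."""
    return holds(base, gamma, delta) and not holds(base, set(gamma) | set(extra), delta)


def chain_fails(base, a: str, b: str, c: str) -> bool:
    """True if a |~ b and b |~ c hold but a |~ c does not."""
    return holds(base, {a}, {b}) and holds(base, {b}, {c}) and not holds(base, {a}, {c})


# Usage: a toy base with two material sequents over hypothetical atoms.
atoms = {"a", "b", "c"}
base = build_base(atoms, [({"a"}, {"b"}), ({"b"}, {"c"})])
print(
    satisfies_containment(base, atoms),
    nonmonotonic_witness(base, {"a"}, {"b"}, {"c"}),
    chain_fails(base, "a", "b", "c"),
)  # True True True
```

The sketch enumerates all subsets of atoms, so it only scales to toy examples; it is meant to make the definitions concrete, not to stand in for an NMMS reasoner.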

Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations
with Prateek Chhikara, Thomas Macaulay Ferguson, Filip Ilievski, and Paul Groth

In Leilani Gilpin, Eleonora Giunchiglia, Pascal Hitzler & Emile van Krieken (eds.), Proceedings of the 19th Conference on Neurosymbolic Learning and Reasoning, Proceedings of Machine Learning Research. forthcoming.

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but they exhibit problems with logical consistency in the output they generate. How can we harness LLMs' broad-coverage parametric knowledge in formal reasoning despite their inconsistency? We present a method for directly integrating an LLM into the interpretation function of the formal semantics for a paraconsistent logic. We provide experimental evidence for the feasibility of the method by evaluating the function using datasets created from several short-form factuality benchmarks. Unlike prior work, our method offers a theoretical framework for neuro-symbolic reasoning that leverages an LLM's knowledge while preserving the underlying logic's soundness and completeness properties.

Philosophy of AI, General Works Large Language Models Natural Language Processing Formal Semantics
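One way to picture an LLM sitting inside the interpretation function of a paraconsistent logic is Belnap-Dunn four-valued semantics, where each atom receives a pair (told true, told false): (1, 0) is T, (0, 1) is F, (1, 1) is B (both) and (0, 0) is N (neither), and a value is designated when its "told true" component holds. The Python sketch below assumes that semantics and fills each pair from two separate yes/no judgments; `llm_grounded` and the `verify`/`refute` callables are hypothetical stubs, and this listing does not state which paraconsistent logic the paper uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union

# A formula is an atom name or a tuple: ("not", f), ("and", f, g), ("or", f, g).
Formula = Union[str, Tuple]


@dataclass(frozen=True)
class Pair:
    told_true: bool   # the atom was verified
    told_false: bool  # the atom was refuted

    def label(self) -> str:
        return {
            (True, False): "T", (False, True): "F",
            (True, True): "B", (False, False): "N",
        }[(self.told_true, self.told_false)]


def interpret(formula: Formula, atom_value: Callable[[str], Pair]) -> Pair:
    """Belnap-Dunn evaluation: negation swaps the two components."""
    if isinstance(formula, str):
        return atom_value(formula)
    op = formula[0]
    if op == "not":
        v = interpret(formula[1], atom_value)
        return Pair(v.told_false, v.told_true)
    left = interpret(formula[1], atom_value)
    right = interpret(formula[2], atom_value)
    if op == "and":
        return Pair(left.told_true and right.told_true, left.told_false or right.told_false)
    if op == "or":
        return Pair(left.told_true or right.told_true, left.told_false and right.told_false)
    raise ValueError(f"unknown connective: {op}")


def designated(v: Pair) -> bool:
    return v.told_true


def llm_grounded(verify: Callable[[str], bool], refute: Callable[[str], bool]) -> Callable[[str], Pair]:
    """Build an atom valuation from two hypothetical LLM judgments, cached per atom."""
    cache: Dict[str, Pair] = {}

    def atom_value(atom: str) -> Pair:
        if atom not in cache:
            cache[atom] = Pair(verify(atom), refute(atom))
        return cache[atom]

    return atom_value


# Usage with stubs: "p" is both verified and refuted, "q" is neither.
value = llm_grounded(lambda a: a == "p", lambda a: a == "p")
contradiction = ("and", "p", ("not", "p"))
print(interpret(contradiction, value).label(), designated(interpret(contradiction, value)))  # B True
print(interpret("q", value).label(), designated(interpret("q", value)))  # N False
```

The usage shows the paraconsistent behavior: an inconsistent atom makes p ∧ ¬p designated without making an unrelated atom q designated, so conflicting LLM judgments do not trivialize reasoning.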

Large language models and the relative roles of formal and natural language in formalization

Formalizations serve as cognitive tools. By enabling algorithmic reasoning over sets of statements in a formal language, they provide a cognitive boost for human reasoners. We argue that the emergence of large language models (LLMs) as a technology for the analysis and generation of natural language provides a new perspective on the relative roles of formal and natural languages in formalization.

Natural Language Processing Formal Semantics Large Language Models Philosophy of AI, Misc

A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs
with Paul Groth

In Reham Alharbi, Jacopo de Berardinis, Paul Groth, Albert Meroño-Peñuela, Elena Simperl & Valentina Tamma (eds.), ISWC 2024 Special Session on Harmonising Generative AI and Semantic Web Technologies, CEUR-WS. forthcoming.

Evaluating large language models (LLMs) for tasks like fact extraction in support of knowledge graph construction frequently involves computing accuracy metrics using a ground truth benchmark based on a knowledge graph (KG). These evaluations assume that errors represent factual disagreements. However, human discourse frequently features metalinguistic disagreement, where agents differ not on facts but on the meaning of the language used to express them. Given the complexity of natural language processing and generation using LLMs, we ask: do metalinguistic disagreements occur between LLMs and KGs? Based on an investigation using the T-REx knowledge alignment dataset, we hypothesize that metalinguistic disagreement does in fact occur between LLMs and KGs, with potential relevance for the practice of knowledge graph engineering. We propose a benchmark for evaluating the detection of factual and metalinguistic disagreements between LLMs and KGs. An initial proof of concept of such a benchmark is available on GitHub.

Large Language Models

Carnap’s Robot Redux: LLMs, Intensional Semantics, and the Implementation Problem in Conceptual Engineering (extended abstract)

In his 1955 essay "Meaning and synonymy in natural languages", Rudolf Carnap presents a thought experiment wherein an investigator provides a hypothetical robot with a definition of a concept together with a description of an individual, and then asks the robot if the individual is in the extension of the concept. In this work, we show how to realize Carnap's Robot through knowledge probing of a large language model (LLM), and argue that this provides a useful cognitive tool for conceptual engineers to compare the extension of a proposed concept definition to the extensional knowledge represented as facts in a given knowledge base, providing a possible solution to the implementation problem in conceptual engineering.

Conceptual Engineering Formal Semantics Experimental Philosophy, Misc Large Language Models
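The setup of Carnap's Robot (a definition, a description of an individual, and a question about membership in the extension) maps directly onto a prompt. The Python sketch below is a minimal, hypothetical rendering of that idea: the prompt wording, the yes/no-then-rationale answer format, and the `complete` callable are assumptions for illustration, not the prompts used in this work. The usage paraphrases the IAU 2006 definition of PLANET and stubs the model's reply.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt format: definition, description, then a membership question.
PROMPT_TEMPLATE = (
    "Definition of {concept}: {definition}\n"
    "Description of the individual: {description}\n"
    "Question: Is this individual a {concept} according to the definition above? "
    "Answer 'yes' or 'no' on the first line, then give a one-sentence rationale."
)


@dataclass
class Classification:
    in_extension: bool
    rationale: str


def parse_answer(text: str) -> Classification:
    """Read a yes/no verdict from the first line and the rationale from the rest."""
    first, _, rest = text.strip().partition("\n")
    match = re.match(r"\s*(yes|no)\b", first, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"could not parse a yes/no answer from: {first!r}")
    return Classification(match.group(1).lower() == "yes", rest.strip())


def carnap_robot(
    concept: str, definition: str, description: str, complete: Callable[[str], str]
) -> Classification:
    """Ask a (hypothetical) LLM completion function whether the individual falls under the concept."""
    prompt = PROMPT_TEMPLATE.format(concept=concept, definition=definition, description=description)
    return parse_answer(complete(prompt))


# Usage with a stubbed model reply (no real LLM call is made).
planet_definition = (
    "A celestial body that orbits the Sun, has enough mass to assume a nearly "
    "round shape, and has cleared the neighbourhood around its orbit."
)
pluto = "Pluto orbits the Sun, is nearly round, and shares its orbital zone with other Kuiper belt objects."


def fake_complete(prompt: str) -> str:
    return "No\nPluto has not cleared the neighbourhood around its orbit."


result = carnap_robot("PLANET", planet_definition, pluto, fake_complete)
print(result.in_extension)  # False
```

Asking for a rationale alongside the verdict is what lets a conceptual engineer see why an individual was placed inside or outside the extension, rather than only whether it was.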

Conceptual Engineering Using Large Language Models
In Vincent C. Müller, Leonard Dung, Guido Löhr & Aliya Rumana (eds.), Philosophy of Artificial Intelligence: The State of the Art, Springer Nature. 2026.

We describe a method, based on Jennifer Nado’s proposal for classification procedures as targets of conceptual engineering, that implements such procedures by prompting a large language model. We apply this method, using data from the Wikidata knowledge graph, to evaluate stipulative definitions related to two paradigmatic conceptual engineering projects: the International Astronomical Union’s redefinition of PLANET and Haslanger’s ameliorative analysis of WOMAN. Our results show that classification procedures built using our approach can exhibit good classification performance and, through the generation of rationales for their classifications, can contribute to the identification of issues in either the definitions or the data against which they are being evaluated. We consider objections to this method, and discuss implications of this work for three aspects of the theory and practice of conceptual engineering: the definition of its targets, empirical methods for their investigation, and their practical roles. The data and code used for our experiments, together with the experimental results, are available in a GitHub repository.

Conceptual Engineering Philosophy of Artificial Intelligence Foundations of Experimental Philosophy
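The abstract above describes scoring classification procedures against knowledge-graph labels and then reading the rationales to locate problems in either the definition or the data. A minimal Python sketch of that evaluation step, assuming binary gold labels have already been extracted from the knowledge graph: it computes per-class F1 and their unweighted mean (macro F1), and collects the disagreements with their rationales for manual review. The function names, item identifiers, and rationales in the usage are hypothetical, not drawn from the paper's data.

```python
from typing import List, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(gold: List[bool], pred: List[bool]) -> float:
    """Average the F1 of the positive class and the negative class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for positive in (True, False):
        tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
        fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
        fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)


def disagreements(
    items: List[str], gold: List[bool], pred: List[bool], rationales: List[str]
) -> List[Tuple[str, bool, bool, str]]:
    """Cases where classifier and knowledge graph differ, kept with rationales for review."""
    return [(i, g, p, r) for i, g, p, r in zip(items, gold, pred, rationales) if g != p]


# Usage with made-up labels and rationales.
items = ["item-a", "item-b", "item-c", "item-d"]
gold = [True, True, False, False]   # labels taken from the knowledge graph
pred = [True, False, False, False]  # labels produced by the classification procedure
rationales = ["meets all criteria", "fails criterion (c)", "not in scope", "not in scope"]

print(round(macro_f1(gold, pred), 3))  # 0.733
for row in disagreements(items, gold, pred, rationales):
    print(row)  # ('item-b', True, False, 'fails criterion (c)')
```

Macro averaging weights the positive and negative classes equally, which matters when, as is common with knowledge-graph samples, one class is much larger than the other.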