Sean O. O HEigeartaigh (Cambridge University): Publications

Cambridge University

Researcher

Cambridge, United Kingdom of Great Britain and Northern Ireland

Continual Learning Requires Evaluating Trajectories
with Lorenzo Pacchiardi, Patricia Paskov, Fernando Martínez-Plumed, Katherine M. Collins, Fazl Barez, Jonathan Prunty, Matteo Gabriel Mecattaf, Zafeirios Fountas, Risto Uuk, Sanmi Koyejo, Cozmin Ududec, and José Hernández-Orallo

AI systems increasingly incorporate continual learning mechanisms allowing their behaviour to adapt after deployment, from (1) in-context learning and (2) memory features already in wide use to (3) post-deployment weight modification under research. We argue that, by treating AI systems as frozen artefacts whose performance and safety are assessed at release, current evaluation practices structurally ignore the behavioural trajectory of a system that continues to learn from experience. Our position is that evaluation of continual learning systems should be centred on behavioural trajectories, with the complementary goals of characterising the landscape of possible behaviours and forecasting how behaviour will evolve from a given set of experiences. This can be operationalised through trajectory elicitation sandboxes and predictive monitors that forecast behavioural evolution, but may face fundamental obstacles analogous to those seen in dynamical systems. These are best addressed by (1) applying trajectory-centred evaluation to today's continual learning systems and (2) relying on the resulting evidence to design systems amenable to it, yielding a virtuous cycle in which systems and their evaluations co-evolve.

Impact of Artificial Intelligence

Reverse Turing Tests for Human-Machine Task Suitability Assessments Should be Profile-Driven
with Jonathan Prunty, Marko Tešić, John Burden, Ben Slater, Zachary Tidler, Paul Clothier, Luning Sun, Katherine Collins, Bernardo Gonçalves, Giulio Corsi, Lucy Cheke, and Jose Hernandez-Orallo

As AI is integrated into the workplace, organisations increasingly face allocation decisions between human and machine workers. These decisions are increasingly made or assisted by algorithms, creating a Reverse Turing Test dynamic wherein the machine is now the judge. In addition, human and machine workers may "compete" for a given task, reproducing aspects of adversarial games. This raises new methodological questions about assessing task suitability between humans and machines. The criteria often used to assess people (e.g., education, experience, references) cannot feasibly scale to AI systems; conversely, AI evaluation methods (benchmarks, red teaming, leaderboards) cannot be easily applied to human workers or yield comparable metrics. In this position paper, we argue that suitability evaluations for task-assignment should be profile-driven, that is, based on assessments that infer latent constructs such as capabilities and propensities from observed performance. This approach places humans and AI systems on shared scales, supporting comparisons that are predictive of novel-task performance, explanatory of why agents succeed or fail, and auditable. We outline the core features of this approach, discuss its practical implications, and compare it with alternative frameworks for human-machine workplace allocation.

Predictable artificial intelligence
with Lexin Zhou, Pablo A. M. Casares, Fernando Martínez-Plumed, John Burden, Ryan Burnell, Lucy Cheke, Cèsar Ferri, Alexandru Marcoci, Behzad Mehrbakhsh, Yael Moros-Daval, Danaja Rutar, Wout Schellaert, Konstantinos Voudouris, and José Hernández-Orallo

Artificial Intelligence 353 (C): 104491. 2026.

Science, Logic, and Mathematics

Mapping Intelligence: Requirements and Possibilities
with Sankalp Bhatnagar, Anna Alexandrova, Shahar Avin, Stephen Cave, Lucy Cheke, Matthew Crosby, Jan Feyereisl, Marta Halina, Bao Sheng Loe, Fernando Martínez-Plumed, Huw Price, Henry Shevlin, Adrian Weller, Alan Winfield, and José Hernández-Orallo

In Vincent C. Müller (ed.), Philosophy and theory of artificial intelligence 2017, Springer Verlag. pp. 117-135. 2017.

New types of artificial intelligence (AI), from cognitive assistants to social robots, are challenging meaningful comparison with other kinds of intelligence. How can such intelligent systems be catalogued, evaluated, and contrasted, with representations and projections that offer meaningful insights? To catalyse the research in AI and the future of cognition, we present the motivation, requirements and possibilities for an atlas of intelligence: an integrated framework and collaborative open repository for collecting and exhibiting information of all kinds of intelligence, including humans, non-human animals, AI systems, hybrids and collectives thereof. After presenting this initiative, we review related efforts and present the requirements of such a framework. We survey existing visualisations and representations, and discuss which criteria of inclusion should be used to configure an atlas of intelligence.

Your Prompt is my command: On Assessing the Human-Centred Generality of Multimodal Models
with Wout Schellaert, Fernando Martínez-Plumed, Karina Vold, John Burden, Pablo A. M. Casares, Bao Sheng Loe, Roi Reichart, Anna Korhonen, and José Hernández-Orallo

Journal of Artificial Intelligence Research 77. 2023.

Even with obvious deficiencies, large prompt-commanded multimodal models are proving to be flexible cognitive tools representing an unprecedented generality. But the directness, diversity, and degree of user interaction create a distinctive “human-centred generality” (HCG), rather than a fully autonomous one. HCG implies that —for a specific user— a system is only as general as it is effective for the user’s relevant tasks and their prevalent ways of prompting. A human-centred evaluation of general-purpose AI systems therefore needs to reflect the personal nature of interaction, tasks and cognition. We argue that the best way to understand these systems is as highly-coupled cognitive extenders, and to analyse the bidirectional cognitive adaptations between them and humans. In this paper, we give a formulation of HCG, as well as a high-level overview of the elements and trade-offs involved in the prompting process. We end the paper by outlining some essential research questions and suggestions for improving evaluation practices, which we envision as characteristic for the evaluation of general artificial intelligence in the future.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence Evolution of the contours of AI (edited book)
with Fernando Martínez-Plumed, Bao Sheng Loe, Peter Flach, Karina Vold, and José Hernández-Orallo

2018.

The Facets of Artificial Intelligence: A Framework to Track the Evolution of AI
with Fernando Martínez-Plumed, Bao Sheng Loe, Peter Flach, Karina Vold, and José Hernández-Orallo

In Fernando Martínez-Plumed, Bao Sheng Loe, Peter Flach, Sean O. O. HEigeartaigh, Karina Vold & José Hernández-Orallo (eds.), Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence Evolution of the contours of AI. pp. 5180-5187. 2018.

We present nine facets for the analysis of the past and future evolution of AI. Each facet has also a set of edges that can summarise different trends and contours in AI. With them, we first conduct a quantitative analysis using the information from two decades of AAAI/IJCAI conferences and around 50 years of documents from AI topics, an official database from the AAAI, illustrated by several plots. We then perform a qualitative analysis using the facets and edges, locating AI systems in the intelligence landscape and the discipline as a whole. This analytical framework provides a more structured and systematic way of looking at the shape and boundaries of AI.

Philosophy of AI, General Works Artificial Intelligence Methodology Machine Learning Areas of Artificial Intelligence, Misc