Jonathan Prunty (Cambridge University): Publications

Cambridge University

Post-doctoral Fellow

Cambridge, United Kingdom of Great Britain and Northern Ireland

Areas of Specialization

Philosophy of Artificial Intelligence

Philosophy of Psychology

Areas of Interest

Philosophy of Artificial Intelligence

Philosophy of Psychology

Continual Learning Requires Evaluating Trajectories
with Lorenzo Pacchiardi, Patricia Paskov, Seán Ó hÉigeartaigh, Fernando Martínez-Plumed, Katherine M. Collins, Fazl Barez, Matteo Gabriel Mecattaf, Zafeirios Fountas, Risto Uuk, Sanmi Koyejo, Cozmin Ududec, and José Hernández-Orallo

AI systems increasingly incorporate continual learning mechanisms allowing their behaviour to adapt after deployment, from (1) in-context learning and (2) memory features already in wide use to (3) post-deployment weight modification under research. We argue that, by treating AI systems as frozen artefacts whose performance and safety are assessed at release, current evaluation practices structurally ignore the behavioural trajectory of a system that continues to learn from experience. Our position is that evaluation of continual learning systems should be centred on behavioural trajectories, with the complementary goals of characterising the landscape of possible behaviours and forecasting how behaviour will evolve from a given set of experiences. This can be operationalised through trajectory elicitation sandboxes and predictive monitors that forecast behavioural evolution, but may face fundamental obstacles analogous to those seen in dynamical systems. These are best addressed by (1) applying trajectory-centred evaluation to today's continual learning systems and (2) relying on the resulting evidence to design systems amenable to it, yielding a virtuous cycle in which systems and their evaluations co-evolve.

Impact of Artificial Intelligence

A cognitive template for human face detection
with Rob Jenkins, Rana Qarooni, and Markus Bindemann

Cognition 249 (C): 105792. 2024.

Cognitive Sciences

Reverse Turing Tests for Human-Machine Task Suitability Assessments Should be Profile-Driven
with Marko Tešić, John Burden, Ben Slater, Zachary Tidler, Paul Clothier, Luning Sun, Katherine Collins, Bernardo Gonçalves, Giulio Corsi, Seán Ó hÉigeartaigh, Lucy Cheke, and Jose Hernandez-Orallo

As AI is integrated into the workplace, organisations increasingly face allocation decisions between human and machine workers. These decisions are increasingly made or assisted by algorithms, creating a Reverse Turing Test dynamic wherein the machine is now the judge. In addition, human and machine workers may "compete" for a given task, reproducing aspects of adversarial games. This raises new methodological questions about assessing task suitability between humans and machines. The criteria often used to assess people (e.g., education, experience, references) cannot feasibly scale to AI systems; conversely, AI evaluation methods (benchmarks, red teaming, leaderboards) cannot be easily applied to human workers or yield comparable metrics. In this position paper, we argue that suitability evaluations for task-assignment should be profile-driven, that is, based on assessments that infer latent constructs such as capabilities and propensities from observed performance. This approach places humans and AI systems on shared scales, supporting comparisons that are predictive of novel-task performance, explanatory of why agents succeed or fail, and auditable. We outline the core features of this approach, discuss its practical implications, and compare it with alternative frameworks for human-machine workplace allocation.

Capacity limits in face detection
with Rana Qarooni, Markus Bindemann, and Rob Jenkins

Cognition 228 (C): 105227. 2022.

Cognitive Sciences
