I am currently a PhD student at the University of Rochester. My research sits at the intersection of philosophy of mind and AI interpretability. I investigate what large language models represent about moral and psychological properties, and I examine the gap between detecting such properties in model activations and reliably steering model behavior through them. I also develop empirical benchmarks for measuring virtue and wellbeing in these models.
APA Eastern Division
Rochester, New York, United States of America
| Areas of Specialization |
| --- |
| Philosophy of Mind |
| Philosophy of Artificial Intelligence |
| Philosophy of Technology |