Juan Cadile (University of Rochester)

I am currently a PhD student at the University of Rochester. My research intersects philosophy of mind and AI interpretability. I investigate what large language models represent about moral and psychological properties, and I examine the gap between detecting such properties in model activations and reliably steering behavior through them. I also develop empirical benchmarks for measuring virtue and wellbeing in such systems.