Youngchan Lee (Seoul National University): Publications

Seoul National University
Department of Philosophy

Graduate student

Areas of Interest

Metaphysics

Philosophy of Language

Philosophy of Mind

Logic and Philosophy of Logic

Philosophy of Cognitive Science

Philosophy of Mathematics

Philosophy of Physical Science

A Virtuous AI is an Existential Risk
with Guillermo Del Pinal and Ohn Min

This paper examines trade-offs between AI safety and well-being relative to (i) one of the most promising methods for finetuning super-capable AIs, 'Constitutional AI', and (ii) one of the most influential approaches to understanding complex ethical decision making and the conditions for the well-being of rational agents, 'Virtue Ethics'. We finetune various models using a 'Virtuous agent' constitution, a 'Subordinate agent' constitution, and a 'Generic agent' constitution, and evaluate them on 'general safety' (toxic behaviors, misinformation, etc.) and also on their willingness to endorse a wide range of behaviors that, if adopted by a super-powerful AI, would significantly increase the level of existential risk for humanity. Our results suggest that there is a trade-off between reducing existential risk and reinforcing the beliefs and dispositions that would be conducive to an AI agent's well-being. They also suggest that there is a trade-off between existential risk and general safety: if we finetune an AI to adopt beliefs and dispositions that substantially reduce its existential risk -- by shaping the AI to be systematically subordinate to external human authorities -- we thereby increase the likelihood that a human user can deliberately induce the AI to engage in various kinds of generally unsafe behaviors.

Large Language Models; Social Ethics; Artificial Intelligence in Science; Supervised Learning; Applied Ethics, Miscellaneous; Deep Learning; Technology Ethics; Ethics of Artificial Intelligence

Emergent Alignment and the Projectability of Ethical Personas
with Guillermo Del Pinal, Calum McNamara, and Alejandro Pérez Carballo

Recent work on ‘emergent misalignment’ has shown that finetuning LLMs on narrow tasks can induce broadly misaligned behavior. This supports the ‘persona selection’ (PSM) hypothesis that, during pre-training, LLMs learn to simulate many different characters and perspectives, which can then be elicited and refined during post-training. Inspired by those results, this paper investigates the converse phenomenon, ‘emergent alignment’, and uses it to support and refine the PSM and motivate a novel desideratum for alignment. We finetune a helpful-only model on broad and narrow safety tasks. To create SFT samples, we follow the ‘Constitutional AI’ (CAI) approach and use four constitutions drawn from ethical systems that could be part of reasonable alignment strategies: deontology, consequentialism, virtue ethics, and aligning AIs as subordinate to and concerned solely with the good of humanity. For each of those models, we show that finetuning on two narrow safety sub-categories (harassment and illegal behaviors) reliably induces emergent alignment. Specifically, the narrowly aligned models perform significantly better than the helpful-only source model on a benchmark covering a representative sample of general safety categories, and on specific safety categories that were carefully filtered out of the data sets used for narrow alignment finetuning. To test the PSM using a more fine-grained evaluation, we also use a multidimensional persona diagnostic that includes dimensions for deontological, consequentialist, virtue-ethical, and “defer-to-authorities” ethical personas. For each constitutionally finetuned (broad and narrow) model, we evaluate how well its behavior matches its expected signature profile (given its anchor constitution).
Our results show that our CAI models acquire their expected “ethical persona”: for example, the model narrowly finetuned on SFT samples created using the consequentialist constitution agrees significantly more with utilitarian than with deontological beliefs. At the same time, both our coarse and fine-grained evaluations show that there are significant differences across our (broadly and narrowly finetuned) CAI models in how well they project. Based on those results, we argue that alignment strategies should be evaluated not just on their (in-distribution) general safety performance, but also specifically on their degree of projectability.

Philosophy of AI, General Works

Human-specific regulation of MeCP2 levels in fetal brains by microRNA miR-483-5p
with K. Han, V. A. Gennarino, K. Pang, K. Hashimoto-Torii, S. Choufani, C. S. Raju, M. C. Oldham, R. Weksberg, P. Rakic, Z. Liu, and H. Y. Zoghbi

Proper neurological function in humans requires precise control of levels of the epigenetic regulator methyl-CpG-binding protein 2. MeCP2 protein levels are low in fetal brains, where the predominant MECP2 transcripts have an unusually long 3' untranslated region. Here, we show that miR-483-5p, an intragenic microRNA of the imprinted IGF2, regulates MeCP2 levels through a human-specific binding site in the MECP2 long 3' UTR. We demonstrate the inverse correlation of miR-483-5p and MeCP2 levels in developing human brains and fibroblasts from Beckwith-Wiedemann syndrome patients. Importantly, expression of miR-483-5p rescues abnormal dendritic spine phenotype of neurons overexpressing human MeCP2. In addition, miR-483-5p modulates the levels of proteins of the MeCP2-interacting corepressor complexes, including HDAC4 and TBL1X. These data provide insight into the role of miR-483-5p in regulating the levels of MeCP2 and interacting proteins during human fetal development. © 2013 by Cold Spring Harbor Laboratory Press.

A New Measurement of the Partial 0+->0+ Half Life of 10C with GAMMASPHERE
with B. K. Fujikawa, S. J. Asztalos, R. M. Clark, M. -A. Deleplanque-Stephens, P. Fallon, S. J. Freedman, L. J. Lising, A. O. Macchiavelli, R. W. MacLeod, J. C. Reich, M. A. Rowe, S. -Q. Shang, F. S. Stephens, E. G. Wasserman, and J. P. Greene

We report on a new measurement of the strength of the superallowed 0+->0+ transition in the beta-decay of 10C: 10C -> 10B + e+ + nu. The experiment was done at the LBNL 88-inch cyclotron using forty-seven GAMMASPHERE germanium detectors. Precise knowledge of this branching ratio is necessary to compute the superallowed Fermi ft, which gives the weak vector coupling constant and the u to d element of the Cabibbo-Kobayashi-Maskawa quark mixing matrix.