While generative artificial intelligence (AI) is rapidly proliferating in healthcare research and clinical settings, actionable research ethics standards that reflect its distinctive technical features, such as hallucination, agentic autonomy, and decontextualization, are lacking. Existing AI ethics studies mainly present universal principles, and previous Delphi studies have reported difficulty reaching expert consensus because of the overly broad scope of AI concepts. We aimed to systematically identify ethical issues in generative AI research in the healthcare domain and to develop practical ethical guidelines and a checklist that researchers can apply across the entire research and development lifecycle. We applied a three-stage modified Delphi method in accordance with the Conducting and REporting DElphi Studies (CREDES) guidelines. Thirty-five experts in medicine, law/ethics/policy, and AI technology were invited. Round 1 was an in-person workshop with 18 experts; Rounds 2 (n = 32) and 3 (n = 27) were online surveys rating 56 and 43 items, respectively, on a 7-point Likert scale. Consensus was defined as an interquartile range (IQR) ≤ 1.5 and a coefficient of variation (CV) < 0.5 (high consensus) in Round 2, and by the stricter criterion IQR ≤ 1.0 (strengthened consensus) in Round 3. In Round 2, 97.7% of the Likert-scaled items (42/43) fell within the range acceptable for guideline adoption. In Round 3, 60.5% of items reached consensus under the strengthened criterion (IQR ≤ 1.0), and 46.5% reached strong consensus (IQR ≤ 1.0 and CV ≤ 0.2). By domain, documentation standards (mean 6.06), safety measures (mean 5.95), and evaluation methods (mean 5.86) were rated most important. Among individual items, explainable AI (mean 6.48), diversity of training data (mean 6.44), and human-in-the-loop (mean 6.33) emerged as core items and top-priority strategies.
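To make the IQR and CV consensus rules concrete, the sketch below computes both statistics for the ratings of a single Likert item and checks them against the Round 2 and Round 3 thresholds. The abstract does not state how quartiles or the CV were calculated, so this is an assumption-laden illustration: it uses Python's `statistics.quantiles` with its default ("exclusive") method and defines CV as the sample standard deviation divided by the mean. The function name `consensus_metrics` and the example ratings are hypothetical.

```python
import statistics


def consensus_metrics(ratings):
    """Compute IQR and CV for one 7-point Likert item and apply the consensus thresholds.

    Assumptions (not specified in the study): quartiles come from
    statistics.quantiles with its default "exclusive" method, and
    CV = sample standard deviation / mean.
    """
    q1, _, q3 = statistics.quantiles(ratings, n=4)  # quartile cut points Q1, Q2, Q3
    iqr = q3 - q1
    cv = statistics.stdev(ratings) / statistics.mean(ratings)
    return {
        "iqr": iqr,
        "cv": cv,
        # Round 2 "high consensus": IQR <= 1.5 and CV < 0.5
        "round2_high": iqr <= 1.5 and cv < 0.5,
        # Round 3 "strengthened consensus": IQR <= 1.0
        "round3_strengthened": iqr <= 1.0,
        # Round 3 "strong consensus": IQR <= 1.0 and CV <= 0.2
        "round3_strong": iqr <= 1.0 and cv <= 0.2,
    }


# Hypothetical ratings from eight experts for a single item
print(consensus_metrics([6, 6, 6, 7, 7, 7, 7, 7]))
```

Under these assumptions, the example item has an IQR of 1.0 and a CV of about 0.08, so it meets all three thresholds. Reporting the flags separately, rather than as one ranked label, mirrors the abstract's use of different criteria in different rounds.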
Based on the Delphi results, we developed an ethical framework comprising three domains (data, governance, and design-by-value) and eight value dimensions, together with a lifecycle checklist organized into pre-development, development, and post-deployment stages. Unlike general AI ethics principles, this framework and checklist are tailored to the technical features of generative AI. They can serve as sector-specific guidance for healthcare under the Framework Act on AI and as criteria for institutional review board (IRB) review.