Artificial Intelligence for Computer Vision in Surgery: A Call for Developing Reporting Guidelines

Advances in computing power and the availability of digital data have led to significant progress in artificial intelligence (AI) algorithms. As a result, novel applications of AI in healthcare continue to appear at a rapid pace, both in the scientific community and in the lay press. AI is the field of computer science that focuses on developing algorithms that enable machines to produce high-level, rational responses and interactions, as well as advanced cognitive and perceptual functions. One area of AI that has particularly burgeoned over the last decade is computer vision (CV). CV is an interdisciplinary scientific field concerned with how computers can gain a high-level understanding of digital images or videos and perform functions such as object identification, object tracking, and scene recognition1. Various fields in medicine have had significant success in developing AI models that use CV to perform a variety of diagnostic functions (e.g., identifying abnormalities in diagnostic radiology, identifying malignant skin lesions, and interpreting electrocardiograms). There is potential for similar success in procedural specialties such as surgery. Clinicians and innovators alike have sought to develop AI algorithms that improve our ability to provide therapeutic interventions, for example through real-time decision support and computer-assisted surgery.

The number of scientific publications involving AI has increased steadily over the past decade, and the Food and Drug Administration (FDA) has approved many AI algorithms for medical applications2. However, despite early successes with this new technology, there are concerns about the most appropriate methodology for the design, development, and validation of AI algorithms. Furthermore, incomplete reporting has turned much of the existing literature into a methodological “black box”2. AI-based clinical research in medicine therefore needs greater transparency and interpretability. The Consolidated Standards of Reporting Trials (CONSORT) and the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) guidelines have been extended to AI studies as CONSORT-AI3 and SPIRIT-AI4. The Standards for Reporting of Diagnostic Accuracy Studies (STARD) and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) will also be extended, as STARD-AI5 and TRIPOD-ML6. In addition, the Minimum Information about Clinical Artificial Intelligence Modeling (MI-CLAIM)7 and the Minimum Information for Medical AI Reporting (MINIMAR)8 have been published as minimum reporting guidelines. Both aim to standardize medical AI research in terms of transparency and utility. Most AI interventions in medicine involve computer-assisted diagnosis (CAD). For this reason, existing reporting guidelines have focused on CAD-related studies, such as studies of diagnostic accuracy, prediction models, clinical decision support, and implementation in clinical trials (Table 1). Comparatively little attention has been paid to surgical applications of AI algorithms that could assist decision-making based on CV analysis of operative performance.

Table 1. Reporting guidelines for studies involving artificial intelligence and machine learning

RCT: randomized controlled trial; AI: artificial intelligence; CV: computer vision

Recent advances in AI-based approaches to CV (e.g., deep convolutional neural networks) have led to the development of several AI algorithms that can analyze and interpret the operative field9-11. Over 300 publications related to CV in surgery have appeared, most of them in the last few years. Reporting guidelines such as CONSORT-AI and SPIRIT-AI clearly help promote high-quality reporting of AI research in medicine. However, research on CV in surgery has nuances that require more specialized guidelines. Because AI-based CV in surgery is a relatively new field, the scientific and surgical communities lack methodological standards for it. The absence of reporting guidelines specific to research and innovation in this field is a major obstacle to producing scientific work that is interpretable, reproducible, and scalable. Researchers may struggle to report their methodology, their data collection, and the training and testing of their AI algorithms. Similarly, journal editors and peer reviewers may find it difficult to critically appraise manuscripts and judge whether the findings are generalizable and interpretable for their readership, which consists mostly of surgeons without a technical background in this field.

Research and innovation in AI-based CV are inherently multidisciplinary. Collaboration among clinicians, engineers, and data scientists is therefore crucial, and such guidelines need input from all stakeholders. Studies involving AI-based CV in surgery raise several issues that these stakeholders need to address. Chief among them are two sets of characteristics. The first is the technical and non-technical characteristics of the surgical videos in the training and testing datasets (e.g., the number and characteristics of the patients, surgeons, and institutions that supplied the data). The second is the real-time performance of the model (e.g., inference speed and computational requirements). Moreover, studies need to specify details of data annotation (e.g., definitions of the clinical phenomena being annotated, the number and clinical experience of annotators, and inter-annotator reliability12,13).
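Researchers often summarize inter-annotator reliability for categorical labels with Cohen's kappa. Kappa corrects the observed agreement between two annotators for the agreement expected by chance. The sketch below is illustrative only. It assumes that two annotators assign one categorical label (for example, a surgical phase) to each video frame. The function name `cohens_kappa` and the phase labels are hypothetical and do not come from any cited study. Studies with more than two annotators would typically use a different statistic, such as Fleiss' kappa or Krippendorff's alpha.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items (e.g., video frames)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items that received identical labels.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used one identical label for every item; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical frame-level phase labels from two annotators for the same 10 frames.
annotator_1 = ["prep", "prep", "dissection", "dissection", "dissection",
               "clipping", "clipping", "dissection", "closure", "closure"]
annotator_2 = ["prep", "dissection", "dissection", "dissection", "dissection",
               "clipping", "clipping", "clipping", "closure", "closure"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # → 0.72
```

Here the annotators agree on 8 of 10 frames (0.80), and chance agreement from their label frequencies is 0.28, so kappa is (0.80 − 0.28) / (1 − 0.28) ≈ 0.72. A reported value becomes interpretable to readers when authors also state which statistic they used and on which unit it was computed (frame, segment, or whole video).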

Most AI-based CV models designed to perform specialized functions require content expertise both for data annotation and for training the algorithms. Quality assurance measures for data annotation therefore need to be established a priori to ensure model integrity. Furthermore, standardized reporting criteria for annotating surgical videos are necessary to enable transparent reporting and appropriate interpretation. Other important considerations include data privacy and the ethical, responsible use of this technology in patient care. Much of the existing literature has focused on model development and performance. However, ongoing research in AI-based CV must also generalize to diverse populations and adhere to ethically sound guidelines. Infrastructure for collecting intraoperative video data and sharing it between institutions continues to develop. As it does, these principles need to be clarified and incorporated into best practice guidelines.

To address this important gap in surgical research, the Computer Vision in Surgery International Collaborative is being established. The collaborative will comprise a multidisciplinary group of experts and stakeholders. Its mission is to develop guidelines for reporting research and innovation specific to AI-based CV in surgery. The central objective is to devise a standardized, minimum set of requirements for reporting methods and results in scientific publications on CV in surgery. Given the unique nature of surgical videos, these reporting guidelines will focus on the analysis of surgical videos and images by AI algorithms that perform CV tasks. Since the first version of CONSORT was published in 1996, the quality of randomized controlled trials has greatly improved, which has helped build a robust evidence base for medical practice. In the same way, these guidelines should help promote the reliability, transparency, and completeness of published work and make it more readable and interpretable for readers. Ultimately, we hope they will contribute to the development of the field itself.

Although the intended guideline specifically covers CV research in surgery, it will not be limited to a particular clinical study design or phase. Rather, it will work in tandem with other guidelines under development. SPIRIT-AI and CONSORT-AI were developed for clinical trials of AI interventions. SPIRIT-AI complements the CONSORT-AI statement and aims to promote transparency and completeness in protocols for AI trials. STARD-AI and TRIPOD-ML are sets of reporting standards for diagnostic accuracy studies and prediction model studies using AI, respectively. They cover the development phase and in silico technical validation. However, research on AI-based CV in surgery raises specific issues that apply regardless of study design and phase. We therefore expect these guidelines to be used alongside existing guidelines, adding value to other, more generic ones.

A preliminary scoping review was conducted to identify candidate items for the reporting guidelines. It identified four themes: (1) study context (study design, study phase, and surgical procedure details); (2) dataset and annotation (dataset details, cohort characteristics, and annotation details); (3) model, evaluation, and validation (computer vision task, model optimization, computer specification, evaluation, and validation); and (4) ethical and regulatory processes (institutional review board approval, informed consent, video dataset availability, anonymization, and code availability).
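The four themes and their candidate items can be encoded as a simple checklist structure. Such a structure could, for instance, flag which candidate items a manuscript does not report. This is a minimal sketch that assumes the preliminary candidate items listed above are stored verbatim as strings. The final item list will come out of the consensus process. The names `CANDIDATE_ITEMS` and `missing_items` are hypothetical and are not a tool of the collaborative.

```python
# Hypothetical encoding of the preliminary candidate items, grouped by theme.
# The final reporting items are expected to change after the consensus process.
CANDIDATE_ITEMS = {
    "study context": ["study design", "study phase", "surgical procedure details"],
    "dataset and annotation": ["dataset details", "cohort characteristics", "annotation details"],
    "model, evaluation, and validation": [
        "computer vision task",
        "model optimization",
        "computer specification",
        "evaluation",
        "validation",
    ],
    "ethical and regulatory processes": [
        "institutional review board approval",
        "informed consent",
        "video dataset availability",
        "anonymization",
        "code availability",
    ],
}


def missing_items(reported):
    """Return {theme: [unreported items]} for every theme with at least one gap."""
    gaps = {}
    for theme, items in CANDIDATE_ITEMS.items():
        unreported = [item for item in items if item not in reported]
        if unreported:
            gaps[theme] = unreported
    return gaps


# Usage: a hypothetical manuscript that reports every candidate item except two.
all_items = {item for items in CANDIDATE_ITEMS.values() for item in items}
manuscript = all_items - {"computer specification", "code availability"}
print(missing_items(manuscript))
# {'model, evaluation, and validation': ['computer specification'], 'ethical and regulatory processes': ['code availability']}
```

A plain dictionary keeps the themes in their listed order, which makes the output follow the structure of the checklist.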

The list of reporting items will be drafted using a systematic mixed-methods approach. A focus group will be formed for each theme, and the qualitative data from these discussions will be synthesized into a comprehensive list of items. Finally, a modified Delphi methodology will be used to reach consensus on the final list of reporting items. The consensus-building process will involve close collaboration with key stakeholders, including surgeons, computer scientists, AI engineers, journal editors, bioethicists, legal experts, global health experts, and patient advocates. To ensure a diversity of opinions, the multinational group will include members from a broad range of demographic backgrounds, with representation from every continent except Antarctica.
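In a modified Delphi process, panelists typically rate each candidate item over successive rounds. An item is accepted or rejected once agreement passes a predefined threshold, and the remaining items are carried into the next round. The sketch below assumes a 5-point Likert scale and a 70% agreement threshold. Both are illustrative assumptions, not parameters specified by the collaborative. The function names, item names, and ratings are all hypothetical.

```python
def classify_item(ratings, threshold=0.70):
    """Classify one item from 1-5 Likert ratings (hypothetical consensus rule)."""
    if not ratings:
        raise ValueError("no ratings supplied")
    if any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    n = len(ratings)
    agree = sum(r >= 4 for r in ratings) / n     # share rating the item important (4 or 5)
    disagree = sum(r <= 2 for r in ratings) / n  # share rating it unimportant (1 or 2)
    if agree >= threshold:
        return "include"
    if disagree >= threshold:
        return "exclude"
    return "re-rate next round"


def tally_round(item_ratings, threshold=0.70):
    """Apply classify_item to every item rated in one Delphi round."""
    return {item: classify_item(r, threshold) for item, r in item_ratings.items()}


# Hypothetical ratings from 10 panelists for three placeholder items.
round_1 = {
    "item_1": [5, 5, 4, 4, 5, 4, 3, 5, 4, 4],
    "item_2": [2, 3, 1, 2, 4, 3, 2, 1, 3, 2],
    "item_3": [1, 2, 1, 2, 2, 1, 3, 2, 1, 2],
}
print(tally_round(round_1))
# {'item_1': 'include', 'item_2': 're-rate next round', 'item_3': 'exclude'}
```

Items without consensus (here `item_2`, with 60% rating it unimportant) would typically be fed back to panelists together with the group's results before the next round.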

AI-based research in healthcare has grown exponentially, and its application to CV in surgical procedures is gaining significant momentum. Minimally invasive surgery (e.g., laparoscopy and robotic surgery) is growing in popularity, and capacity for storing and transferring intraoperative data has increased. Together, these trends have brought us closer to a new age of digital surgery. That new age requires rigorous surgical data science to provide high-quality evidence for adoption. The lack of reporting standards for AI-based CV research in surgery may slow the development, evaluation, and adoption of these technologies. It may also delay their use in realizing image-guided surgery, intraoperative decision support systems, and autonomous surgical platforms. As this field of research continues to grow, we hope that the Computer Vision in Surgery International Collaborative can help establish best practices to guide future work. We also hope it can help ensure that this technology is developed and implemented in a scientifically sound, responsible, and ethical manner for the benefit of patients and the global surgical community.
