Title
Subtyping breast lesions via collective intelligence based long-tailed recognition in ultrasound
01
Literature Digest: Introduction
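Breast cancer is reported to be the most commonly diagnosed cancer worldwide, accounting for 11.7% of all new cancer cases and seriously affecting the health of countless women (Sung et al., 2021). Early diagnosis and effective treatment are crucial for improving the prognosis of breast tumors (Barrios, 2022). Ultrasound (US) is widely used to evaluate breast abnormalities because it is cost-effective, provides real-time imaging, and involves no ionizing radiation; it is particularly suitable for younger patients with dense breasts (Cheng et al., 2010; Dunne et al., 2017). In a standard examination, detected lesions are usually characterized by their features to predict the likelihood of benignity or malignancy. However, lesions fall into many subtypes according to their histological characteristics, such as fibroadenoma (FAD), fibrocystic breast disease (FCBD), intraductal papilloma (IP), sclerosing adenosis (SA), mastitis (MT), benign phyllodes tumor (BPT), invasive carcinoma (IC), and ductal carcinoma in situ (DCIS) (see Fig. 1). These subtypes differ widely in morphology, ultrasound appearance, and biological behavior, which leads to different response patterns to treatment and different clinical outcomes (Makki, 2015). For example, some drugs (e.g., raloxifene) have been approved for treating invasive carcinoma but are not effective for ductal carcinoma in situ (Sun et al., 2017; Fisher et al., 1993). Clinical guidelines also recommend considering an additional sentinel lymph node biopsy for patients with invasive carcinoma (Network et al., 2015). Some studies further report that certain benign lesions, such as sclerosing adenosis and intraductal papilloma, carry a higher risk of subsequent breast cancer than other subtypes (Guray and Sahin, 2006). Identifying these lesion subtypes earlier and more accurately can therefore support more appropriate and timely interventions and individualized treatment strategies (Yersal and Barutca, 2014).
However, ultrasound images can suffer from low contrast, speckle noise, and interference from surrounding tissue. Identifying and interpreting the ultrasound features of different lesions is also difficult and requires expertise. Less experienced operators may struggle to recognize malignant tumors correctly, which leads to substantial inter-observer variability (Baker et al., 1999). To improve diagnostic accuracy, many computer-aided diagnosis (CAD) tools have been proposed for breast ultrasound (Becker et al., 2018; Hijab et al., 2019; Han et al., 2017; Al-Dhabyani et al., 2019; Moon et al., 2020). Despite their contributions, most existing methods address only the binary classification of lesions as benign or malignant. A model that infers the histological subtype of a breast lesion directly from ultrasound images could assist clinicians better, while sparing patients the cost and pain of invasive biopsy.
One of the main obstacles to this goal is the long-tailed problem (see the bar chart in Fig. 1). Most computer vision models perform well in the ideal setting of large-scale datasets (e.g., ImageNet, MS-COCO) with balanced class distributions (e.g., CIFAR-10, MNIST). Medical image analysis, in contrast, often faces class imbalance or scarce minority-class samples, because the natural incidence of different diseases varies greatly. Such a skewed distribution biases the model toward the dominant classes at the expense of the tail classes, and it severely harms convergence and generalization (Wang et al., 2020; Tan et al., 2020; Cao et al., 2019). Previous work compensates for this through re-sampling (Liu et al., 2008; Hui et al., 2005; Zhang and Pfister, 2021; Chawla et al., 2002) or re-weighting (Cao et al., 2019; Cui et al., 2019; Ren et al., 2020; Wang et al., 2021a). Many of these methods, however, improve tail-class accuracy at the cost of head-class accuracy, because the head classes end up under-represented during training. In computer-aided diagnosis this trade-off is risky, since it can lead to misdiagnosis. A more robust and balanced model is therefore needed for long-tailed breast lesion classification.
In clinical practice, several medical professionals often pool their individual expertise to reduce diagnostic errors and bias, especially for complex or rare cases. This joint effort, known as collective intelligence, has been shown to improve diagnostic accuracy (Barnett et al., 2019). It recognizes that no single individual has all the information or expertise needed for a perfect diagnosis in every situation. Deep learning models similarly differ in their abilities: some tend to focus on the dominant patterns in a dataset and excel at recognizing head classes, while others respond more strongly to tail classes.
Based on these observations, we propose the Competitive online Distillation for multi-Expert (CoDE) framework for long-tailed breast lesion classification in ultrasound. It aims to balance sensitivity to minority samples against robust recognition of majority patterns. The main contributions are:
• The first fully automated method that infers eight histological subtypes (e.g., FAD, IC, MT; see Fig. 1) directly from ultrasound images.
• An effective framework for long-tailed classification that uses the collective strengths of diverse models to counter data imbalance and model bias.
• A Dual-Level Balanced Individual Supervision (DBIS) module that provides probability-adjusted subtype-level guidance together with cancerous-level supervision guided by clinical priors, to further refine the latent representation.
• A novel Batch-based Online Competitive Distillation (BOCD) module that fuses the learning of all experts through cooperation and competition, so that experts exchange diverse insights and expertise while remaining invariant to biased samples.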
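To make the re-weighting idea mentioned above concrete, here is a minimal sketch of class-balanced weights based on the effective number of samples, in the spirit of Cui et al. (2019). It illustrates one of the cited baselines rather than the CoDE method itself. The per-subtype counts and the value of `beta` are hypothetical and are not taken from the paper's dataset.

```python
def class_balanced_weights(counts, beta=0.999):
    """Weight each class by the inverse of its effective number of samples."""
    # Effective number of samples for a class with n images:
    #   E_n = (1 - beta**n) / (1 - beta)
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    # Rescale so that the weights sum to the number of classes.
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical per-subtype image counts, ordered from head to tail.
# These numbers are an assumption for illustration only.
counts = [958, 500, 300, 150, 80, 40, 30, 20]
weights = class_balanced_weights(counts)

for n, w in zip(counts, weights):
    print(f"n={n:4d}  weight={w:.3f}")
```

Tail classes receive larger weights, which is exactly the mechanism that can pull attention away from the head classes when pushed too far.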
Abstract
Breast lesions display a wide spectrum of histological subtypes. Compared with a simplistic binary classification of malignancy, recognizing these subtypes is vital for optimizing patient care and enabling tailored treatment strategies. However, this task relies on invasive biopsy tests, which carry inherent risks and can lead to overdiagnosis, unnecessary expenses, and pain for patients. To avoid this, we propose to infer lesion subtypes directly from ultrasound images. Meanwhile, the incidence rates of different subtypes follow a skewed long-tailed distribution, which presents substantial challenges for effective recognition. Inspired by the collective intelligence used in clinical diagnosis to handle complex or rare cases, we propose a framework, CoDE, that combines the diverse expertise of different backbones to improve robustness across varying scenarios for automated lesion subtyping. It uses dual-level balanced individual supervision to fully exploit prior knowledge while accounting for class imbalance. It is also equipped with a batch-based online competitive distillation module to stimulate dynamic knowledge exchange. Experimental results demonstrate that the model surpasses state-of-the-art approaches by more than 7.22% in F1-score on a challenging breast dataset with an imbalance ratio as high as 47.9:1.
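As a reading aid for the 47.9:1 figure, the sketch below computes an imbalance ratio under the common convention of dividing the largest class size by the smallest. The abstract does not define the ratio, so this convention is an assumption. The two counts are hypothetical and were chosen only so that the arithmetic gives 47.9.

```python
def imbalance_ratio(counts):
    """Ratio of the largest class size to the smallest class size."""
    if not counts or min(counts) <= 0:
        raise ValueError("counts must be non-empty and strictly positive")
    return max(counts) / min(counts)


# Hypothetical head and tail class sizes, NOT the paper's actual numbers.
example_counts = [479, 120, 10]
ratio = imbalance_ratio(example_counts)
print(f"imbalance ratio = {ratio:.1f}:1")
```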
Method
Addressing the long-tailed problem is imperative in multiclass breast lesion classification, because the sample size per class is heavily skewed. The main challenge lies in calibrating the emphasis of learning toward minority classes without neglecting the head classes. To tackle this, we propose the CoDE framework, which imitates collective intelligence in clinical diagnosis (see Fig. 2). In this section, we first introduce the overall framework and explain how we curate the expert cohort. We then present Dual-Level Balanced Individual Supervision (DBIS), which strengthens each expert's recognition capacity under a long-tailed distribution. Finally, we propose the Batch-based Online Competitive Distillation (BOCD) module, which fosters both competition and cooperation for effective knowledge exchange.
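The two supervision levels in DBIS can be illustrated with a rough sketch. The digest gives no formulas, so everything below is an assumption: the subtype-level term is written as a balanced-softmax style loss (logits shifted by the log class prior), and the cancerous-level term sums subtype probabilities into a benign/malignant probability. The function names, the weight `lam`, and the counts are hypothetical. Only the subtype list and the IC/DCIS malignancy grouping come from the text and Fig. 1.

```python
import math

SUBTYPES = ["FAD", "FCBD", "IP", "SA", "MT", "BPT", "IC", "DCIS"]
MALIGNANT = {"IC", "DCIS"}  # per the Fig. 1 caption


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def subtype_level_loss(logits, target, counts):
    """Cross-entropy on logits shifted by the log class prior (assumed form)."""
    n = sum(counts)
    adjusted = [z + math.log(c / n) for z, c in zip(logits, counts)]
    return -math.log(softmax(adjusted)[target])


def cancerous_level_loss(logits, target):
    """Binary loss on the summed probability of the malignant subtypes."""
    probs = softmax(logits)
    p_mal = sum(p for p, name in zip(probs, SUBTYPES) if name in MALIGNANT)
    p_true = p_mal if SUBTYPES[target] in MALIGNANT else 1.0 - p_mal
    return -math.log(max(p_true, 1e-12))


def dbis_loss(logits, target, counts, lam=1.0):
    """Sum of both levels; `lam` is a hypothetical trade-off weight."""
    return subtype_level_loss(logits, target, counts) + lam * cancerous_level_loss(logits, target)


# Hypothetical class counts aligned with SUBTYPES (illustration only).
counts = [958, 500, 300, 150, 80, 40, 30, 20]
logits = [0.0] * len(SUBTYPES)
print(dbis_loss(logits, SUBTYPES.index("DCIS"), counts))
```

With uninformative logits, the prior-shifted loss is larger for a tail target than for a head target, so tail samples produce stronger gradients without any explicit re-sampling.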
Conclusion
In this paper, we proposed a novel framework that infers histological subtypes of breast lesions directly from ultrasound images. Its core design was inspired by collective intelligence in clinical diagnosis, which plays to the strengths of each expert. To combat the challenges of long-tailed classification, the framework allows each expert to learn both independently, via dual-level supervision, and collaboratively, via competitive distillation. Experimental results showed that our method achieves a more balanced performance than other SOTA long-tailed algorithms and avoids disproportionately favoring either the head classes or the tail classes. Because the design of the framework is general and flexible, it could be applied to other long-tailed applications with different datasets and computation resources. Future research could explore adapting the methodology to other modalities and investigating noise-robust architectural modifications to optimize real-time deployment for practical clinical integration.
Results
Figure
Fig. 1. Ultrasound images of lesions with different histological subtypes. IC and DCIS belong to the malignant class, while the others correspond to the benign category. The subfigure on the right shows that the incidence rates of the different subtypes follow a long-tailed distribution.
Fig. 2. The proposed Competitive online Distillation for multi-Expert (CoDE) framework. It comprises three distinct experts (shown in blue, orange, and yellow) that extract specialized features, which are subsequently combined to reach the final decision. All experts receive individual supervision from DBIS while interacting competitively with each other through BOCD.
Fig. 3. Pipeline of the Group-Specific Prompts Tuning module for the ViT encoder (ViT-GPT). The keys (blue) from the group are matched against the pre-trained cls token (yellow) of the ViT to generate the query. The best-matched prompts (dark gray) are selected and injected into the last L-K blocks (light gray blocks with a dark gray section on the right).
Fig. 4. Illustration of the DBIS module. It modifies the original distribution (shown on the left) through subtype-level supervision (𝑠−𝑙) and cancerous-level supervision (𝑐−𝑙) to yield the optimal distribution. Different shapes represent different subtypes: parallelograms denote SA, hexagons denote DCIS, triangles denote IC samples, and crosses denote MT samples.
Fig. 5. Illustrative instances of accurate and erroneous predictions. Each subfigure represents a test example. (a, b, d, e) are correct predictions, while (c, f) are erroneous ones.
Fig. 6. Bar chart of the F1-scores of different models. The gray bars indicate the baseline model. The light blue bars show the scores of the different experts when trained independently. The dark blue bars show the performance of each expert within the proposed collective intelligence framework.
Fig. 7. Plot of how often each expert was selected in BOCD. Different colors represent different experts. The 𝑥-axis denotes the training epoch. The 𝑦-axis denotes the total number of times an expert was selected as the teacher model during a training epoch.
Fig. 8. Impact of 𝛽 on the proposed framework on the test set. The 𝑥-axis denotes the value of 𝛽, while the 𝑦-axis represents the resulting F1-score. Note that the 𝑦-axis is zoomed to a narrow range for clearer visualization.
Fig. 9. Influence of 𝛽 on the validation set under three different train/validation/test splits.
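The teacher-selection counts plotted in Fig. 7 suggest a per-batch competition among experts. The sketch below shows one way such a step could work, under assumptions that are not confirmed by the digest: the expert with the lowest cross-entropy on a mini-batch becomes the teacher, and the remaining experts receive a KL-divergence distillation loss toward the teacher's predictions. The expert names and toy probabilities are hypothetical.

```python
import math
from collections import Counter


def cross_entropy(probs, target):
    return -math.log(max(probs[target], 1e-12))


def kl_divergence(p, q):
    """KL(p || q) for two discrete probability vectors."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)


def bocd_step(expert_probs, targets):
    """Pick the best expert on this mini-batch and compute student losses.

    expert_probs maps an expert name to its per-sample probability vectors.
    Returns (teacher_name, {student_name: mean KL to the teacher}).
    """
    batch_loss = {
        name: sum(cross_entropy(p, t) for p, t in zip(preds, targets)) / len(targets)
        for name, preds in expert_probs.items()
    }
    teacher = min(batch_loss, key=batch_loss.get)  # assumed selection rule
    distill = {}
    for name, preds in expert_probs.items():
        if name == teacher:
            continue
        pairs = zip(expert_probs[teacher], preds)
        distill[name] = sum(kl_divergence(tp, sp) for tp, sp in pairs) / len(targets)
    return teacher, distill


# Two toy mini-batches with three hypothetical experts and two classes.
batches = [
    ({"expert_a": [[0.9, 0.1], [0.2, 0.8]],
      "expert_b": [[0.6, 0.4], [0.5, 0.5]],
      "expert_c": [[0.5, 0.5], [0.4, 0.6]]}, [0, 1]),
    ({"expert_a": [[0.5, 0.5], [0.5, 0.5]],
      "expert_b": [[0.1, 0.9], [0.7, 0.3]],
      "expert_c": [[0.3, 0.7], [0.6, 0.4]]}, [1, 0]),
]

teachers = []
for probs, targets in batches:
    teacher, distill = bocd_step(probs, targets)
    teachers.append(teacher)

# Per-expert teacher counts, analogous to one point per expert in Fig. 7.
selection_counts = Counter(teachers)
print(selection_counts)
```

Selecting the teacher per mini-batch rather than per epoch lets the teacher role move between experts as the class mix of each batch changes.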
Table
Table 1. Class distribution of the histological subtypes of the breast lesions in our dataset. B/M denotes whether each subtype belongs to the benign (B) or malignant (M) category.
Table 2. Results of the comparison experiment. The sections display the results of baseline models, SOTA approaches using re-balancing strategies, and ensemble-based SOTA approaches, respectively. Results are evaluated using F1-score, Recall, Precision, and shot accuracy. Red and blue fonts highlight the top-2 results for each metric.
Table 3. Results of different CoDE variants. Each section denotes CoDE with a different number of experts, while each row uses different backbone architectures. Results are evaluated using F1-score, Recall, Precision, and Accuracy over different groups of classes.
Table 4. Results of the ablation study. 𝐿𝑐−𝑙 and BOCD are added to validate their efficacy. 𝐵𝐶-𝐻𝑒𝑎𝑑 denotes an extra binary classification head added to perform cancerous-level classification. Both the accuracy of the main long-tailed classification task (columns 2–8) and that of the affiliated malignancy classification task (columns 9–10, labeled BM) are reported.
Table 5. Impact of the mini-batch size 𝑀 in the BOCD module. Different rows correspond to different values of 𝑀.
Table 6. Model performance given a center-stratified train/test split.
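The tables report F1-score, Recall, and Precision. As a rough illustration of why these metrics matter more than plain accuracy under class imbalance, the sketch below computes per-class and macro-averaged values. The digest does not state which averaging the paper uses, so macro averaging is an assumption here, and the toy labels are hypothetical.

```python
def macro_prf(y_true, y_pred, labels):
    """Per-class (precision, recall, F1) and their unweighted (macro) means."""
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    n = len(labels)
    macro = tuple(sum(v[i] for v in per_class.values()) / n for i in range(3))
    return per_class, macro


# Toy labels (hypothetical): a head class FAD, a mid class IC, a tail class MT.
y_true = ["FAD", "FAD", "FAD", "FAD", "IC", "IC", "MT"]
y_pred = ["FAD", "FAD", "FAD", "IC", "IC", "FAD", "FAD"]

per_class, macro = macro_prf(y_true, y_pred, ["FAD", "IC", "MT"])
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(per_class)
print("macro P/R/F1:", macro, "accuracy:", accuracy)
```

In this toy case the single missed tail sample barely moves accuracy but pulls the macro F1 down sharply, because every class contributes equally to the average.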