MonoPCC: Photometric-invariant cycle constraint for monocular depth estimation of endoscopic images | Literature Express – Latest Deep-Learning Medical AI Papers

Published: 2025-05-30

Title


MonoPCC: Photometric-invariant cycle constraint for monocular depth estimation of endoscopic images


01

Literature Express Introduction

Monocular endoscopy is a key medical imaging tool for gastrointestinal diagnosis and surgery, but it typically offers only a narrow field of view (FOV). 3D scene reconstruction helps enlarge the FOV and, via registration with preoperative computed tomography (CT), enables more advanced applications such as surgical navigation. Depth estimation from monocular endoscopic images is a prerequisite for reconstructing 3D structure, but it is highly challenging owing to the lack of ground-truth (GT) depth labels. The typical solution for monocular depth estimation relies on self-supervised learning, whose core idea is a photometric constraint between real and warped images. Concretely, two convolutional neural networks (CNNs) are built, one called DepthNet and the other PoseNet. They estimate, respectively, the depth map of each image and the camera pose change between every two adjacent images; based on these, a source frame in an endoscopic video can be projected into 3D space and warped to the target view of another frame. DepthNet and PoseNet are jointly optimized to minimize the photometric loss, which is essentially the pixel-wise difference between the warped and target images.

However, the light source is fixed on the endoscope and moves with the camera, causing significant brightness fluctuations between the source and target frames. The problem can be further aggravated by non-Lambertian reflection under close-range observation, as shown in Fig. 1(a)-(b). Consequently, brightness differences dominate the discrepancy between the target image and the warped source image (Fig. 1(b)-(c)), misleading the photometric constraint in self-supervised learning.

Many efforts have been made to enhance the reliability of the photometric constraint under brightness fluctuations. An intuitive solution is to pre-calibrate the brightness of endoscopic video frames, using either a linear intensity transformation (Ozyoruk et al., 2021) or a trained appearance-flow model (Shao et al., 2022). However, the former only addresses global brightness inconsistency, while the latter increases training difficulty by introducing heavy computation. Moreover, the reliability of the appearance-flow model cannot always be guaranteed due to weak self-supervision, which may wrongly modify regions unrelated to brightness changes.

This paper aims to resolve the bottleneck of brightness inconsistency without relying on any auxiliary model. Our inspiration comes from a recent method named TC-Depth (Ruhkamp et al., 2021), which introduces cycle warping to handle occlusion. TC-Depth warps the target image to the source view and then back to itself to identify each occluded pixel, under the assumption that an occluded pixel cannot return exactly to its original position. We find that such cycle warping can naturally overcome brightness inconsistency and yields a more reliable warped image than source-to-target warping alone, as shown in Fig. 1(b)-(d). However, directly applying cycle warping usually fails in the photometric constraint because: (1) the two bilinear interpolations of cycle warping over-blur the image; and (2) the depth and pose networks are actively learned, making the intermediate warping unstable and hard to converge.

Based on the above analysis, we propose MonoPCC, a monocular depth estimation method built on a photometric-invariant cycle constraint. It adopts the idea of cycle warping but improves it significantly, so that the photometric constraint becomes invariant to inconsistent brightness. Specifically, MonoPCC starts from the target image and obtains a cycle-warped target image along a closed-loop path (target-source-target), which inherits the consistent brightness of the original target image. To make such cycle warping effective in the photometric constraint, MonoPCC employs a learning-free structure transplant module (STM) based on the fast Fourier transform (FFT) to minimize the negative impact of blurring. STM restores the structural details lost in the intermediate warped image by "borrowing" the phase-frequency part from the source image. Furthermore, instead of sharing network weights between the target-source and source-target warping paths, MonoPCC connects the two paths with an exponential moving average (EMA) strategy to stabilize the intermediate results of the first path.

In summary, our main contributions are as follows:

1. We propose MonoPCC, which makes the photometric constraint invariant to brightness changes by simply adopting a cycle form of warping, eliminating the misleading effect of inconsistent brightness in self-supervised learning.

2. We introduce two key enabling techniques, i.e., the structure transplant module (STM) and EMA-based stabilized training. STM restores image details lost to interpolation, and EMA stabilizes the forward warping. Together they guarantee effective training of MonoPCC under cycle warping.

3. We conduct comprehensive and extensive experiments on four public endoscopic datasets, i.e., SCARED (Allan et al., 2021), SimCol3D (Rau et al., 2023), SERV-CT (Edwards et al., 2022), and Hamlyn (Mountney et al., 2010; Stoyanov et al., 2010; Pratt et al., 2010), and one public natural dataset, KITTI (Geiger et al., 2012). Comparisons with eight state-of-the-art methods show that MonoPCC reduces the absolute relative error by 7.27%, 9.38%, 9.90%, and 3.17% on the four endoscopic datasets, respectively, demonstrating its superiority and its strong resistance to brightness inconsistency during training. Furthermore, results on KITTI verify the competitiveness of MonoPCC even in natural scenes, where brightness changes are usually insignificant.
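The photometric loss described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the SSIM term uses global image statistics rather than the local windows real pipelines use, and the weight `alpha = 0.85` is an assumed, commonly seen default.

```python
import numpy as np

def photometric_loss(warped, target, alpha=0.85):
    """Per-pixel photometric loss: a weighted mix of an SSIM-like
    structural term and L1, as commonly used in self-supervised
    monocular depth estimation."""
    l1 = np.abs(warped - target).mean()
    # Simplified global-statistics SSIM, shown only to illustrate
    # the structure of the loss.
    mu_w, mu_t = warped.mean(), target.mean()
    var_w, var_t = warped.var(), target.var()
    cov = ((warped - mu_w) * (target - mu_t)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_w * mu_t + c1) * (2 * cov + c2)) / (
        (mu_w ** 2 + mu_t ** 2 + c1) * (var_w + var_t + c2))
    return alpha * (1 - ssim) / 2 + (1 - alpha) * l1
```

A perfectly warped image gives a near-zero loss, while a brightness offset alone inflates the loss even though the geometry is correct, which is exactly the failure mode this paper targets.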

Abstract


Photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto a target view using estimated depth & pose, and then minimizing the difference between the warped and target images. However, the endoscopic built-in light causes significant brightness fluctuations, and thus makes the photometric constraint unreliable. Previous efforts only mitigate this relying on extra models to calibrate image brightness. In this paper, we propose MonoPCC to address the brightness inconsistency radically by reshaping the photometric constraint into a cycle form. Instead of only warping the source image, MonoPCC constructs a closed loop consisting of two opposite forward–backward warping paths: from target to source and then back to target. Thus, the target image finally receives an image cycle-warped from itself, which naturally makes the constraint invariant to brightness changes. Moreover, MonoPCC transplants the source image’s phase-frequency into the intermediate warped image to avoid structure lost, and also stabilizes the training via an exponential moving average (EMA) strategy to avoid frequent changes in the forward warping. The comprehensive and extensive experimental results on five datasets demonstrate that our proposed MonoPCC shows a great robustness to the brightness inconsistency, and exceeds other state-of-the-arts by reducing the absolute relative error by 7.27%, 9.38%, 9.90% and 3.17% on four endoscopic datasets, respectively; superior results on the outdoor dataset verify the competitiveness of MonoPCC for the natural scenario.

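Why the cycle constraint is brightness-invariant can be shown with a toy 1-D example. Here nearest-neighbour warps with a hand-set shift of 3 pixels stand in for the learned bilinear warping driven by depth and pose; the source is both shifted and 1.5× brighter than the target, mimicking the moving light source.

```python
import numpy as np

def warp(img, flow):
    """Toy nearest-neighbour inverse warp of a 1-D signal:
    output[i] samples img at index i + flow[i]."""
    idx = np.clip(np.arange(img.size) + flow, 0, img.size - 1)
    return img[idx]

rng = np.random.default_rng(0)
target = rng.random(64)
# Content at target index i appears at source index i + 3, and the
# moving light makes the source 1.5x brighter (the brightness gap).
source = np.roll(target, 3) * 1.5

plain = warp(source, np.full(64, 3))          # source -> target view
cycle = warp(warp(target, np.full(64, -3)),   # target -> source view
             np.full(64, 3))                  # -> back to target view

err_plain = np.abs(plain - target).mean()  # dominated by brightness gap
err_cycle = np.abs(cycle - target).mean()  # ~0 except clipped borders
```

The plain source-to-target warp is geometrically correct but still disagrees with the target because of the brightness gap, while the target-source-target cycle returns the target's own intensities and cancels the gap entirely.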

Method


Fig. 2 illustrates the pipeline of MonoPCC, which consists of both forward and backward warping paths in the training phase. We first explain how to warp images for self-supervised learning in Section 3.1, then detail the photometric-invariant principle of MonoPCC in Section 3.2, and finally present its two key enabling techniques in Sections 3.3 and 3.4, i.e., the structure transplant module (STM) for avoiding detail loss and the EMA between the two paths for stabilizing the training.

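The STM named above can be sketched in a few lines of NumPy. Following the paper's description (and Fig. 3), it keeps the amplitude spectrum of the intermediate warped image but transplants the phase spectrum of the source image; this is an illustrative reconstruction, not the authors' code.

```python
import numpy as np

def structure_transplant(warped, source):
    """Learning-free STM sketch: combine the warped image's amplitude
    spectrum with the source image's phase spectrum via the FFT,
    restoring structural detail lost to interpolation blur."""
    amplitude = np.abs(np.fft.fft2(warped))
    phase = np.angle(np.fft.fft2(source))
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))
```

When `warped == source` the operation is an identity; in general, the output keeps the warped image's energy distribution while adopting the source's structural layout, which is why it is learning-free and adds no trainable parameters.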

Conclusion


Self-supervised monocular depth estimation is challenging for endoscopic scenes due to the severe negative impact of brightness fluctuations on the photometric constraint. In this paper, we propose a cycle-form warping to naturally overcome the brightness inconsistency of endoscopic images, and develop MonoPCC for robust monocular depth estimation using a re-designed photometric-invariant cycle constraint. To make the cycle-form warping effective in the photometric constraint, MonoPCC is equipped with two enabling techniques, i.e., the structure transplant module (STM) and the exponential moving average (EMA) strategy. STM alleviates image detail degradation to validate the backward warping, which uses the result of forward warping as input. EMA bridges the learning of network weights in the forward and backward warping, and stabilizes the intermediate warped image to ensure effective convergence. Comprehensive and extensive comparisons with 8 state-of-the-art methods on five public datasets, i.e., SCARED, SimCol3D, SERV-CT, Hamlyn, and KITTI, demonstrate that MonoPCC achieves superior performance, decreasing the absolute relative error by 7.27%, 9.38%, 9.90%, and 3.17% on the four endoscopic datasets, respectively, and remains competitive even in the natural scenario. Additionally, two ablation studies confirm the effectiveness of the three developed modules and the advantage of MonoPCC over other similar techniques against brightness fluctuations.

Limitations. The current pipeline relies on a single frame to infer the depth map. Since each prediction is made independently, the model lacks perception of temporal consistency, meaning that the depth values at the same location may vary over time. This temporal inconsistency can lead to artifacts, such as overlapping tissue surfaces, as shown in the 3D reconstruction visualization in Fig. 12. Furthermore, our method is primarily designed for static endoscopic scenes. In dynamic scenarios involving tissue deformation, MonoPCC may not perform effectively, because the depth values at corresponding positions between source–target paired images can change locally, making it difficult to establish the cycle warping path consistently.

Potential Future Application. In this paper, we have demonstrated the effectiveness of the PCC strategy for self-supervised monocular depth estimation in endoscopic images. We believe that our framework can be seamlessly integrated into other related tasks, such as stereo matching (Shi et al., 2023) and metric depth estimation (Wei et al., 2022, 2024), both of which face challenges due to brightness fluctuations. Additionally, in the field of NeRF-based scene reconstruction, several depth-prior-assisted methods (Wang et al., 2022; Li et al., 2024; Huang et al., 2024) utilize estimated depth to guide model training. Therefore, our depth estimator, designed specifically for endoscopic scenes, could also enhance the performance of such downstream tasks.
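The EMA strategy mentioned above is the standard teacher-student weight update; a minimal sketch follows, where the decay value 0.99 is an illustrative assumption, not a value taken from the paper.

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Update the forward-warping (teacher) weights as an exponential
    moving average of the actively trained backward-warping (student)
    weights, so the intermediate warped image changes only slowly."""
    return {name: decay * w + (1.0 - decay) * student_weights[name]
            for name, w in teacher_weights.items()}

teacher = {"conv1": 1.0}
student = {"conv1": 0.0}
teacher = ema_update(teacher, student)  # conv1 drifts toward the student
```

Because the teacher never receives gradients directly, the forward warping stays stable while still tracking the improving student, which is what allows the cycle path to converge.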


Figure


Fig. 1. (a)–(b) are the source I_s and target I_t frames. (c) is the image warped from the source to the target view. (d) is the cycle-warped image along the target–source–target path for a reliable photometric constraint. Box contour colors distinguish different brightness patterns.


Fig. 2. The training pipeline of MonoPCC, which consists of forward and backward cascaded warping paths bridged by two enabling techniques, i.e., the structure transplant module (STM) and exponential moving average (EMA). The training has two phases, i.e., warm-up to initialize the network weights for reasonable forward warping, and follow-up to resist the brightness changes. Different box contour colors code different brightness patterns. © means concatenation.


Fig. 3. Details of STM, which utilizes the phase-frequency of the source image I_s to replace that of the warped image I_{s→t}, avoiding image detail loss.


Fig. 4. The auxiliary perception constraint by backward warping the encoding feature  maps instead of raw images.


Fig. 5. The Abs Rel error maps of comparison methods on SCARED and SimCol3D, with close-up details highlighted. The regions of interest (ROIs) are outlined with red dashed lines, and the OpenCV Jet colormap is used for visualization.


Fig. 6. The Abs Rel error maps of comparison methods on SERV-CT and Hamlyn, with close-up details highlighted. The regions of interest (ROIs) are outlined with red dashed lines, and the OpenCV Jet colormap is used for visualization.


Fig. 7. The Abs Rel error maps of seven ablation variants, covering the effectiveness of three components, with close-up details highlighted. (a)–(g) correspond to the 1st to the 7th rows in Table 4. The regions of interest (ROIs) are outlined with red dashed lines, and the OpenCV Jet colormap is used for visualization.


Fig. 8. The Abs Rel error maps of MonoPCC and other similar modules against photometric inconsistency, with close-up details highlighted. (a)–(d) correspond to the 1st to the 4th rows in Table 5. The regions of interest (ROIs) are outlined with red dashed lines, and the OpenCV Jet colormap is used for visualization.


Fig. 9. An example of the created brightness perturbation. From left to right: the original image, the globally perturbed image (γ = 1.2), and the image perturbed both globally and locally (bright spots). The color-coded maps above them show the subtractive difference between each perturbed image and the original.

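The perturbation shown in Fig. 9 can be reproduced in spirit with a global gamma curve plus an additive Gaussian bright spot. Apart from γ = 1.2, which comes from the caption, all parameter values (spot size, gain) are illustrative assumptions.

```python
import numpy as np

def perturb_brightness(img, gamma=1.2, spot_center=None,
                       spot_sigma=8.0, spot_gain=0.3):
    """Global gamma perturbation, optionally followed by a local
    bright spot (an additive Gaussian highlight). Expects img in [0, 1]."""
    out = np.clip(img, 0.0, 1.0) ** gamma        # global perturbation
    if spot_center is not None:
        h, w = out.shape[:2]
        yy, xx = np.mgrid[0:h, 0:w]
        cy, cx = spot_center
        spot = spot_gain * np.exp(
            -((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * spot_sigma ** 2))
        if out.ndim == 3:                        # broadcast over channels
            spot = spot[..., None]
        out = np.clip(out + spot, 0.0, 1.0)      # local perturbation
    return out
```

With γ > 1 the gamma curve lowers mid-tone values, while the Gaussian spot raises intensities around its center, giving exactly the two kinds of brightness change the robustness experiment probes.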

Fig. 10. The Abs Rel errors of different methods trained on the two brightness-perturbed copies of SCARED and the original SCARED.


Fig. 11. The qualitative pose estimation comparison based on two SCARED trajectories.


Fig. 12. Qualitative comparison results on the 3D scene reconstruction based on the  estimated depth maps of two methods. The three sequences are selected from SCARED.


Table


Table 1 Evaluation metrics of monocular depth estimation, where N refers to the number of valid pixels in depth maps, and d_i and d*_i denote the estimated and GT depth of the i-th pixel, respectively. The Iverson bracket [⋅] yields 1 if the statement is true, otherwise 0.

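The metrics summarized in Table 1 are the community-standard ones for monocular depth evaluation; the sketch below shows how they are typically computed (the exact metric set and masking used in the paper may differ slightly).

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth metrics over the N valid pixels (gt > 0):
    Abs Rel, Sq Rel, RMSE, RMSE log, and the delta < 1.25^k accuracies.
    The Iverson bracket of Table 1 becomes a boolean mean here."""
    mask = gt > 0
    d, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(d - g) / g)
    sq_rel = np.mean((d - g) ** 2 / g)
    rmse = np.sqrt(np.mean((d - g) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(d) - np.log(g)) ** 2))
    ratio = np.maximum(d / g, g / d)
    acc = {f"delta_{k}": np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)}
    return {"abs_rel": abs_rel, "sq_rel": sq_rel,
            "rmse": rmse, "rmse_log": rmse_log, **acc}
```

A perfect prediction on the valid pixels yields zero error metrics and delta accuracies of 1.0, which is a convenient sanity check when wiring up an evaluation script.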

Table 2 Quantitative comparison results on SCARED and SimCol3D. The best results are marked in bold and the second-best underlined. The paired p-values between MonoPCC and others  are all less than 0.05.


Table 3 Quantitative comparison results on SERV-CT and Hamlyn. The best results are marked in bold and the second-best underlined. The paired p-values between MonoPCC and the others are all less than 0.05, except for one metric on SERV-CT compared to MonoViT.


Table 4 The rows except the first one are the comparison results of the five variants and the complete MonoPCC, which are all cycle-constrained. The first row is the backbone MonoViT using the regular non-cycle constraint.


Table 5 Comparison results of different techniques for addressing the brightness fluctuations in self-supervised learning. The last row is the technique used in MonoPCC.


Table 6 Quantitative comparison results on KITTI. The best results are marked in bold and the second-best underlined.


Table 7 Quantitative comparison results (Absolute Trajectory Error) of pose estimation on two trajectories of SCARED. The best results are marked in bold and the second-best underlined.


