An Overview of 3D Segmentation Papers

Published: 2025-08-20

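Meaning of 3D Segmentation Metrics

Reference links:
3D segmentation metrics, brief and intuitive version
3D segmentation metrics, detailed version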

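All three metrics are computed from the confusion matrix:
mIoU: the per-class IoU is the number of correctly predicted samples of a class divided by (number of samples predicted as that class + number of samples that truly belong to that class - number of correctly predicted samples of that class). The subtraction is needed because the correctly predicted samples are counted in both of the first two terms. mIoU is the mean of the IoU over all classes.
mAcc: the per-class accuracy is the number of correctly predicted samples of a class divided by the number of samples that truly belong to that class. mAcc is the mean of the per-class accuracy over all classes.
OA: overall accuracy, the number of correctly predicted samples divided by the total number of samples.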
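The three definitions can be checked on a tiny confusion matrix. The sketch below is only an illustration: the function name `segmentation_metrics`, the row/column convention (rows are true classes, columns are predicted classes), and the choice to score an absent class as 0 are all assumptions, not something the referenced articles prescribe.

```python
def segmentation_metrics(conf):
    """Compute mIoU, mAcc and OA from a square confusion matrix.

    Assumed convention: conf[i][j] = number of samples whose true class
    is i and whose predicted class is j. The matrix must not be empty.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    ious, accs = [], []
    for c in range(n):
        tp = conf[c][c]                                # correctly predicted samples of class c
        actual = sum(conf[c])                          # samples that truly belong to class c
        predicted = sum(conf[r][c] for r in range(n))  # samples predicted as class c
        union = predicted + actual - tp                # tp was counted twice, remove it once
        # Assumption: a class with no samples and no predictions scores 0.
        ious.append(tp / union if union else 0.0)
        accs.append(tp / actual if actual else 0.0)
    correct = sum(conf[c][c] for c in range(n))
    return {
        "iou": ious,
        "acc": accs,
        "miou": sum(ious) / n,
        "macc": sum(accs) / n,
        "oa": correct / total,
    }


# Two classes, 10 true samples each.
conf = [[8, 2],
        [1, 9]]
metrics = segmentation_metrics(conf)
print(metrics)
```

For class 0 in this example the IoU is 8 / (9 + 10 - 8) = 8/11, while its accuracy is 8/10, which shows why mIoU is usually lower than mAcc.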

The research and improvement history of point-based 3D segmentation networks (following the related-work section of the PointVector paper)

In contrast to the voxelization and multiview methods, point-based methods deal directly with point clouds.

Two foundational works on network architecture

PointNet first proposes using MLP to process point clouds directly.
PointNet++ subsequently introduces a hierarchical structure to improve the feature extraction.

Later improvements focus mainly on optimizing local feature extraction

Subsequent works focused on the design of fine-grained local feature extractors.
Graph-based methods rely on a graph neural network and introduce point features and edge features to model local relationships. Conv-based methods propose several dynamic convolution kernels to adaptively aggregate neighborhood features. Many transformer-like networks extract local features with self-attention.
Recently, MLP-like networks have obtained good results with simple architectures by enhancing the features. PointMLP proposes a geometric affine module to normalize the features. RepSurf fits surface information with triangular planes and models umbrella surfaces to provide geometric information. PointNeXt integrates training strategies and model scaling.

PointNet

Contributions

The key contributions of our work are as follows:
• We design a novel deep net architecture suitable for consuming unordered point sets in 3D;
Other architectures: Volumetric CNNs, Multiview CNNs, Spectral CNNs, Feature-based DNNs

• We show how such a net can be trained to perform 3D shape classification, shape part segmentation and scene semantic parsing tasks;
The supplementary material also reports detection results, but judging from the P-R curves they are not particularly good.

• We provide thorough empirical and theoretical analysis on the stability and efficiency of our method;

• We illustrate the 3D features computed by the selected neurons in the net and develop intuitive explanations for its performance

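Network architecture:

[Figure: PointNet network architecture]

Notes on the network architecture

Designed around the properties of point clouds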

Unordered: a point cloud is a set of points without specific order.
Interaction among points: the model needs to be able to capture local structures from nearby points, and the combinatorial interactions among local structures.
Invariance under transformations

The network has three key modules: the max pooling layer as a symmetric function to aggregate information from all the points, a local and global information combination structure, and two joint alignment networks that align both input points and point features.

Symmetry Function for Unordered Input
  • sort input into a canonical order: in high-dimensional space, no simple ordering is stable under point perturbations, because a mapping from a high-dimensional space to one dimension can hardly preserve spatial proximity. The ordering problem therefore remains, the model cannot learn a consistent mapping from input to output, and its performance is limited.
  • Treat the input as a sequence to train an RNN, but augment the training data by all kinds of permutations: The idea to use RNN considers the point set as a sequential signal and hopes that by training the RNN with randomly permuted sequences, the RNN will become invariant to input order. However, in “OrderMatters” the authors have shown that order does matter and cannot be totally omitted.
  • use a simple symmetric function to aggregate the information from each point:
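    Approach: The idea is to approximate a general function defined on a point set by applying a symmetric function on transformed elements in the set:
    [Figure: equation images. In the PointNet paper this is f({x_1, ..., x_n}) ≈ g(h(x_1), ..., h(x_n)), where h is applied to each point and g is a symmetric function.]
    Building the model: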
    approximate h by a multi-layer perceptron network and g by a composition of a single variable function and a max pooling function. This is found to work well by experiments. Through a collection of h, we can learn a number of f’s to capture different properties of the set.
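A small experiment makes the permutation invariance of this design concrete. The sketch below is a toy under stated assumptions: `h` is a single linear layer with ReLU and random weights standing in for the trained MLP, and `g` is reduced to channel-wise max pooling (the extra single-variable function mentioned above is left out). The names `h` and `global_feature` are illustrative only.

```python
import random


def h(point, weights):
    """Per-point feature: one linear layer followed by ReLU.

    The weights are random placeholders for a trained MLP (assumption).
    """
    return [max(0.0, sum(w * x for w, x in zip(row, point))) for row in weights]


def global_feature(points, weights):
    """Symmetric aggregation: channel-wise max over all per-point features."""
    feats = [h(p, weights) for p in points]
    return [max(channel) for channel in zip(*feats)]


random.seed(0)
weights = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 inputs -> 4 channels
points = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(5)]   # 5 points in 3D

shuffled = points[:]
random.shuffle(shuffled)

# Max pooling ignores the order of its inputs, so both orderings agree exactly.
print(global_feature(points, weights) == global_feature(shuffled, weights))
```

Each output channel is the maximum of one learned per-point response, so only the few points that attain those maxima influence the global feature.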
Local and Global Information Aggregation (shape part segmentation and scene segmentation)

After computing the global point cloud feature vector, we feed it back to per point features by concatenating the global feature with each of the point features. Then we extract new per point features based on the combined point features - this time the per point feature is aware of both the local and global information. (See the segmentation network in the figure above.)
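The concatenation step is simple enough to show directly. This is a minimal sketch with made-up 2-channel features; `concat_global` is a hypothetical helper name, and the global feature is computed here by max pooling purely for illustration.

```python
def concat_global(point_feats, global_feat):
    """Append the same global feature vector to every per-point feature."""
    return [list(f) + list(global_feat) for f in point_feats]


# Three points with 2 local channels each (illustrative values).
point_feats = [[0.1, 0.5],
               [0.9, 0.2],
               [0.4, 0.7]]

# Global feature by channel-wise max pooling over the points.
global_feat = [max(channel) for channel in zip(*point_feats)]

combined = concat_global(point_feats, global_feat)
for row in combined:
    print(row)  # 2 local channels followed by 2 global channels
```

Every row now carries its own local channels plus a copy of the shared global channels, which is what lets the later per-point layers use both kinds of information.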

Joint Alignment Network

The semantic labeling of a point cloud has to be invariant if the point cloud undergoes certain geometric transformations, such as rigid transformation. We therefore expect that the learnt representation by our point set is invariant to these transformations.
A natural solution is to align all input set to a canonical space before feature extraction.
Implementation: see the T-Net in the figure above.

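Theoretical analysis

PointNet can approximate any continuous set function defined on a point set.
The output of PointNet is determined by a subset of key points (the critical point set), and it is robust to input perturbation, noise, and missing points.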

PointNet++

Contributions:

Problems with PointNet

Few prior works study deep learning on point sets. PointNet [20] is a pioneer in this direction. However, by design PointNet does not capture local structures induced by the metric space points live in, limiting its ability to recognize fine-grained patterns and generalizability to complex scenes.

Improvements in this paper

In this work, we introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, our network is able to learn local features with increasing contextual scales. With further observation that point sets are usually sampled with varying densities, which results in greatly decreased performance for networks trained on uniform densities, we propose novel set learning layers to adaptively combine features from multiple scales. Experiments show that our network called PointNet++ is able to learn deep point set features efficiently and robustly. In particular, results significantly better than state-of-the-art have been obtained on challenging benchmarks of 3D point clouds.

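Network architecture:

[Figure: PointNet++ network architecture]

Notes on the network architecture

Each of the parts below is a novel contribution of PointNet++.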

Hierarchical Point Set Feature Learning

Sampling layer: uses iterative farthest point sampling (FPS) to select the centroids of the local regions.
Grouping layer: Ball query finds all points that are within a radius to the query point (an upper limit of K is set in implementation). An alternative range query is K nearest neighbor (kNN) search which finds a fixed number of neighboring points. Compared with kNN, ball query’s local neighborhood guarantees a fixed region scale thus making local region feature more generalizable across space, which is preferred for tasks requiring local pattern recognition (e.g. semantic point labeling).
PointNet layer: uses a PointNet to capture the features of each local group formed above.
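The sampling and grouping layers can be sketched in a few lines of plain Python. This is a brute-force illustration, not the CUDA implementation used in practice; the function names, the fixed start index, and the choice to keep the first K points found inside the ball are assumptions.

```python
import math


def farthest_point_sampling(points, m, start=0):
    """Return the indices of m centroids chosen by iterative FPS (m <= len(points))."""
    chosen = [start]
    # dist[i] = distance from point i to its nearest already chosen centroid
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=dist.__getitem__)  # farthest remaining point
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


def ball_query(points, center, radius, k):
    """Return the indices of at most k points within radius of center."""
    inside = [i for i, p in enumerate(points) if math.dist(p, center) <= radius]
    return inside[:k]  # assumption: keep the first k hits


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0),
          (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

centers = farthest_point_sampling(points, 3)
groups = [ball_query(points, points[c], radius=0.5, k=4) for c in centers]
print(centers)
print(groups)
```

Note how the group sizes differ from centroid to centroid: ball query fixes the spatial extent of a region, whereas kNN would fix the number of neighbors.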

Robust Feature Learning under Non-Uniform Sampling Density

Two density-adaptive layers.
[Figure 3: (a) multi-scale grouping (MSG); (b) multi-resolution grouping (MRG)]

Multi-scale grouping (MSG):
As shown in Fig. 3 (a), a simple but effective way to capture multiscale patterns is to apply grouping layers with different scales followed by according PointNets to extract features of each scale. Features at different scales are concatenated to form a multi-scale feature.

Multi-resolution grouping (MRG):
In Fig. 3 (b), features of a region at some level Li is a concatenation of two vectors. One vector (left in figure) is obtained by summarizing the features at each subregion from the lower level Li−1 using the set abstraction level. The other vector (right) is the feature that is obtained by directly processing all raw points in the local region using a single PointNet.

In terms of computation, multi-resolution grouping (MRG) is more efficient than multi-scale grouping (MSG).

Point Feature Propagation for Set Segmentation

Problem to solve:
In set abstraction layer, the original point set is subsampled. However in set segmentation task such as semantic point labeling, we want to obtain point features for all the original points.

Solution:

We adopt a hierarchical propagation strategy with distance based interpolation and across level skip links (as shown in Fig. 2).
interpolation:
[Figure: interpolation formula]
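In the PointNet++ paper the interpolated feature is an inverse-distance-weighted average over the k nearest known points, with weights w_i = 1 / d(x, x_i)^p and defaults p = 2, k = 3. The sketch below follows that rule; the function name `interpolate_features` and the small `eps` added to avoid division by zero are assumptions of this example.

```python
import math


def interpolate_features(query, known_points, known_feats, k=3, p=2, eps=1e-8):
    """Inverse-distance-weighted interpolation from the k nearest known points."""
    order = sorted(range(len(known_points)),
                   key=lambda i: math.dist(query, known_points[i]))
    nearest = order[:k]
    # eps is an assumption of this sketch; it keeps the weight finite at distance 0.
    weights = [1.0 / (math.dist(query, known_points[i]) ** p + eps) for i in nearest]
    total = sum(weights)
    dim = len(known_feats[0])
    return [sum(w * known_feats[i][c] for w, i in zip(weights, nearest)) / total
            for c in range(dim)]


# Three subsampled points carrying 1-channel features.
known_points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
known_feats = [[1.0], [3.0], [5.0]]

# A point midway between the first two is pulled equally toward both.
print(interpolate_features((1.0, 0.0, 0.0), known_points, known_feats))
```

A query that coincides with a known point receives almost exactly that point's feature, because its weight dominates the sum.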

across level skip links:
The interpolated features on Nl−1 points are then concatenated with skip linked point features from the set abstraction level. Then the concatenated features are passed through a “unit pointnet”, which is similar to one-by-one convolution in CNNs. A few shared fully connected and ReLU layers are applied to update each point’s feature vector.

PointNeXt

Contributions

Although the accuracy of PointNet++ has been largely surpassed by recent networks such as PointMLP and Point Transformer, we find that a large portion of the performance gain is due to improved training strategies, i.e. data augmentation and optimization techniques, and increased model sizes rather than architectural innovations. Thus, the full potential of PointNet++ has yet to be explored. In this work, we revisit the classical PointNet++ through a systematic study of model training and scaling strategies, and offer two major contributions.
First, we propose a set of improved training strategies that significantly improve PointNet++ performance.
Second, we introduce an inverted residual bottleneck design and separable MLPs into PointNet++ to enable efficient and effective model scaling and propose PointNeXt, the next version of PointNets.

Network architecture

[Figure: PointNeXt network architecture]

Notes on the network architecture

Training Modernization

[Figure: training modernization]

Data Augmentation

The original PointNet++ used simple combinations of data augmentations from random rotation, scaling, translation, and jittering for various benchmarks.

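This part feels like experimental science: it takes controlled-variable trials to find what works, and some of the four conclusions given in the paper are rather counterintuitive.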

Optimization Techniques

Optimization techniques including loss functions, optimizers, learning rate schedulers, and hyperparameters are also vital to the performance of a neural network.
PointNet++ uses the same optimization techniques throughout its experiments: CrossEntropy loss, Adam optimizer [16], exponential learning rate decay (Step Decay), and the same hyperparameters. Owing to the development of machine learning theory, modern neural networks can be trained with theoretically better optimizers (e.g. AdamW [27] vs. Adam [16]) and more advanced loss functions (CrossEntropy with label smoothing [39]).
In general, CrossEntropy with label smoothing, AdamW, and Cosine Decay can decently optimize models in various tasks.
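Two of these ingredients are easy to write down by hand. The sketch below implements cross entropy with label smoothing and a cosine learning-rate schedule in plain Python. The smoothing value 0.1, the base learning rate, and the convention of spreading the smoothing mass uniformly over all classes are assumptions of this example, not settings taken from the paper; AdamW is not shown.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross entropy against a smoothed target distribution.

    Assumed convention: the true class gets 1 - smoothing, and the smoothing
    mass is then spread uniformly over all classes (true class included).
    """
    n = len(logits)
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    loss = 0.0
    for c, v in enumerate(logits):
        q = smoothing / n + ((1.0 - smoothing) if c == target else 0.0)
        loss -= q * (v - log_z)
    return loss


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine decay from base_lr at step 0 down to min_lr at total_steps."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos


logits = [5.0, 0.0, 0.0]
print(smoothed_cross_entropy(logits, target=0, smoothing=0.0))  # plain cross entropy
print(smoothed_cross_entropy(logits, target=0, smoothing=0.1))  # penalizes overconfidence
print([round(cosine_lr(s, 100, 0.01), 5) for s in (0, 25, 50, 75, 100)])
```

With smoothing enabled, a very confident correct prediction is penalized slightly, which discourages the logits from growing without bound.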

Architecture Modernization
Receptive Field Scaling

We study a different initial value in each benchmark and discover that the radius is dataset-specific and can have significant influence on performance.
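Relative position normalization: the relative coordinates of each neighbor are divided by the query radius, so their scale no longer depends on the chosen radius.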
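Relative position normalization amounts to one division per coordinate. The helper below is a hypothetical illustration of that step for a single centroid; the name `normalized_relative_positions` and the sample values are assumptions.

```python
def normalized_relative_positions(center, neighbors, radius):
    """Neighbor coordinates relative to the center, divided by the query radius."""
    return [tuple((n - c) / radius for n, c in zip(nb, center)) for nb in neighbors]


center = (1.0, 1.0, 1.0)
neighbors = [(1.1, 1.0, 1.0), (1.0, 0.8, 1.0)]

# With radius 0.2, every neighbor inside the ball maps into [-1, 1] per axis.
rel = normalized_relative_positions(center, neighbors, radius=0.2)
print(rel)
```

Because the result always lies in roughly the same range, changing the radius from one stage or dataset to another does not change the magnitude of the coordinates the MLP sees.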

Model Scaling

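Adding more SA modules or using more channels does not noticeably improve accuracy, mainly because of vanishing gradients and overfitting, while throughput drops significantly. So how should the model be scaled?
The paper proposes the Inverted Residual MLP (InvResMLP): InvResMLP blocks are appended after the SA module to obtain a deeper network.
When no InvResMLP block is appended, an extra MLP layer is added to the SA module, together with a residual connection inside the SA module.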

PointVector

Contributions

Problem with PointNeXt: These methods have recently focused on concise MLP structures, such as PointNeXt, which have demonstrated competitiveness with Convolutional and Transformer structures. However, standard MLPs are limited in their ability to extract local features effectively.
Improvements in this paper: To address this limitation, we propose a Vector-oriented Point Set Abstraction that can aggregate neighboring features through higher-dimensional vectors. To facilitate network optimization, we construct a transformation from scalar to vector using independent angles based on 3D vector rotations. Finally, we develop a PointVector model that follows the structure of PointNeXt.

The key question is how to improve the network's ability to extract local features.

Many point-based networks introduced novel and sophisticated modules to extract local features, e.g., attention-based methods [53] explore attention mechanisms as Fig.1a with lower consumption, convolution-based methods [36] explore the dynamic convolution kernel as Fig.1c, and graph-based methods [39][54] use graph to model relationships of points. The application of these methods to the feature extraction module of PointNet++ brings an improvement in feature quality. However, they are somewhat complicated to design in terms of network structure.

The authors still want to further explore the potential of MLPs for improving local feature extraction. (Reason: The MLP-like structure has recently shown the ability to rival the Transformer with simple architecture.)

The contributions are summarized below:

  • We propose a novel immediate vector representation with relative features and positions to better guide local feature aggregation.
  • We explore the method of obtaining vector representation and propose the generation method of 3D vector by utilizing the vector rotation matrix in 3D space.
  • Our proposed PointVector model achieves 72.3% mean Intersection over Union (mIOU) on S3DIS area5 and 78.4% mIOU on S3DIS (6-fold cross-validation) with only 58% model parameters of PointNeXt.

Network architecture

[Figure: PointVector network architecture]

Notes on the network architecture

Vector-oriented Point Set Abstraction

[Figures: Vector-oriented Point Set Abstraction]

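In plain words: each point in the local neighborhood has a c-dimensional feature, which is expanded to c×3 dimensions. The features of all neighborhood points are then summed into one c×3-dimensional feature, each 3D vector is projected to a scalar to give a c-dimensional feature, and the result passes through an MLP and is added to the feature from the previous layer (skip connection).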
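The data flow of that plain-language description can be traced with nested lists. The sketch below only mirrors the summary above, not the paper's exact formulation: the neighbor vectors are given directly, the projection is assumed to be a per-channel dot product with hypothetical weights `proj`, and the MLP before the skip connection is omitted.

```python
def aggregate_vectors(neighbor_vectors):
    """Sum the (c x 3) vector features over all neighbors of one center point."""
    c = len(neighbor_vectors[0])
    return [[sum(nb[ch][axis] for nb in neighbor_vectors) for axis in range(3)]
            for ch in range(c)]


def project_to_scalars(vectors, proj):
    """Map each channel's 3D vector to one scalar (assumed: dot product with proj[ch])."""
    return [sum(v * w for v, w in zip(vec, wvec)) for vec, wvec in zip(vectors, proj)]


# Two neighbors, c = 2 channels, each channel a 3D vector (illustrative values).
neighbor_vectors = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 2.0, 0.0], [0.0, 1.0, 1.0]],
]
proj = [[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]]  # hypothetical projection weights
skip = [0.5, 0.5]                           # feature of the center from the previous layer

summed = aggregate_vectors(neighbor_vectors)        # shape: c x 3
scalars = project_to_scalars(summed, proj)          # shape: c
output = [s + k for s, k in zip(scalars, skip)]     # skip connection (MLP omitted)
print(summed, scalars, output)
```

The point of the extra dimension is that two neighbors whose vectors point in different directions do not simply cancel or add as scalars would, so the sum retains more information before it is projected back to c channels.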

Extended Vector From Scalar

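[Figure]

[Figure: 3D vector after rotation]

A 3D vector that lies along one coordinate axis takes the form shown in the figure above after rotations about the x and z axes.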

[Figures: extending a scalar to a vector]

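How is the c-dimensional scalar feature expanded to c×3 dimensions? Each component of the original c-dimensional feature passes through a linear layer to give zxj, which is padded with a zero on each side to form a 3D vector. The same components also pass through a linear layer, a BN layer, and a ReLU layer to give the angles α and β. Rotating the 3D vector by these angles yields the c×3-dimensional vector representation.

Converting the low-dimensional representation into a higher-dimensional vector representation for processing improves local feature extraction.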
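For a single channel, the scalar-to-vector step can be sketched with two elementary rotations. Several details here are assumptions rather than facts from the paper: the zero-padded scalar is placed on the y axis, the rotation about x is applied before the rotation about z, and the magnitude and angles are passed in directly instead of coming from learned layers.

```python
import math


def rotate_x(v, a):
    """Rotate vector v about the x axis by angle a (radians)."""
    x, y, z = v
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))


def rotate_z(v, b):
    """Rotate vector v about the z axis by angle b (radians)."""
    x, y, z = v
    return (x * math.cos(b) - y * math.sin(b), x * math.sin(b) + y * math.cos(b), z)


def scalar_to_vector(magnitude, alpha, beta):
    """Extend one scalar channel to a 3D vector using two rotation angles."""
    base = (0.0, magnitude, 0.0)  # assumption: zero padding puts the scalar on the y axis
    return rotate_z(rotate_x(base, alpha), beta)  # assumption: x rotation first, then z


v = scalar_to_vector(2.0, alpha=math.pi / 6, beta=math.pi / 4)
print(v)
print(math.hypot(*v))  # rotations preserve length, so this stays close to 2.0
```

Two angles are enough to point a vector of fixed length in any direction, which is why the construction needs only α and β in addition to the magnitude.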

Architecture

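[Figure: PointVector architecture]
---------------- Divider ----------------

Voxel-based 3D segmentation networks

RandLA-Net

Contributions

Handles large-scale point clouds.

Network architecture

Notes on the network architecture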

