Testing the MaskFormer Semantic Segmentation Algorithm

Published: 2025-03-29

MaskFormer is a transformer-based semantic segmentation codebase.

Repository:

https://github.com/facebookresearch/MaskFormer/tree/main

Dataset used for testing: the ADE20K dataset (MIT Scene Parsing Benchmark).

The dataset can be downloaded from the link above; the training split contains 20,210 images and the validation split 2,000 images. The SceneParsing directory holds the scene-parsing (semantic segmentation) label images, while InstanceSegmentation holds the instance segmentation labels.

1. Environment Setup

I ran the experiments on a Linux server with Python 3.10, CUDA 11.8, and torch 2.1.0. After installing torch via pip, follow the instructions in INSTALL.md to install detectron2 and its dependencies.

A few things to note:

1. Install the opencv-python-headless build of OpenCV:

pip install opencv-python-headless

2. Install a 1.x version of numpy:

pip install numpy==1.26.0

3. When loading the model with timm, some layers fail to import because newer timm releases moved them. In mask_former/modeling/backbone/swin.py, change the import as follows (a version-agnostic variant is sketched after this list):

# from timm.models.layers import DropPath, to_2tuple, trunc_normal_
from timm.layers import DropPath, to_2tuple, trunc_normal_

4. Install the panopticapi package:

git clone https://github.com/cocodataset/panopticapi.git
cd panopticapi
python setup.py build_ext --inplace
python setup.py build_ext install
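
If the same code needs to run against both old and new timm releases, a small compatibility shim avoids hard-coding one import path. A sketch covering just the three symbols swin.py uses:

try:
    # newer timm releases expose these helpers under timm.layers
    from timm.layers import DropPath, to_2tuple, trunc_normal_
except ImportError:
    # older timm versions keep them under timm.models.layers
    from timm.models.layers import DropPath, to_2tuple, trunc_normal_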

The environment I ended up with:

Package                 Version            Editable project location
----------------------- ------------------ ------------------------------------
absl-py                 2.2.1
antlr4-python3-runtime  4.9.3
black                   25.1.0
certifi                 2025.1.31
charset-normalizer      3.4.1
click                   8.1.8
cloudpickle             3.1.1
coloredlogs             15.0.1
contourpy               1.3.1
cycler                  0.12.1
Cython                  3.0.12
detectron2              0.6                /home/shengpeng/downloads/detectron2
filelock                3.18.0
flatbuffers             25.2.10
fonttools               4.56.0
fsspec                  2025.3.0
fvcore                  0.1.5.post20221221
grpcio                  1.71.0
h5py                    3.13.0
huggingface-hub         0.29.3
humanfriendly           10.0
hydra-core              1.3.2
idna                    3.10
iopath                  0.1.9
Jinja2                  3.1.6
kiwisolver              1.4.8
Markdown                3.7
markdown-it-py          3.0.0
MarkupSafe              3.0.2
matplotlib              3.10.1
mdurl                   0.1.2
mpmath                  1.3.0
mypy-extensions         1.0.0
networkx                3.4.2
numpy                   1.26.0
omegaconf               2.3.0
onnx                    1.17.0
onnx-simplifier         0.4.36
onnxruntime             1.21.0
opencv-python-headless  4.11.0.86
packaging               24.2
panopticapi             0.1
pathspec                0.12.1
pillow                  11.1.0
pip                     25.0
platformdirs            4.3.7
portalocker             3.1.1
protobuf                6.30.2
pycocotools             2.0.8
Pygments                2.19.1
pyparsing               3.2.3
python-dateutil         2.9.0.post0
PyYAML                  6.0.2
requests                2.32.3
rich                    13.9.4
safetensors             0.5.3
scipy                   1.15.2
setuptools              75.8.0
shapely                 2.0.7
six                     1.17.0
sympy                   1.13.3
tabulate                0.9.0
tensorboard             2.19.0
tensorboard-data-server 0.7.2
termcolor               2.5.0
timm                    1.0.15
tomli                   2.2.1
torch                   2.1.0+cu118
torchvision             0.16.0+cu118
tqdm                    4.67.1
triton                  2.1.0
typing_extensions       4.13.0
urllib3                 2.3.0
Werkzeug                3.1.3
wheel                   0.45.1
yacs                    0.1.8

Download a pretrained model, then run demo/demo.py, specifying the config file and the pretrained weights, to run inference on an image and check the predictions:

python demo/demo.py \
--config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
--input images/ADE/ADE_test_00000001.jpg \
--opts MODEL.WEIGHTS weights/MaskFormer_seg_R50_512x512.pkl

Training script:

python train_net.py \
--num-gpus 2 \
--config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml

The dataset path needs to be specified in train_net.py:

    os.environ['DETECTRON2_DATASETS']='/home/shengpeng/code/github_proj2/ADE2016/SceneParsing'
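
Before launching training, it can be worth verifying the directory layout detectron2 will look for. A minimal sanity check, assuming the standard ADEChallengeData2016 layout from MaskFormer's datasets/README (the annotations_detectron2 folder is generated by datasets/prepare_ade20k_sem_seg.py):

import os
from pathlib import Path

root = Path(os.environ['DETECTRON2_DATASETS']) / 'ADEChallengeData2016'
# expected subfolders for the builtin ade20k_sem_seg_* datasets
for sub in ('images/training', 'images/validation',
            'annotations_detectron2/training', 'annotations_detectron2/validation'):
    path = root / sub
    print(path, 'OK' if path.is_dir() else 'MISSING')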

On two RTX 3090 cards, training took roughly one night.

Even the smallest model, trained with the R50 backbone, comes out at over 160 MB.

2. Converting the torch Model to ONNX

The codebase does not ship with ONNX export code, so you have to put together your own.

In the downloaded detectron2 source, in detectron2/detectron2/engine/defaults.py, rewrite the __call__ method of class DefaultPredictor as follows. Note that this moves the mean/std normalization out of the model and into __call__, so the exported ONNX graph will expect an already-normalized NCHW float tensor:

    def __call__(self, original_image):
        with torch.no_grad():
            # BGR (OpenCV) -> RGB, HWC -> CHW, add a batch dimension
            image = original_image[:, :, ::-1]
            input_blob = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
            input_blob = input_blob.unsqueeze(0)
            # normalize here, outside the model, so the exported graph
            # takes an already-normalized tensor as input
            pixel_mean = torch.Tensor(self.cfg.MODEL.PIXEL_MEAN).view(-1, 1, 1)
            pixel_std = torch.Tensor(self.cfg.MODEL.PIXEL_STD).view(-1, 1, 1)
            input_blob = (input_blob - pixel_mean) / pixel_std
            input_blob = input_blob.to(self.cfg.MODEL.DEVICE)
            print('input_blob.shape:', input_blob.shape)
            predictions = self.model(input_blob)[0]
            return predictions
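
Once both this rewrite and the forward() rewrite below are in place, the predictor can be sanity-checked on a single image. A minimal sketch, reusing the config and weight paths from this post and assuming CPU inference:

import cv2

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.projects.deeplab import add_deeplab_config

from mask_former import add_mask_former_config

# build the config the same way as setup_cfg() in the conversion script below
cfg = get_cfg()
add_deeplab_config(cfg)
add_mask_former_config(cfg)
cfg.merge_from_file('configs/ade20k-150/maskformer_R50_bs16_160k.yaml')
cfg.MODEL.WEIGHTS = 'output/model_0159999.pth'
cfg.MODEL.DEVICE = 'cpu'
cfg.freeze()

predictor = DefaultPredictor(cfg)
out = predictor(cv2.imread('images/ADE/ADE_test_00000001.jpg'))
print(out['sem_seg'].shape)  # per-class score maps from the rewritten forward()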

Rewrite the forward() method of class MaskFormer in MaskFormer/mask_former/mask_former_model.py:

    def forward(self, input_blob):
        print('MaskFormer input_blob:', input_blob.shape)
        print('self.device:', self.device)
        print('input_blob.device:', input_blob.device)
        input_h, input_w = input_blob.shape[2], input_blob.shape[3]
        features = self.backbone(input_blob)
        outputs = self.sem_seg_head(features)

        if self.training:
            # # mask classification target
            # if "instances" in batched_inputs[0]:
            #     gt_instances = [x["instances"].to(self.device) for x in batched_inputs]
            #     targets = self.prepare_targets(gt_instances, images)
            # else:
            #     targets = None
            targets = None

            # bipartite matching-based loss
            losses = self.criterion(outputs, targets)

            for k in list(losses.keys()):
                if k in self.criterion.weight_dict:
                    losses[k] *= self.criterion.weight_dict[k]
                else:
                    # remove this loss if not specified in `weight_dict`
                    losses.pop(k)

            return losses
        else:
            mask_cls_results = outputs["pred_logits"]
            mask_pred_results = outputs["pred_masks"]
            # return mask_cls_results, mask_pred_results

            # upsample masks
            mask_pred_results = F.interpolate(
                mask_pred_results,
                size=(input_h, input_w),
                mode="bilinear",
                align_corners=False,
            )

            # mask_cls_result=mask_cls_results[0]
            # mask_pred_result=mask_pred_results[0]
            # print('mask_cls_result:',mask_cls_result.shape)
            # print('mask_pred_result:',mask_pred_result.shape)
            print('mask_cls_results:',mask_cls_results.shape)
            print('mask_pred_results:',mask_pred_results.shape)

            processed_results = []            
            if self.sem_seg_postprocess_before_inference:
                mask_pred_results = sem_seg_postprocess(
                    mask_pred_results, [input_h, input_w], input_h, input_w
                )

            # semantic segmentation inference
            r = self.semantic_inference(mask_cls_results, mask_pred_results)
            print(f'r1:{r.shape}')
            if not self.sem_seg_postprocess_before_inference:
                r = sem_seg_postprocess(r, [input_h, input_w], input_h, input_w)
            print(f'r2:{r.shape}')
            processed_results.append({"sem_seg": r})

            print('processed_results num:',len(processed_results))
            return processed_results
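
For reference, the semantic_inference() call above is unchanged from the repo: it folds the per-query class scores into the per-query masks. With batched inputs it is effectively the following (the repo's single-image version uses the einsum string "qc,qhw->chw"):

import torch
import torch.nn.functional as F

def semantic_inference(mask_cls, mask_pred):
    # class probabilities per query, dropping the trailing "no object" class:
    # (B, Q, C+1) -> (B, Q, C)
    mask_cls = F.softmax(mask_cls, dim=-1)[..., :-1]
    # per-query binary mask probabilities: (B, Q, H, W)
    mask_pred = mask_pred.sigmoid()
    # weight each query's mask by its class scores -> (B, C, H, W) semantic map
    return torch.einsum("bqc,bqhw->bchw", mask_cls, mask_pred)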

Create a conversion script tools/convert_torchvision_to_onnx.py:

import argparse
import os

# fmt: off
import sys
sys.path.insert(1, os.path.join(sys.path[0], '..'))
# fmt: on

import onnx
import torch

from detectron2.config import get_cfg
from detectron2.projects.deeplab import add_deeplab_config

from mask_former import add_mask_former_config
from demo.predictor import VisualizationDemo

def setup_cfg(args):
    # load config from file and command-line arguments
    cfg = get_cfg()
    add_deeplab_config(cfg)
    add_mask_former_config(cfg)
    cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    cfg.freeze()
    return cfg


def get_parser():
    parser = argparse.ArgumentParser(description="Detectron2 demo for builtin configs")
    parser.add_argument("--config-file", default="configs/ade20k-150/maskformer_R50_bs16_160k.yaml")
    parser.add_argument("--input", nargs="+")
    parser.add_argument(
        "--output", help="A file or directory to save output visualizations. "
        "If not given, will show output in an OpenCV window.")
    parser.add_argument(
        "--confidence-threshold", type=float, default=0.5, help="Minimum score for instance predictions to be shown")
    parser.add_argument(
        "--opts",
        help="Modify config options using the command-line 'KEY VALUE' pairs",
        default=['MODEL.WEIGHTS', 'output/model_0159999.pth'],
        nargs=argparse.REMAINDER,
    )
    return parser

if __name__ == "__main__":
    args = get_parser().parse_args()

    cfg = setup_cfg(args)

    demo = VisualizationDemo(cfg)

    net = demo.predictor.model
    net.to('cpu')

    input_model_path = cfg.MODEL.WEIGHTS
    print('input_model_path: %s' % input_model_path)
    output_model_path = input_model_path.replace('.pth', '.onnx')

    im = torch.zeros(1, 3, 512, 512).to('cpu')  # image size(1, 3, 512, 512) BCHW
    input_layer_names   = ["images"]
    output_layer_names  = ["output"]
    dynamic = False
    
    # Export the model
    print(f'Starting export with onnx {onnx.__version__}.')
    torch.onnx.export(net,
        im,
        f               = output_model_path,
        verbose         = False,
        opset_version   = 12,
        training        = torch.onnx.TrainingMode.EVAL,
        do_constant_folding = True,
        input_names     = input_layer_names,
        output_names    = output_layer_names,
        dynamic_axes    = {'images': {0: 'batch'},'output': {0: 'batch'}} if dynamic else None)
    
    # Checks
    model_onnx = onnx.load(output_model_path)  # load onnx model
    onnx.checker.check_model(model_onnx)  # check onnx model

    # Simplify onnx
    simplify = True
    if simplify:
        import onnxsim
        print(f'Simplifying with onnx-simplifier {onnxsim.__version__}.')
        onnx_sim_model, check = onnxsim.simplify(model_onnx)
        assert check, 'onnx-simplifier check failed'
        # save the simplified model, not the original one
        onnx.save(onnx_sim_model, output_model_path)

    print('Onnx model saved as {}'.format(output_model_path))
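
Run the script from the repo root so that the mask_former and demo imports resolve (the argparse defaults already point at the ADE20K config and the trained checkpoint):

python tools/convert_torchvision_to_onnx.py \
--config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
--opts MODEL.WEIGHTS output/model_0159999.pth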

This produces the corresponding ONNX model, which can then be loaded with onnxruntime for inference.
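
A minimal onnxruntime check of the exported model. This is a sketch: the "images"/"output" names come from the export script above, preprocessing mirrors the rewritten DefaultPredictor.__call__, and the mean/std values are the usual RGB ImageNet statistics; read them from cfg.MODEL.PIXEL_MEAN / cfg.MODEL.PIXEL_STD to be safe:

import cv2
import numpy as np
import onnxruntime as ort

# same preprocessing as the rewritten __call__: BGR -> RGB, HWC -> CHW, normalize
mean = np.array([123.675, 116.280, 103.530], dtype=np.float32).reshape(3, 1, 1)
std = np.array([58.395, 57.120, 57.375], dtype=np.float32).reshape(3, 1, 1)

img = cv2.imread('images/ADE/ADE_test_00000001.jpg')
img = cv2.resize(img, (512, 512))  # the graph was exported at 512x512
blob = img[:, :, ::-1].astype(np.float32).transpose(2, 0, 1)
blob = ((blob - mean) / std)[None]  # 1 x 3 x 512 x 512

sess = ort.InferenceSession('output/model_0159999.onnx',
                            providers=['CPUExecutionProvider'])
sem_seg = sess.run(['output'], {'images': blob})[0]
# sem_seg holds the per-class score maps; argmax over the class axis
# gives the per-pixel label image
pred = np.argmax(sem_seg.squeeze(0), axis=0).astype(np.uint8)
cv2.imwrite('pred_mask.png', pred)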

 

3. Inference Speed Test

To test speed, I loaded the ONNX models in C++ and converted them to TensorRT, comparing SegFormer's 14 MB model against this 161 MB MaskFormer model, both as FP16 engines at 512x512 resolution:

segformer_b0       ~10 ms

maskformer_R50     ~220 ms

These results show that this MaskFormer model is not suited to scenarios with very tight latency requirements; it is a better fit for scenarios with many classes, or for panoptic segmentation.
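
For a rough cross-check without the C++/TensorRT harness, the exported model can also be timed directly in onnxruntime (reusing sess and blob from the sketch above; absolute numbers will of course differ from a TensorRT FP16 engine):

import time

# warm up, then average over repeated runs
for _ in range(10):
    sess.run(['output'], {'images': blob})

t0 = time.perf_counter()
runs = 100
for _ in range(runs):
    sess.run(['output'], {'images': blob})
print('avg latency: %.1f ms' % ((time.perf_counter() - t0) / runs * 1000))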