Fast and Accurate Human Pose Estimation Using ShelfNet with PyTorch

Posted by 高分子科学前沿 · 2021-02-03 09:54:33


This repository is the result of my curiosity to find out whether ShelfNet is an efficient CNN architecture for computer vision tasks other than semantic segmentation, and more specifically for human pose estimation. The answer is a clear yes: ShelfNet50 reaches 74.6 mAP at 127 FPS on the MS COCO Keypoints dataset, a 3.5x boost in FPS compared to HRNet at similar accuracy.

This repository includes:

- Source code of ShelfNet, modified from the authors' repository
- Code to prepare the MS COCO keypoints dataset
- Training and evaluation code for MS COCO keypoints, modified from the HRNet authors' repository
- Pre-trained weights for ShelfNet50

If you use it in your projects, please consider citing this repository (BibTeX entry below).

ShelfNet Architecture Overview

The ShelfNet architecture was introduced by J. Zhuang, J. Yang, L. Gu and N. Dvornek in a paper available on arXiv. The paper evaluates the network only on semantic segmentation. The authors' contribution is a fast architecture that performs on par with the state of the art (PSPNet and EncNet at the time this repository was published) on PASCAL VOC and better on Cityscapes. This makes ShelfNet currently one of the most suitable architectures for real-world applications with resource constraints.

As depicted above, ShelfNet uses a ResNet backbone combined with two encoder/decoder branches. The first encoder (in green) reduces the number of channels by a factor of 4 for faster inference. The S-block is a residual block with shared weights, which significantly reduces the number of parameters. The network uses strided convolutions for down-sampling and transposed convolutions for up-sampling. The structure can be seen as an ensemble of FCNs in which information flows through many different paths, resulting in increased accuracy.
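To make the shared-weight idea concrete, here is a minimal PyTorch sketch of a residual block that applies one 3x3 convolution twice, together with a strided convolution and a transposed convolution for the down- and up-sampling steps. It is an illustration under assumptions, not the code from this repository: the class name `SharedWeightBlock`, the dropout rate, the separate batch-norm layers, and the channel counts are all hypothetical choices.

```python
import torch
import torch.nn as nn


class SharedWeightBlock(nn.Module):
    """Residual block that reuses one 3x3 convolution for both passes (sketch)."""

    def __init__(self, channels: int, dropout: float = 0.25):
        super().__init__()
        # A single convolution: its weights are shared by both passes.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        # Assumption: each pass keeps its own batch-norm statistics.
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.drop = nn.Dropout2d(dropout)  # hypothetical regularisation choice
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv(x)))
        out = self.drop(out)
        out = self.bn2(self.conv(out))  # second pass through the same weights
        return self.relu(out + x)       # residual connection


channels = 64
block = SharedWeightBlock(channels)

# Strided convolution halves the spatial size; transposed convolution restores it.
down = nn.Conv2d(channels, channels * 2, kernel_size=3, stride=2, padding=1)
up = nn.ConvTranspose2d(channels * 2, channels, kernel_size=3, stride=2,
                        padding=1, output_padding=1)

x = torch.randn(2, channels, 16, 16)
y = block(x)          # same shape as x
z = up(down(y))       # 16x16 -> 8x8 -> 16x16
n_params = sum(p.numel() for p in block.parameters())
print(y.shape, z.shape, n_params)
```

Because the convolution is stored once, the block holds one set of 3x3 kernels instead of the two that a standard residual block would need, which is where the parameter saving comes from.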

Results on Microsoft COCO KeyPoints

This section reports test results for ShelfNet50 on the MS COCO Keypoints dataset and compares them with the state-of-the-art HRNet. All experiments use the same person detector, which has an AP of 56.4 on the COCO val2017 dataset. You can find the download link in the HRNet repository. A single Titan RTX with 24 GB of memory was used for the ShelfNet50 experiments. The batch size is 128 for an input size of 256x192 and 72 for 384x288.

| Architecture   | Input size | Parameters | AP    | AR    | Memory size | FPS   |
|----------------|------------|------------|-------|-------|-------------|-------|
| pose_hrnet_w32 | 256x192    | 28.5M      | 0.744 | 0.798 | 931 MB      | 37.4  |
| pose_hrnet_w32 | 384x288    | 28.5M      | 0.758 | 0.809 | 957 MB      | 37.6  |
| pose_hrnet_w48 | 256x192    | 63.6M      | 0.751 | 0.804 | 1083 MB     | 37.7  |
| pose_hrnet_w48 | 384x288    | 63.6M      | 0.763 | 0.812 | 1103 MB     | 36.7  |
| shelfnet_50    | 256x192    | 38.7M      | 0.725 | 0.782 | 1013 MB     | 127.3 |
| shelfnet_50    | 384x288    | 38.7M      | 0.746 | 0.797 | 1033 MB     | 127.7 |
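The speed/accuracy trade-off in the table above can be checked with a few lines of Python. The sketch below copies the AP and FPS figures from the table and computes, for each input size, the FPS ratio and AP difference of ShelfNet50 relative to each HRNet variant. The `results` dictionary and the `compare` helper are names introduced here for illustration only.

```python
# (architecture, input size) -> (AP, FPS), copied from the table above
results = {
    ("pose_hrnet_w32", "256x192"): (0.744, 37.4),
    ("pose_hrnet_w32", "384x288"): (0.758, 37.6),
    ("pose_hrnet_w48", "256x192"): (0.751, 37.7),
    ("pose_hrnet_w48", "384x288"): (0.763, 36.7),
    ("shelfnet_50", "256x192"): (0.725, 127.3),
    ("shelfnet_50", "384x288"): (0.746, 127.7),
}


def compare(baseline, candidate):
    """Return (FPS speedup, AP difference) of candidate relative to baseline."""
    base_ap, base_fps = results[baseline]
    cand_ap, cand_fps = results[candidate]
    return cand_fps / base_fps, cand_ap - base_ap


for size in ("256x192", "384x288"):
    for hrnet in ("pose_hrnet_w32", "pose_hrnet_w48"):
        speedup, ap_delta = compare((hrnet, size), ("shelfnet_50", size))
        print(f"shelfnet_50 vs {hrnet} at {size}: "
              f"{speedup:.2f}x FPS, {ap_delta:+.3f} AP")
```

With these figures the speedup lies between about 3.4x and 3.5x, while AP is lower by 0.012 to 0.026 depending on the HRNet variant and input size.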

Training on Your Own

I'm providing pre-trained weights for ShelfNet50 to make it easier to get started. The reported test accuracies are obtained without ground-truth bounding boxes.

You can train and evaluate directly from the command line as follows:

# Train ShelfNet on COCO

python train.py --cfg coco/shelfnet/shelfnet50_384x288_adam_lr1e-3.yaml

# Test ShelfNet on COCO

python test.py --cfg coco/shelfnet/shelfnet50_384x288_adam_lr1e-3.yaml TEST.MODEL_FILE ../output/coco/shelfnet/shelf_384x288_adam_lr1e-3/model_best.pth TEST.USE_GT_BBOX False

| Arch       | AP    | AP .5 | AP .75| AP (M)| AP (L)| AR    | AR .5 | AR .75| AR (M)| AR (L)|
|------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| shelfnet   | 0.746 | 0.901 | 0.814 | 0.706 | 0.818 | 0.797 | 0.938 | 0.858 | 0.752 | 0.862 |
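The test command above ends with `KEY VALUE` pairs (`TEST.MODEL_FILE ...`, `TEST.USE_GT_BBOX False`) that override entries of the YAML configuration. The following dependency-free sketch illustrates how such dotted-key overrides can be applied to a nested configuration. It is not the yacs implementation and not this repository's code: `apply_overrides` and the `defaults` dictionary (including its default values) are hypothetical.

```python
import ast


def apply_overrides(config, opts):
    """Apply a flat [KEY, VALUE, KEY, VALUE, ...] list to a nested dict in place."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for dotted_key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node[name]              # KeyError if the section is unknown
        if leaf not in node:
            raise KeyError(dotted_key)     # refuse to create new options
        try:
            value = ast.literal_eval(raw)  # "False" -> False, "72" -> 72
        except (ValueError, SyntaxError):
            value = raw                    # keep paths and other plain strings
        node[leaf] = value
    return config


# Hypothetical defaults; the real ones live in the repository's config files.
defaults = {"TEST": {"MODEL_FILE": "", "USE_GT_BBOX": True}}

apply_overrides(defaults, [
    "TEST.MODEL_FILE",
    "../output/coco/shelfnet/shelf_384x288_adam_lr1e-3/model_best.pth",
    "TEST.USE_GT_BBOX", "False",
])
print(defaults)
```

Parsing the value with `ast.literal_eval` is what turns the command-line string `False` into a real boolean, so that evaluation runs on detector boxes rather than ground-truth boxes.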

Requirements

Python 3.7, PyTorch 1.3.1 or greater, requests, tqdm, yacs, json_tricks, and pycocotools. Unlike the original ShelfNet repository, this repository does not depend on torch-encoding.
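Before training, it can help to confirm that the packages listed above are importable. The snippet below is a small standard-library sketch that does so. The helper names are introduced here, treating Python 3.7 as a minimum version is an assumption, and the PyTorch version itself is not checked.

```python
import sys
from importlib.util import find_spec

# Import names of the packages listed in the requirements above.
REQUIRED_MODULES = ["torch", "requests", "tqdm", "yacs", "json_tricks", "pycocotools"]


def missing_modules(names):
    """Return the entries of `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


def python_ok(minimum=(3, 7)):
    """Check the interpreter version (assumes 3.7 is a lower bound)."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.7 or newer is expected")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable")
```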

Citation

Use this bibtex to cite this repository:

@misc{fmahoudeau_shelfnet_human_pose_2020,
  title={ShelfNet for Human Pose Estimation},
  author={Florent Mahoudeau},
  year={2020},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/fmahoudeau/ShelfNet-Human-Pose-Estimation}},
}
