ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
PDF:https://arxiv.org/abs/1606.02147.pdf
PyTorch: https://github.com/shanglianlm0525/PyTorch-Networks

1 Overview

ENet is a 2016 work that achieves real-time semantic segmentation, even on embedded devices such as the NVIDIA TX1, while still maintaining competitive accuracy.

2 Network architecture

2-1 ENet initial block

(Figure: ENet initial block.)
PyTorch code:

import torch
import torch.nn as nn

class InitialBlock(nn.Module):
    """ENet initial block: a strided 3x3 convolution and a 2x max pooling run in
    parallel, and their outputs are concatenated along the channel dimension."""
    def __init__(self, in_channels, out_channels):
        super(InitialBlock, self).__init__()
        # The conv branch produces (out_channels - in_channels) maps; the pooling
        # branch keeps the in_channels input maps, so the concatenation has out_channels.
        self.conv = nn.Conv2d(in_channels, out_channels - in_channels, kernel_size=3,
                              stride=2, padding=1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.PReLU()

    def forward(self, x):
        return self.relu(self.bn(torch.cat([self.conv(x), self.pool(x)], dim=1)))
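
A quick shape check (a hypothetical snippet, assuming the class above is in scope): the conv branch yields out_channels - in_channels feature maps, the pooling branch keeps the original in_channels, and both halve the spatial resolution.

if __name__ == '__main__':
    block = InitialBlock(in_channels=3, out_channels=16)
    x = torch.randn(1, 3, 512, 512)
    y = block(x)
    print(y.shape)  # expected: torch.Size([1, 16, 256, 256])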

2-2 ENet bottleneck module

Downsampling bottleneck:
The main branch consists of three convolutional layers:

  • first a 2×2 projection with stride 2 that performs the downsampling;
  • then the main convolution (one of three types: a regular convolution, an asymmetric (factorized) convolution, or a dilated convolution);
  • finally a 1×1 convolution that expands the channel dimension back.

Note that each convolutional layer is followed by Batch Norm and PReLU.
The auxiliary (skip) branch consists of max pooling and a padding layer:

  • max pooling is responsible for extracting context information;
  • zero padding fills in the extra channels so that the subsequent residual addition is possible.

A PReLU is applied after the merge.

Regular (non-downsampling) bottleneck:
The main branch consists of three convolutional layers:

  • first a 1×1 projection;
  • then the main convolution (one of three types: a regular convolution, an asymmetric (factorized) convolution, or a dilated convolution);
  • finally a 1×1 convolution that expands the channel dimension back.

Note that each convolutional layer is followed by Batch Norm and PReLU.
The auxiliary branch is a plain identity mapping (only the downsampling bottlenecks increase the channel count, so no padding layer is needed here).
A PReLU is applied after the merge.
(Figure: ENet bottleneck modules.)
PyTorch code:
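The bottleneck classes below call several small helper blocks (Conv1x1BNReLU, Conv1x1BN, Conv2x2BNReLU, Conv3x3BNReLU, AsymmetricConv, TransposeConv3x3BNReLU) that are defined in the linked repository but not shown in this excerpt. The following is a minimal sketch written to match how they are called here; the exact repository definitions may differ.

import torch
import torch.nn as nn

def activation(is_relu):
    # The bottlenecks switch between ReLU (decoder) and PReLU (encoder).
    return nn.ReLU(inplace=True) if is_relu else nn.PReLU()

class Conv1x1BNReLU(nn.Sequential):
    # 1x1 projection + BN + activation
    def __init__(self, in_channels, out_channels, is_relu=False):
        super(Conv1x1BNReLU, self).__init__(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            activation(is_relu)
        )

class Conv1x1BN(nn.Sequential):
    # 1x1 projection + BN, no activation (skip branch of the up bottleneck)
    def __init__(self, in_channels, out_channels):
        super(Conv1x1BN, self).__init__(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels)
        )

class Conv2x2BNReLU(nn.Sequential):
    # 2x2 strided projection used for downsampling in the encoder
    def __init__(self, in_channels, out_channels, is_relu=False):
        super(Conv2x2BNReLU, self).__init__(
            nn.Conv2d(in_channels, out_channels, kernel_size=2, stride=2, bias=False),
            nn.BatchNorm2d(out_channels),
            activation(is_relu)
        )

class Conv3x3BNReLU(nn.Sequential):
    # 3x3 (optionally dilated) convolution + BN + activation
    def __init__(self, in_channels, out_channels, stride=1, dilation=1, is_relu=False):
        super(Conv3x3BNReLU, self).__init__(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride,
                      padding=dilation, dilation=dilation, bias=False),
            nn.BatchNorm2d(out_channels),
            activation(is_relu)
        )

class AsymmetricConv(nn.Sequential):
    # factorized 5x1 followed by 1x5 convolution (keeps the channel count)
    def __init__(self, channels, stride=1, is_relu=False):
        super(AsymmetricConv, self).__init__(
            nn.Conv2d(channels, channels, kernel_size=(5, 1), stride=stride, padding=(2, 0), bias=False),
            nn.Conv2d(channels, channels, kernel_size=(1, 5), padding=(0, 2), bias=False),
            nn.BatchNorm2d(channels),
            activation(is_relu)
        )

class TransposeConv3x3BNReLU(nn.Sequential):
    # 3x3 transposed convolution that doubles the spatial resolution in the decoder
    def __init__(self, in_channels, out_channels, stride=2, is_relu=True):
        super(TransposeConv3x3BNReLU, self).__init__(
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size=3, stride=stride,
                               padding=1, output_padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            activation(is_relu)
        )

With these helpers in place, the three bottleneck variants are: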
class RegularBottleneck(nn.Module):
    """Non-downsampling bottleneck: 1x1 projection -> 3x3 (regular, dilated or
    asymmetric) convolution -> 1x1 expansion, with an identity skip connection."""
    def __init__(self, in_places, places, stride=1, expansion=4, dilation=1,
                 is_relu=False, asymmetric=False, p=0.01):
        super(RegularBottleneck, self).__init__()
        mid_channels = in_places // expansion
        self.bottleneck = nn.Sequential(
            Conv1x1BNReLU(in_places, mid_channels, False),
            # middle convolution: asymmetric (5x1 + 1x5) or 3x3 (optionally dilated)
            AsymmetricConv(mid_channels, 1, is_relu) if asymmetric
                else Conv3x3BNReLU(mid_channels, mid_channels, 1, dilation, is_relu),
            Conv1x1BNReLU(mid_channels, places, is_relu),
            nn.Dropout2d(p=p)  # spatial dropout as regularizer
        )
        self.relu = nn.ReLU(inplace=True) if is_relu else nn.PReLU()

    def forward(self, x):
        residual = x
        out = self.bottleneck(x)
        out += residual
        out = self.relu(out)
        return out


class DownBottleneck(nn.Module):
    """Downsampling bottleneck: a strided 2x2 projection on the main branch, and
    max pooling plus zero channel padding on the skip branch; the pooling indices
    are returned so the decoder can reuse them for max unpooling."""
    def __init__(self, in_places, places, stride=2, expansion=4, is_relu=False, p=0.01):
        super(DownBottleneck, self).__init__()
        mid_channels = in_places // expansion
        self.bottleneck = nn.Sequential(
            Conv2x2BNReLU(in_places, mid_channels, is_relu),
            Conv3x3BNReLU(mid_channels, mid_channels, 1, 1, is_relu),
            Conv1x1BNReLU(mid_channels, places, is_relu),
            nn.Dropout2d(p=p)
        )
        self.downsample = nn.MaxPool2d(3, stride=stride, padding=1, return_indices=True)
        self.relu = nn.ReLU(inplace=True) if is_relu else nn.PReLU()

    def forward(self, x):
        out = self.bottleneck(x)
        residual, indices = self.downsample(x)
        # Zero-pad the skip branch so its channel count matches the main branch.
        n, ch, h, w = out.size()
        ch_res = residual.size()[1]
        padding = torch.zeros(n, ch - ch_res, h, w, device=out.device, dtype=out.dtype)
        residual = torch.cat((residual, padding), 1)
        out += residual
        out = self.relu(out)
        return out, indices


class UpBottleneck(nn.Module):
    """Upsampling bottleneck: a 3x3 transposed convolution on the main branch, and
    a 1x1 projection followed by max unpooling (using the encoder's pooling
    indices) on the skip branch."""
    def __init__(self, in_places, places, stride=2, expansion=4, is_relu=True, p=0.01):
        super(UpBottleneck, self).__init__()
        mid_channels = in_places // expansion

        self.bottleneck = nn.Sequential(
            Conv1x1BNReLU(in_places, mid_channels, is_relu),
            TransposeConv3x3BNReLU(mid_channels, mid_channels, stride, is_relu),
            Conv1x1BNReLU(mid_channels, places, is_relu),
            nn.Dropout2d(p=p)
        )
        self.upsample_conv = Conv1x1BN(in_places, places)
        self.upsample_unpool = nn.MaxUnpool2d(kernel_size=2)
        self.relu = nn.ReLU(inplace=True) if is_relu else nn.PReLU()

    def forward(self, x, indices):
        out = self.bottleneck(x)
        residual = self.upsample_conv(x)
        residual = self.upsample_unpool(residual, indices)
        out += residual
        out = self.relu(out)
        return out
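
A hypothetical round-trip check (assuming the classes and the helper sketch above are in scope): DownBottleneck returns the pooling indices so that a matching UpBottleneck can restore the spatial layout with max unpooling.

down = DownBottleneck(16, 64)
up = UpBottleneck(64, 16)
x = torch.randn(1, 16, 64, 64)
y, indices = down(x)   # y: [1, 64, 32, 32]
z = up(y, indices)     # z: [1, 16, 64, 64]
print(y.shape, z.shape)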

2-3 ENet architecture

Stage 1: encoder. Five bottlenecks: the first one downsamples, followed by four repeated regular bottlenecks.
Stages 2-3: encoder. Bottleneck 2.0 of stage 2 downsamples; the remaining bottlenecks interleave regular, dilated and asymmetric (factorized) convolutions. Stage 3 is identical except that it has no downsampling bottleneck.
Stages 4-5: decoder. Comparatively simple: an upsampling bottleneck followed by regular bottlenecks.
None of the projections in the architecture use a bias, which reduces kernel calls and memory operations, and every convolution is followed by Batch Norm. In the encoder, downsampling is done with max pooling plus zero channel padding on the skip branch; in the decoder, upsampling is done with max unpooling paired with a bias-free convolution on the skip branch. Overall the encoder reduces the resolution by 8× (the initial block, stage 1 and stage 2 each halve it), and the decoder plus the final transposed convolution restore it.
(Figure: the full ENet architecture, stage by stage.)
PyTorch code:

class ENet(nn.Module):
    def __init__(self, num_classes):
        super(ENet, self).__init__()

        self.initialBlock = InitialBlock(3,16)
        self.stage1_1 = DownBottleneck(16, 64, 2)
        self.stage1_2 = nn.Sequential(
            RegularBottleneck(64, 64, 1),
            RegularBottleneck(64, 64, 1),
            RegularBottleneck(64, 64, 1),
            RegularBottleneck(64, 64, 1),
        )

        self.stage2_1 = DownBottleneck(64, 128, 2)
        self.stage2_2 = nn.Sequential(
            RegularBottleneck(128, 128, 1),
            RegularBottleneck(128, 128, 1, dilation=2),
            RegularBottleneck(128, 128, 1, asymmetric=True),
            RegularBottleneck(128, 128, 1, dilation=4),
            RegularBottleneck(128, 128, 1),
            RegularBottleneck(128, 128, 1, dilation=8),
            RegularBottleneck(128, 128, 1, asymmetric=True),
            RegularBottleneck(128, 128, 1, dilation=16),
        )
        self.stage3 = nn.Sequential(
            RegularBottleneck(128, 128, 1),
            RegularBottleneck(128, 128, 1, dilation=2),
            RegularBottleneck(128, 128, 1, asymmetric=True),
            RegularBottleneck(128, 128, 1, dilation=4),
            RegularBottleneck(128, 128, 1),
            RegularBottleneck(128, 128, 1, dilation=8),
            RegularBottleneck(128, 128, 1, asymmetric=True),
            RegularBottleneck(128, 128, 1, dilation=16),
        )
        self.stage4_1 = UpBottleneck(128, 64, 2, is_relu=True)
        self.stage4_2 = nn.Sequential(
            RegularBottleneck(64, 64, 1, is_relu=True),
            RegularBottleneck(64, 64, 1, is_relu=True),
        )
        self.stage5_1 = UpBottleneck(64, 16, 2, is_relu=True)
        self.stage5_2 = RegularBottleneck(16, 16, 1, is_relu=True)

        # Final 2x upsampling back to full resolution, producing per-pixel class scores.
        self.final_conv = nn.ConvTranspose2d(in_channels=16, out_channels=num_classes, kernel_size=3,
                                             stride=2, padding=1, output_padding=1, bias=False)

    def forward(self, x):
        x = self.initialBlock(x)
        x,indices1 = self.stage1_1(x)
        x = self.stage1_2(x)
        x, indices2 = self.stage2_1(x)
        x = self.stage2_2(x)
        x = self.stage3(x)
        x = self.stage4_1(x, indices2)
        x = self.stage4_2(x)
        x = self.stage5_1(x, indices1)
        x = self.stage5_2(x)
        out = self.final_conv(x)
        return out
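
A hypothetical smoke test (batch size and input resolution chosen arbitrarily, 19 classes as in Cityscapes): the output has the same spatial size as the input and one channel per class.

if __name__ == '__main__':
    model = ENet(num_classes=19)
    x = torch.randn(1, 3, 512, 512)
    out = model(x)
    print(out.shape)  # expected: torch.Size([1, 19, 512, 512])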

3 Results

(Figure: benchmark results from the paper.)
