Kaggle 1st-place solution: "Cdiscount's Image Classification Challenge", Part 3: The Models

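The previous article covered the data overview and environment setup for the past Kaggle competition "Cdiscount's Image Classification Challenge". This article explains how the 1st-place model was built.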

3.1 Adapting the pretrained models

Experiments started with ResNet34.

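Findings from the experiments:

1. Almost all pretrained networks are designed for ImageNet, which has 1000 labels, but this competition has 5270 labels. Using them as-is creates a bottleneck in the network.

2. The Adam optimizer converged in fewer epochs than SGD (Stochastic Gradient Descent).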

A 1×1 convolution layer was added to ResNet34, raising the number of channels from 512 to 5270, so the FC (fully connected) layer becomes 5270 × 5270.
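To make the cost of this wider head concrete, the sketch below counts the weights of a plain head (global pooling followed by one Linear layer from the backbone channels to the classes) versus the widened head (a bias-free 1×1 convolution to 5270 channels followed by a 5270 × 5270 Linear layer). It is a rough, illustrative count only: the helper name `head_params` is hypothetical, and BatchNorm parameters are deliberately ignored.

```python
from typing import Tuple


def head_params(in_channels: int, num_classes: int) -> Tuple[int, int]:
    """Return (plain head, widened head) weight counts, ignoring BatchNorm."""
    # Plain head: global pool -> Linear(in_channels, num_classes) with bias
    plain = in_channels * num_classes + num_classes
    # Widened head: 1x1 conv in_channels -> num_classes (bias=False, as in
    # make_conv_bn_relu), then a num_classes x num_classes Linear with bias
    conv1x1 = in_channels * num_classes
    fc = num_classes * num_classes + num_classes
    return plain, conv1x1 + fc


plain, widened = head_params(512, 5270)
print(plain, widened)  # 2703510 30476410
```

For ResNet34 the widened head has roughly eleven times as many weights as the plain one, almost all of them in the 5270 × 5270 layer.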

Adam was used, and the learning rate was lowered as the number of epochs grew:

lr = 0.0003
if epoch > 7:
    lr = 0.0001
if epoch > 9:
    lr = 0.00005
if epoch > 11:
    lr = 0.00001
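The same step schedule can also be kept as a small lookup table and applied to the optimizer at the start of each epoch. This is a sketch, not the author's code: the names `lr_at` and `set_lr` are hypothetical, and the only PyTorch detail relied on is that optimizers such as `torch.optim.Adam` store their learning rate in `param_groups`.

```python
import bisect

# Step schedule from the text: each rate applies once epoch > threshold.
BASE_LR = 0.0003
THRESHOLDS = [7, 9, 11]
RATES = [0.0001, 0.00005, 0.00001]


def lr_at(epoch: float) -> float:
    """Learning rate for a (possibly fractional) epoch."""
    i = bisect.bisect_left(THRESHOLDS, epoch)  # number of thresholds < epoch
    return BASE_LR if i == 0 else RATES[i - 1]


def set_lr(optimizer, lr: float) -> None:
    """Overwrite the learning rate of every parameter group."""
    # PyTorch optimizers keep per-group settings in optimizer.param_groups
    for group in optimizer.param_groups:
        group["lr"] = lr
```

Because `lr_at` accepts fractional epochs, it also covers runs such as the 11.5-epoch training mentioned below.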

Training for 11.5 epochs on random patches taken from the 180 × 180 images (one image per sample) pushed the public leaderboard score above 0.72.
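A random patch of this kind is usually produced with a random crop. The article does not state the patch size, so the 160 × 160 crop below is an assumption, as is the helper name `random_crop`; the sketch just picks a random top-left corner that keeps the patch inside the 180 × 180 image.

```python
import numpy as np


def random_crop(image: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Cut a random size x size patch out of an H x W (x C) image."""
    h, w = image.shape[:2]
    if size > h or size > w:
        raise ValueError("crop size larger than image")
    # integers(low, high) excludes high, so the patch always fits
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return image[top:top + size, left:left + size]


rng = np.random.default_rng(0)
image = np.zeros((180, 180, 3), dtype=np.uint8)
patch = random_crop(image, 160, rng)  # assumed patch size
print(patch.shape)  # (160, 160, 3)
```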

Adding the 5270 × 5270 FC layer improved the result by more than 0.5%.

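A machine with four 1080Ti GPUs could train one epoch in 2.5 hours, which made it possible to run many experiments. When the results of the first 1-2 epochs were poor, the final result was also poor, so such experiments were stopped early.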
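At 2.5 hours per epoch, an 11.5-epoch run takes about 29 hours, so abandoning a weak run after one or two epochs saves most of a day. The rule below is a hypothetical way to write that decision down: it compares a new run's validation score after `check_epochs` epochs with a reference run at the same point. The function name, the reference-run idea, and the `tolerance` parameter are illustrative assumptions, not taken from the original solution.

```python
from typing import Sequence


def should_abort(run_scores: Sequence[float],
                 reference_scores: Sequence[float],
                 check_epochs: int = 2,
                 tolerance: float = 0.0) -> bool:
    """Hypothetical early-abort rule: True if the run trails the reference
    run by more than `tolerance` after `check_epochs` epochs."""
    if len(run_scores) < check_epochs or len(reference_scores) < check_epochs:
        return False  # not enough evidence yet
    k = check_epochs - 1
    return run_scores[k] < reference_scores[k] - tolerance


reference = [0.62, 0.68]              # scores of an earlier, good run
print(should_abort([0.60, 0.65], reference))  # True: trailing after 2 epochs
```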

Experiments that were tried but not used in later stages:

1. Multi-level categories as multi-task learning

2. Hard-example weighting with Focal Loss

3. Dilation

4. Dropout
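For reference, Focal Loss (introduced with the RetinaNet detector) scales the cross-entropy of each sample by $(1 - p_t)^\gamma$, where $p_t$ is the predicted probability of the true class, so well-classified examples contribute little and hard examples dominate. The NumPy sketch below is a plain multi-class version without the optional class-balancing weight α; it illustrates the technique the author tried and is not the author's implementation.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def focal_loss(logits: np.ndarray, targets: np.ndarray, gamma: float = 2.0) -> float:
    """Mean multi-class focal loss: -(1 - p_t)^gamma * log(p_t)."""
    probs = softmax(logits)
    p_t = probs[np.arange(len(targets)), targets]  # probability of the true class
    p_t = np.clip(p_t, 1e-12, 1.0)                 # avoid log(0)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


logits = np.array([[4.0, 0.0, 0.0],    # easy, correctly classified sample
                   [0.2, 0.1, 0.0]])   # hard sample (true class 2)
targets = np.array([0, 2])
ce = focal_loss(logits, targets, gamma=0.0)  # gamma = 0 is plain cross-entropy
fl = focal_loss(logits, targets, gamma=2.0)
```

With `gamma=0` the function reduces to ordinary cross-entropy, which is a convenient sanity check.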

Next came ResNet50, which reached 0.756 on the public LB.

ResNet101, ResNet152, Inception-ResNet-v2, and Inception-v4 were also tried. Inception-v4 was slightly weaker at this stage; the other pretrained models all scored above 0.755 on the public LB.

3.2 Using the multi-image datasets

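The training set was split into four parts: products with 1, 2, 3, and 4 images. When fine-tuning on 4-image products, the four images were concatenated into a single image; likewise, when fine-tuning on 3-image products, the three images were concatenated into one.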
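The article does not say how the images are arranged in the combined image. The sketch below assumes they are placed side by side along the width, so three 180 × 180 images become one 180 × 540 image; the helper name `concat_product_images` is hypothetical.

```python
from typing import Sequence

import numpy as np


def concat_product_images(images: Sequence[np.ndarray]) -> np.ndarray:
    """Place a product's images side by side: N x (H, W, C) -> (H, N*W, C)."""
    if len({img.shape for img in images}) != 1:
        raise ValueError("all images of a product must have the same shape")
    return np.concatenate(list(images), axis=1)


# Three images of one product, filled with 0, 1, 2 to make them distinguishable
imgs = [np.full((180, 180, 3), i, dtype=np.uint8) for i in range(3)]
combined = concat_product_images(imgs)
print(combined.shape)  # (180, 540, 3)
```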

Some bad images were found. Because they act as noise for the model, they were removed during the first 2-3 epochs.

This step produced a cluster of resnet50 models: one trained on all images, plus four models for products with 1, 2, 3, and 4 images. These reach an accuracy close to 0.77.
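The article does not describe how the per-image-count models are used at prediction time. One plausible arrangement, shown here purely as an assumption, is to route each product to the specialist trained on its image count and fall back to the all-images model otherwise. Models are represented as plain callables, and the names `predict_product`, `specialists`, and `generalist` are hypothetical.

```python
from typing import Any, Callable, Mapping, Sequence

Model = Callable[[Sequence[Any]], Any]


def predict_product(images: Sequence[Any],
                    specialists: Mapping[int, Model],
                    generalist: Model) -> Any:
    """Use the model trained on this image count, else the all-images model."""
    model = specialists.get(len(images), generalist)
    return model(images)


# Stand-in "models" that just report which one was used
specialists = {n: (lambda imgs, n=n: f"{n}-image model") for n in (1, 2, 3, 4)}
generalist = lambda imgs: "all-images model"
print(predict_product(["img_a", "img_b"], specialists, generalist))  # 2-image model
```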

Several submissions showed that combining the resnet50 models with the inception-resnet-v2 models gives 0.782 / 0.781 on the private / public LB.
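The article does not specify how the models were combined. A common baseline, assumed here, is a (weighted) average of each model's softmax probabilities followed by an argmax. The function `ensemble_predict` and the example probabilities are illustrative only.

```python
import numpy as np


def ensemble_predict(prob_list, weights=None):
    """Weighted average of per-model class probabilities, then argmax."""
    probs = np.stack(prob_list)                  # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # keep the result a distribution
    avg = np.tensordot(weights, probs, axes=1)   # (n_samples, n_classes)
    return avg.argmax(axis=1), avg


p_resnet50 = np.array([[0.6, 0.4], [0.2, 0.8]])
p_incres_v2 = np.array([[0.3, 0.7], [0.4, 0.6]])
labels, avg = ensemble_predict([p_resnet50, p_incres_v2])
print(labels)  # [1 1]
```

Passing `weights=[3, 1]` would favour the first model; whether and how the original solution weighted its models is not stated.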

3.4 Adding OCR data

Classifying CDs and BOOKs turned out to be very hard: the model has to understand the cover.

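More than a week went into OCR. CTPN was used to extract the boxes containing text, and that worked very well. CRNN was then used to read the text in those boxes, but the recognized text turned out to be poor: the images are small, the CRNN was trained for English, and there was no French dataset to retrain it on. The last resort was to extract features from the CRNN and feed them into a multi-input CNN whose inputs were the ResNet50 FC network and the CRNN feature network.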
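The core of such a multi-input network is late fusion: the image features and the OCR features for each product are concatenated and passed through a shared classification layer. The NumPy sketch below shows only that fusion step. The 2048-dimensional CNN feature matches ResNet50's pooled output, while the 256-dimensional OCR feature, the single linear layer, and the small 10-class output are assumptions chosen for illustration (the real task has 5270 classes).

```python
import numpy as np


def fuse_and_classify(cnn_features: np.ndarray,
                      ocr_features: np.ndarray,
                      weight: np.ndarray,
                      bias: np.ndarray) -> np.ndarray:
    """Late fusion: concatenate per-sample features, then one linear layer."""
    fused = np.concatenate([cnn_features, ocr_features], axis=1)  # (N, D_cnn + D_ocr)
    return fused @ weight + bias                                  # (N, num_classes)


rng = np.random.default_rng(0)
n, d_cnn, d_ocr, num_classes = 4, 2048, 256, 10  # d_ocr and num_classes: toy values
cnn = rng.standard_normal((n, d_cnn))
ocr = rng.standard_normal((n, d_ocr))
w = rng.standard_normal((d_cnn + d_ocr, num_classes)) * 0.01
b = np.zeros(num_classes)
logits = fuse_and_classify(cnn, ocr, w, b)
print(logits.shape)  # (4, 10)
```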

Several submissions were made to check that this did not overfit.

With OCR in the model, the score rose to 0.782, an improvement of 0.35%.

The approach from there:

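densenet161, densenet169, and dpn92 were tried. They were much worse than resnet50, but since they had already been trained, the DenseNet models were added to the ensemble. After ensembling, the public LB reached 0.79.

To reach the final score, the 1-image product models were fine-tuned on 224 or 299 pixel images (depending on the model). To add more diversity, their heads were replaced with two 4096 × 4096 FC layers as in VGG nets, and the results were ensembled.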

The final model pipeline is shown below.

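3.5 ResNet model code

The code below is sample ResNet code from the 1st-place solution. Pretrained backbones such as ResNet34, ResNet50, and ResNet101 can be chosen; the listing itself defines only the ResNet50 variant, resnet50_kxv3a.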

resnet.py

https://github.com/bestfitting/kaggle/blob/master/cdiscount/resnet.py

# Import libraries
import math
import os

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.model_zoo as model_zoo
from torch.nn import DataParallel

# Export only the names this file actually defines
__all__ = ['ResNet', 'resnet50_kxv3a']

# URLs of the ImageNet-pretrained weights
model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
}

# Conv -> BatchNorm -> ReLU, returned as a list so it can be unpacked into nn.Sequential
def make_conv_bn_relu(in_channels, out_channels, kernel_size=3, stride=1, padding=1):
    return [
        nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    ]

# 3x3 convolution
def conv3x3(in_planes, out_planes, stride=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

# Load pretrained weights, skipping keys that are missing or whose shapes differ
def load_pretrain_pytorch_file(net, pytorch_file, skip=None, pretrained=True):
    skip = skip or []
    pytorch_state_dict = torch.load(pytorch_file)
    if not pretrained:
        # Non-pretrained checkpoints store the weights under 'state_dict'
        pytorch_state_dict = pytorch_state_dict['state_dict']
    if isinstance(net, DataParallel):
        state_dict = net.module.state_dict()
    else:
        state_dict = net.state_dict()
    for key in pytorch_state_dict.keys():
        if key in skip or key not in state_dict:
            continue  # key not used by this network
        if pytorch_state_dict[key].size() != state_dict[key].size():
            continue  # shape differs (e.g. the new 5270-class head)
        state_dict[key] = pytorch_state_dict[key]
    if isinstance(net, DataParallel):
        net.module.load_state_dict(state_dict)
    else:
        net.load_state_dict(state_dict)

# Basic residual block (two 3x3 convs), used by ResNet18/34
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # Project the shortcut when the shape changes
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out

# Bottleneck residual block (1x1 reduce -> 3x3 -> 1x1 expand x4), used by ResNet50/101/152
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        # Project the shortcut when the shape changes
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out

# ResNet backbone with the enlarged classification head
class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        conv_out_channels = 512 * block.expansion
        # The 1x1 conv widens the backbone output to num_classes channels
        # (e.g. 512 -> 5270 for ResNet34 or 2048 -> 5270 for ResNet50)
        self.fc_in_features = num_classes
        self.layer5 = nn.Sequential(
            *make_conv_bn_relu(conv_out_channels, self.fc_in_features, kernel_size=1, padding=0)
        )
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        # Extra num_classes x num_classes FC layer (+ BN) in front of the classifier
        self.fc_p1 = nn.Linear(self.fc_in_features, self.fc_in_features)
        self.fc_p1_bn = nn.BatchNorm1d(self.fc_in_features)
        self.fc = nn.Linear(self.fc_in_features, num_classes)
        # He init for convs, constant init for BatchNorm2d, small normal init for Linear
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()
        # Modules that receive pretrained weights vs. newly added head modules
        self.pretrained_params = [self.conv1, self.bn1, self.layer1, self.layer2, self.layer3, self.layer4]
        self.new_params = [self.fc, self.layer5, self.fc_p1, self.fc_p1_bn]

    def forward(self, x):
        # Stem
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        # Residual stages
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # Head: 1x1 conv -> global average pool -> FC + BN + ReLU -> classifier
        x_l3 = self.layer5(x)
        x_l3 = self.avgpool(x_l3)
        x_l3 = x_l3.view(x_l3.size(0), -1)
        x_l3 = self.fc_p1(x_l3)
        x_l3 = self.fc_p1_bn(x_l3)
        x_l3 = F.relu(x_l3)
        x_l3 = self.fc(x_l3)
        return x_l3

    def load_pretrain_pytorch_file(self, pytorch_file):
        # The 1000-class ImageNet classifier does not fit the new head
        skip = ['fc.weight', 'fc.bias']
        load_pretrain_pytorch_file(self, pytorch_file, skip)
        print('load pretrained %s' % pytorch_file)

    # Stack `blocks` residual blocks; the first one may downsample
    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)

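# Full network: ResNet50 (Bottleneck, [3, 4, 6, 3]) with the custom head
def resnet50_kxv3a(pretrained=False, **kwargs):
    # `pretrained` is accepted but not used here; load weights with
    # model.load_pretrain_pytorch_file(path) instead
    model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
    return model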

if __name__ == "__main__":
    net = resnet50_kxv3a(pretrained=False, num_classes=5270)
    # "model.pth" is a trained checkpoint whose weights are stored under 'state_dict'
    checkpoint = torch.load("model.pth")
    net.load_state_dict(checkpoint['state_dict'])
    print("success")
