Kosutaron Engineer's Trial-and-Error Room

A running memo of things I've built.

Setting up Jenkins with Docker on a Jetson Xavier NX

This is a memo on building a Jenkins environment on a Jetson Xavier NX.

Table of Contents

What this article covers

How to set up Jenkins with Docker on a Jetson Xavier NX

1. Creating the Dockerfile

FROM jenkins/jenkins:2.361.4-jdk11
USER root
RUN apt-get update && apt-get install -y lsb-release
RUN curl -fsSLo /usr/share/keyrings/docker-archive-keyring.asc \
  https://download.docker.com/linux/debian/gpg
RUN echo "deb [arch=$(dpkg --print-architecture) \
  signed-by=/usr/share/keyrings/docker-archive-keyring.asc] \
  https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker.list
RUN apt-get update && apt-get install -y docker-ce-cli
USER jenkins
RUN jenkins-plugin-cli --plugins "blueocean:1.25.8 docker-workflow:521.v1a_a_dd2073b_2e"

2. Building the Docker image

Build the image with the following command:

docker build -t myjenkins-blueocean:2.361.4-1 .

3. Starting the Docker container

Create a network, then start the container:

sudo docker network create jenkins
sudo docker run --name [name] --restart=on-failure --detach \
  --network jenkins \
  --env DOCKER_HOST=tcp://docker:2376 \
  --env DOCKER_CERT_PATH=/certs/client \
  --env DOCKER_TLS_VERIFY=1 \
  --publish [port]:[port] --publish [port]:[port] \
  --volume jenkins-data:/var/jenkins_home \
  --volume jenkins-docker-certs:/certs/client:ro \
  myjenkins-blueocean:2.361.4-1

--name : any container name
--publish : host:container port mapping, e.g. --publish 8900:8900

This brings up the Jenkins server.
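Right after the container starts, Jenkins can take a little while before it answers HTTP requests. A small Python sketch for checking that it is up (the host and port are examples; use whatever you passed to --publish):

```python
import time
import urllib.error
import urllib.request

def jenkins_url(port, host="localhost"):
    """Build the Jenkins URL from the host port chosen with --publish."""
    return f"http://{host}:{port}"

def wait_for_jenkins(url, timeout_s=120):
    """Poll the Jenkins HTTP endpoint until it responds or we time out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code  # server is up but returned an HTTP error status
        except (urllib.error.URLError, ConnectionError):
            time.sleep(2)  # server still starting; retry
    raise TimeoutError(f"Jenkins did not answer at {url}")
```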

4. Configuring Jenkins

Access the Jenkins server on the port you chose in step 3 and run the initial setup:

http://localhost:[port]

The setup wizard asks for the initial admin password; with the standard Jenkins image it can be printed with docker exec [name] cat /var/jenkins_home/secrets/initialAdminPassword. See the following for the rest of the initial setup:
www.jenkins.io

5. References

symfoware.blog.fc2.com

Thoughts

Next I'd like to try flow control with the pipeline feature.

Setting up a Metaflow environment with Docker on a Jetson Xavier NX

While studying MLOps, I set up an environment for the workflow-management module Metaflow.
This post is a memo of that setup.

Table of Contents

What this article covers

How to set up a Metaflow runtime environment with Docker on a Jetson Xavier NX

1. What is Metaflow?

A workflow-management module released by Netflix.
metaflow.org

2. Environment

Jetson Xavier NX
Ubuntu 18.04
Docker
Python 3.x

3. Creating the Dockerfile

FROM nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.10-py3

ENV LC_ALL C.UTF-8
ENV LANG C.UTF-8

RUN pip3 install --upgrade pip
RUN pip3 install --ignore-installed PyYAML
RUN pip3 install metaflow

ARG USERNAME=your_name
ARG GROUPNAME=your_group_name
ARG UID=1000
ARG GID=1000
RUN groupadd -g $GID $GROUPNAME && \
    useradd -m -s /bin/bash -u $UID -g $GID $USERNAME
USER $USERNAME

4. Building the docker image

Build the image with the following command:

sudo docker build . -t tag_name

5. Creating the container

Create and start the container with the following command:

sudo docker run -it --rm --runtime nvidia --network host -v path/to/metaflow/workspace/:/workspace --name metaflow tag_name

6. Running the sample code

Run the sample code below (the HelloFlow example from the Metaflow tutorial):

from metaflow import FlowSpec, step

class HelloFlow(FlowSpec):
    """
    A flow where Metaflow prints 'Hi'.
    Run this flow to validate that Metaflow is installed correctly.
    """
    @step
    def start(self):
        """
        This is the 'start' step. All flows must have a step named 'start' that
        is the first step in the flow.
        """
        print("HelloFlow is starting.")
        self.next(self.hello)

    @step
    def hello(self):
        """
        A step for metaflow to introduce itself.
        """
        print("Metaflow says: Hi!")
        self.next(self.end)

    @step
    def end(self):
        """
        This is the 'end' step. All flows must have an 'end' step, which is the
        last step in the flow.
        """
        print("HelloFlow is all done.")

if __name__ == '__main__':
    HelloFlow()

-> Run it with python3 helloflow.py run (assuming the sample is saved as helloflow.py). If a .metaflow directory has been created afterwards, the install is working.
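The start -> hello -> end chaining that FlowSpec drives via self.next() can be sketched in plain Python. This is a toy stand-in to show the control flow, not Metaflow itself:

```python
# Toy sketch of Metaflow-style step chaining (not Metaflow itself):
# each step names its successor via self.next(), and a driver loop
# walks the chain from 'start' until a step sets no successor.
class MiniFlow:
    def __init__(self):
        self.log = []

    def next(self, step):
        self._next = step

    def start(self):
        self.log.append("start")
        self.next(self.hello)

    def hello(self):
        self.log.append("hello")
        self.next(self.end)

    def end(self):
        self.log.append("end")  # last step: leaves no successor

    def run(self):
        self._next = self.start
        while self._next is not None:
            step, self._next = self._next, None
            step()
        return self.log
```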

7. References

www.nogawanogawa.com

Thoughts

Now that I have a working environment, I want to try it out on various things.

Running Barlow Twins on the CIFAR-10 dataset

I previously tried SimSiam representation learning (CIFAR-10) with PyTorch on a Jetson Xavier NX.
Looking into recent representation learning, I came across a self-supervised method called Barlow Twins, so I gave it a try.
Facebook has published code for Barlow Twins self-supervised learning, and I used that.

github.com

Table of Contents

What this article covers

How to run representation learning with Barlow Twins and CIFAR-10

1. Environment

Jetson Xavier NX
Ubuntu 18.04
Docker
Python 3.x
PyTorch
-> I covered the PyTorch environment setup in the post below, for reference (^^♪

technoxs-stacker.hatenablog.com

2. The modified code

The full modified code is below:

# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
from pathlib import Path
import argparse
import json
import math
import os
import random
import signal
import subprocess
import sys
import time
from PIL import Image, ImageOps, ImageFilter
from torch import nn, optim
import torch
import torchvision
import torchvision.transforms as transforms
from torchinfo import summary
import torchvision.models as models

parser = argparse.ArgumentParser(description='Barlow Twins Training')
# parser.add_argument('data', type=Path, metavar='DIR',
#                     help='path to dataset')
parser.add_argument('--pretrained', '-p', default='path_to_pthfile', type=Path, metavar='FILE',
                    help='path to pretrained model')
parser.add_argument('--workers', default=1, type=int, metavar='N',
                    help='number of data loader workers')
parser.add_argument('--epochs', default=1000, type=int, metavar='N',
                    help='number of total epochs to run')
# parser.add_argument('--batch-size', default=2048, type=int, metavar='N',
#                     help='mini-batch size')
parser.add_argument('--batch-size', default=3, type=int, metavar='N',
                    help='mini-batch size')
parser.add_argument('--learning-rate-weights', default=0.2, type=float, metavar='LR',
                    help='base learning rate for weights')
parser.add_argument('--learning-rate-biases', default=0.0048, type=float, metavar='LR',
                    help='base learning rate for biases and batch norm parameters')
parser.add_argument('--weight-decay', default=1e-6, type=float, metavar='W',
                    help='weight decay')
parser.add_argument('--lambd', default=0.0051, type=float, metavar='L',
                    help='weight on off-diagonal terms')
# parser.add_argument('--projector', default='8192-8192-8192', type=str,
#                     metavar='MLP', help='projector MLP')
parser.add_argument('--projector', default='8192', type=str,
                    metavar='MLP', help='projector MLP')
parser.add_argument('--print-freq', default=100, type=int, metavar='N',
                    help='print frequency')
parser.add_argument('--checkpoint-dir', default='./checkpoint/', type=Path,
                    metavar='DIR', help='path to checkpoint directory')


def main():
    args = parser.parse_args()
    args.ngpus_per_node = torch.cuda.device_count()
    if 'SLURM_JOB_ID' in os.environ:
        # single-node and multi-node distributed training on SLURM cluster
        # requeue job on SLURM preemption
        signal.signal(signal.SIGUSR1, handle_sigusr1)
        signal.signal(signal.SIGTERM, handle_sigterm)
        # find a common host name on all nodes
        # assume scontrol returns hosts in the same order on all nodes
        cmd = 'scontrol show hostnames ' + os.getenv('SLURM_JOB_NODELIST')
        stdout = subprocess.check_output(cmd.split())
        host_name = stdout.decode().splitlines()[0]
        args.rank = int(os.getenv('SLURM_NODEID')) * args.ngpus_per_node
        args.world_size = int(os.getenv('SLURM_NNODES')) * args.ngpus_per_node
        args.dist_url = f'tcp://{host_name}:58472'
    else:
        # single-node distributed training
        args.rank = 0
        args.dist_url = 'tcp://localhost:58472'
        args.world_size = args.ngpus_per_node
    # torch.multiprocessing.spawn(main_worker, (args,), args.ngpus_per_node)
    main_worker(args.ngpus_per_node, args)


def main_worker(gpu, args):
    args.rank += gpu
    # torch.distributed.init_process_group(
    #     backend='nccl', init_method=args.dist_url,
    #     world_size=args.world_size, rank=args.rank)
    # torch.distributed.init_process_group(
    #     backend='gloo', init_method=args.dist_url,
    #     world_size=args.world_size, rank=args.rank)
    # if args.rank == 0:
    #     args.checkpoint_dir.mkdir(parents=True, exist_ok=True)
    #     stats_file = open(args.checkpoint_dir / 'stats.txt', 'a', buffering=1)
    #     print(' '.join(sys.argv))
    #     print(' '.join(sys.argv), file=stats_file)

    args.checkpoint_dir.mkdir(parents=True, exist_ok=True)
    stats_file = open(args.checkpoint_dir / 'stats.txt', 'a', buffering=1)
    print(' '.join(sys.argv))
    print(' '.join(sys.argv), file=stats_file)

    num_gpus = os.environ['CUDA_VISIBLE_DEVICES'].split(',').__len__()
    os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(f'{i}' for i in range(num_gpus))
    gpu = num_gpus - 1

    # torch.cuda.set_device(gpu)
    torch.cuda.set_device(gpu)
    torch.backends.cudnn.benchmark = True
    model = BarlowTwins(args).cuda(gpu)
    # model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

    # summary(model)
    param_weights = []
    param_biases = []
    for param in model.parameters():
        if param.ndim == 1:
            param_biases.append(param)
        else:
            param_weights.append(param)
    
    parameters = [{'params': param_weights}, {'params': param_biases}]
    # model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
    optimizer = LARS(parameters, lr=0, weight_decay=args.weight_decay,
                     weight_decay_filter=True,
                     lars_adaptation_filter=True)

    # automatically resume from checkpoint if it exists
    if (args.checkpoint_dir / 'checkpoint.pth').is_file():
        ckpt = torch.load(args.checkpoint_dir / 'checkpoint.pth',
                          map_location='cpu')
        start_epoch = ckpt['epoch']
        model.load_state_dict(ckpt['model'])
        optimizer.load_state_dict(ckpt['optimizer'])
    else:
        start_epoch = 0

    # dataset = torchvision.datasets.ImageFolder(args.data / 'train', Transform())
    dataset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                           download=False, transform=Transform())
    # sampler = torch.utils.data.distributed.DistributedSampler(dataset)
    # assert args.batch_size % args.world_size == 0
    # per_device_batch_size = args.batch_size // args.world_size
    # loader = torch.utils.data.DataLoader(
    #     dataset, batch_size=per_device_batch_size, num_workers=args.workers,
    #     pin_memory=True, sampler=sampler)
    
    
    assert args.batch_size % args.world_size == 0
    per_device_batch_size = args.batch_size // args.world_size
    # sampler = torch.utils.data.DataLoader(dataset)
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=per_device_batch_size, num_workers=args.workers,
        pin_memory=True)
    
    start_time = time.time()
    scaler = torch.cuda.amp.GradScaler()
    for epoch in range(start_epoch, args.epochs):
        # sampler.set_epoch(epoch)
        for step, ((y1, y2), _) in enumerate(loader, start=epoch * len(loader)):
            y1 = y1.cuda(gpu, non_blocking=True)
            y2 = y2.cuda(gpu, non_blocking=True)
            adjust_learning_rate(args, optimizer, loader, step)
            optimizer.zero_grad()
            with torch.cuda.amp.autocast():
                loss = model.forward(y1, y2)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
            if step % args.print_freq == 0:
                # if args.rank == 0:
                #     stats = dict(epoch=epoch, step=step,
                #                  lr_weights=optimizer.param_groups[0]['lr'],
                #                  lr_biases=optimizer.param_groups[1]['lr'],
                #                  loss=loss.item(),
                #                  time=int(time.time() - start_time))
                #     print(json.dumps(stats))
                #     print(json.dumps(stats), file=stats_file)
                stats = dict(epoch=epoch, step=step,
                                lr_weights=optimizer.param_groups[0]['lr'],
                                lr_biases=optimizer.param_groups[1]['lr'],
                                loss=loss.item(),
                                time=int(time.time() - start_time))
                print(json.dumps(stats))
                print(json.dumps(stats), file=stats_file)
        # if args.rank == 0:
        #     # save checkpoint
        #     state = dict(epoch=epoch + 1, model=model.state_dict(),
        #                  optimizer=optimizer.state_dict())
        #     torch.save(state, args.checkpoint_dir / 'checkpoint.pth')
        # save checkpoint
        state = dict(epoch=epoch + 1, model=model.state_dict(),
                        optimizer=optimizer.state_dict())
        torch.save(state, args.checkpoint_dir / 'checkpoint.pth')
    # if args.rank == 0:
    #     # save final model
    #     torch.save(model.module.backbone.state_dict(),
    #                args.checkpoint_dir / 'resnet50.pth')
    # save final model
    # import pdb; pdb.set_trace()
    # torch.save(model.module.backbone.state_dict(),
    #             args.checkpoint_dir / 'resnet50.pth')
    # bug fix https://discuss.pytorch.org/t/attributeerror-net-object-has-no-attribute-module/45652
    torch.save(model.backbone.state_dict(),
                args.checkpoint_dir / 'resnet50.pth')


def adjust_learning_rate(args, optimizer, loader, step):
    max_steps = args.epochs * len(loader)
    warmup_steps = 10 * len(loader)
    base_lr = args.batch_size / 256
    if step < warmup_steps:
        lr = base_lr * step / warmup_steps
    else:
        step -= warmup_steps
        max_steps -= warmup_steps
        q = 0.5 * (1 + math.cos(math.pi * step / max_steps))
        end_lr = base_lr * 0.001
        lr = base_lr * q + end_lr * (1 - q)
    optimizer.param_groups[0]['lr'] = lr * args.learning_rate_weights
    optimizer.param_groups[1]['lr'] = lr * args.learning_rate_biases


def handle_sigusr1(signum, frame):
    os.system(f'scontrol requeue {os.getenv("SLURM_JOB_ID")}')
    exit()


def handle_sigterm(signum, frame):
    pass


def off_diagonal(x):
    # return a flattened view of the off-diagonal elements of a square matrix
    n, m = x.shape
    assert n == m
    return x.flatten()[:-1].view(n - 1, n + 1)[:, 1:].flatten()


class BarlowTwins(nn.Module):
    def __init__(self, args):
        super().__init__()
        self.args = args
        # self.backbone = torchvision.models.resnet50(zero_init_residual=True)
        self.backbone = torchvision.models.resnet34(zero_init_residual=True)
        self.backbone.fc = nn.Identity()
        
        if os.path.isfile(args.pretrained):
             self.backbone.load_state_dict(torch.load(args.pretrained, map_location='cpu'))
        # self.backbone = models.vgg16(pretrained=False)

        # projector
        # sizes = [2048] + list(map(int, args.projector.split('-')))
        # sizes = [1000] + list(map(int, args.projector.split('-')))# vgg16-> bug?
        sizes = [512] + list(map(int, args.projector.split('-')))# resnet34
        layers = []
        for i in range(len(sizes) - 2):
            layers.append(nn.Linear(sizes[i], sizes[i + 1], bias=False))
            layers.append(nn.BatchNorm1d(sizes[i + 1]))
            layers.append(nn.ReLU(inplace=True))
        layers.append(nn.Linear(sizes[-2], sizes[-1], bias=False))
        self.projector = nn.Sequential(*layers)

        # normalization layer for the representations z1 and z2
        self.bn = nn.BatchNorm1d(sizes[-1], affine=False)

    def forward(self, y1, y2):
        z1 = self.projector(self.backbone(y1))
        z2 = self.projector(self.backbone(y2))

        # empirical cross-correlation matrix
        # c = self.bn(z1).T @ self.bn(z2)
        c = torch.mm(self.bn(z1).t(), self.bn(z2))

        # sum the cross-correlation matrix between all gpus
        c.div_(self.args.batch_size)
        # torch.distributed.all_reduce(c)

        on_diag = torch.diagonal(c).add_(-1).pow_(2).sum()
        off_diag = off_diagonal(c).pow_(2).sum()
        loss = on_diag + self.args.lambd * off_diag
        return loss


class LARS(optim.Optimizer):
    def __init__(self, params, lr, weight_decay=0, momentum=0.9, eta=0.001,
                 weight_decay_filter=False, lars_adaptation_filter=False):
        defaults = dict(lr=lr, weight_decay=weight_decay, momentum=momentum,
                        eta=eta, weight_decay_filter=weight_decay_filter,
                        lars_adaptation_filter=lars_adaptation_filter)
        super().__init__(params, defaults)


    def exclude_bias_and_norm(self, p):
        return p.ndim == 1

    @torch.no_grad()
    def step(self):
        for g in self.param_groups:
            for p in g['params']:
                dp = p.grad

                if dp is None:
                    continue

                if not g['weight_decay_filter'] or not self.exclude_bias_and_norm(p):
                    dp = dp.add(p, alpha=g['weight_decay'])

                if not g['lars_adaptation_filter'] or not self.exclude_bias_and_norm(p):
                    param_norm = torch.norm(p)
                    update_norm = torch.norm(dp)
                    one = torch.ones_like(param_norm)
                    q = torch.where(param_norm > 0.,
                                    torch.where(update_norm > 0,
                                                (g['eta'] * param_norm / update_norm), one), one)
                    dp = dp.mul(q)

                param_state = self.state[p]
                if 'mu' not in param_state:
                    param_state['mu'] = torch.zeros_like(p)
                mu = param_state['mu']
                mu.mul_(g['momentum']).add_(dp)

                p.add_(mu, alpha=-g['lr'])


class GaussianBlur(object):
    def __init__(self, p):
        self.p = p

    def __call__(self, img):
        if random.random() < self.p:
            sigma = random.random() * 1.9 + 0.1
            return img.filter(ImageFilter.GaussianBlur(sigma))
        else:
            return img


class Solarization(object):
    def __init__(self, p):
        self.p = p

    def __call__(self, img):
        if random.random() < self.p:
            return ImageOps.solarize(img)
        else:
            return img


class Transform:
    def __init__(self):
        self.transform = transforms.Compose([
            transforms.RandomResizedCrop(224, interpolation=Image.BICUBIC),
            transforms.RandomHorizontalFlip(p=0.5),
            transforms.RandomApply(
                [transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                        saturation=0.2, hue=0.1)],
                p=0.8
            ),
            transforms.RandomGrayscale(p=0.2),
            GaussianBlur(p=1.0),
            Solarization(p=0.0),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])
        self.transform_prime = transforms.Compose([
            transforms.RandomResizedCrop(224, interpolation=Image.BICUBIC),
            transforms.RandomHorizontalFlip(p=0.5),
            transforms.RandomApply(
                [transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                        saturation=0.2, hue=0.1)],
                p=0.8
            ),
            transforms.RandomGrayscale(p=0.2),
            GaussianBlur(p=0.1),
            Solarization(p=0.2),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])

    def __call__(self, x):
        y1 = self.transform(x)
        y2 = self.transform_prime(x)
        return y1, y2


if __name__ == '__main__':
    main()
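The off_diagonal helper in the code above removes the diagonal of a square matrix with a flatten/reshape trick. A pure-Python sketch of the same indexing, and of the two loss terms it feeds (illustrative values, no PyTorch needed):

```python
def off_diagonal(mat):
    """Flatten an n x n matrix, drop the last element, view the rest as
    (n-1) rows of (n+1) entries, and drop the first column: what remains
    is exactly the off-diagonal elements."""
    n = len(mat)
    flat = [v for row in mat for v in row][:-1]
    rows = [flat[i * (n + 1):(i + 1) * (n + 1)] for i in range(n - 1)]
    return [v for row in rows for v in row[1:]]

# Illustrative 3x3 cross-correlation matrix: the Barlow Twins loss pulls
# the diagonal toward 1 and pushes the off-diagonal toward 0.
c = [[1.0, 0.2, 0.1],
     [0.2, 1.0, 0.3],
     [0.1, 0.3, 1.0]]
on_diag = sum((c[i][i] - 1) ** 2 for i in range(len(c)))
off_diag = sum(v ** 2 for v in off_diagonal(c))
lambd = 0.0051  # same default as the --lambd option
loss = on_diag + lambd * off_diag
```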

3. Running it

Run with the following command:

CUDA_VISIBLE_DEVICES=0 python3 cifar_main.py --epochs 100 --print-freq 10 --batch-size 5 --learning-rate-weights 0.2 --projector 1000 

Command options:

--epochs : number of epochs
--print-freq : console print interval (in steps)
--batch-size : batch size
--learning-rate-weights : base learning rate for weights
--projector : projector size
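The adjust_learning_rate function in the code does a linear warmup over the first 10 epochs and then cosine decay down to base_lr/1000. The schedule in isolation (pure Python; step counts are illustrative):

```python
import math

def lr_at(step, max_steps, warmup_steps, base_lr):
    """Warmup + cosine decay, mirroring adjust_learning_rate above."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup from 0
    step -= warmup_steps
    max_steps -= warmup_steps
    q = 0.5 * (1 + math.cos(math.pi * step / max_steps))  # 1 -> 0 over training
    end_lr = base_lr * 0.001
    return base_lr * q + end_lr * (1 - q)
```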

Thoughts

Training seems to fail for some projector sizes.
You'll likely need to tune the projector value based on the model, the dataset, and the available memory.

References

github.com

Representation learning with SimSiam and EfficientNet

I previously set up a PyTorch EfficientNet training environment with Docker on a Jetson Xavier NX.
This time I use that environment to run SimSiam representation learning with an EfficientNet backbone.

Table of Contents

What this article covers

How to modify facebookresearch's simsiam repository to run representation learning with EfficientNet

1. Environment

Jetson Xavier NX
Ubuntu 18.04
Docker
Python 3.6.9
PyTorch 1.10
torchvision 0.11.0

2. Setup steps

2.1 Start the docker container

sudo docker run -it --rm --runtime nvidia -v path/to/workspace/directory --network host -p port_num:port_num nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.10-py3

-v path/to/workspace/directory : directory to mount
-p port_num:port_num : port number to expose

2.2 Install the required modules

This time I use the argument-parsing module abseil and the visualization module tensorboardX introduced earlier, so install them with pip:

pip3 install absl-py  
pip3 install tensorboardX  

2.3 git clone the simsiam code

I use Facebook's published SimSiam self-supervised learning repository, which I've also used before:
github.com

git clone https://github.com/facebookresearch/simsiam.git

3. Modifying the simsiam code

The stock code doesn't work with EfficientNet as-is, so we modify it.

3.1 Modifying builder.py

Modify the model-construction code:

class SimSiam(nn.Module):
    """
    Build a SimSiam model.
    """
    def __init__(self, base_encoder, dim=2048, pred_dim=512):
        """
        dim: feature dimension (default: 2048)
        pred_dim: hidden dimension of the predictor (default: 512)
        """
        super(SimSiam, self).__init__()

        # create the encoder
        # num_classes is the output fc dimension, zero-initialize last BNs
        self.encoder = base_encoder(num_classes=dim, zero_init_residual=True)

        # build a 3-layer projector
        # modified section ----------------------------------------------------------------------------------------------------------

        
        prev_in_features = self.encoder.classifier[1].in_features
        self.encoder.classifier[1] = nn.Sequential(nn.Linear(prev_in_features, prev_in_features, bias=False),
                                        nn.BatchNorm1d(prev_in_features),
                                        nn.ReLU(inplace=True), # first layer
                                        nn.Linear(prev_in_features, prev_in_features, bias=False),
                                        nn.BatchNorm1d(prev_in_features),
                                        nn.ReLU(inplace=True), # second layer
                                        self.encoder.classifier[1],
                                        nn.BatchNorm1d(dim, affine=False)) # output layer
        self.encoder.classifier[1][6].bias.requires_grad = False # hack: not use bias as it is followed by BN
        #  ----------------------------------------------------------------------------------------------------------

        # build a 2-layer predictor
        self.predictor = nn.Sequential(nn.Linear(dim, pred_dim, bias=False),
                                        nn.BatchNorm1d(pred_dim),
                                        nn.ReLU(inplace=True), # hidden layer
                                        nn.Linear(pred_dim, dim)) # output layer
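The surgery above replaces the EfficientNet classifier head Linear(in_features, dim) with a 3-layer projector MLP. The resulting Linear shapes can be sketched without torch (1280 is the head width of torchvision's efficientnet_b0; treat the concrete numbers as illustrative):

```python
def projector_shapes(in_features, dim):
    """Linear layer shapes produced by the classifier[1] replacement above:
    two hidden layers keep the encoder head width, and the original
    classifier Linear (in_features -> dim) becomes the output layer."""
    return [
        (in_features, in_features),  # first layer, followed by BN + ReLU
        (in_features, in_features),  # second layer, followed by BN + ReLU
        (in_features, dim),          # output layer, followed by affine-free BN
    ]
```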
        

3.2 Modifying main_simsiam.py

These changes switch the script to a single process and wire in the abseil argument-parsing module and the tensorboardX visualization module.
See my earlier posts for the details:
technoxs-stacker.hatenablog.com technoxs-stacker.hatenablog.com technoxs-stacker.hatenablog.com

import argparse
import math
import os
import random
import shutil
import time
import warnings
import torch
from torch.nn.functional import cosine_similarity
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision
import simsiam.loader
import simsiam.builder
from absl import app
from absl import flags
from tensorboardX import SummaryWriter

FLAGS = flags.FLAGS

model_names = sorted(name for name in models.__dict__
    if name.islower() and not name.startswith("__")
    and callable(models.__dict__[name]))

def _mkdirs(path):
    if not os.path.isdir(path):
        os.makedirs(path)

flags.DEFINE_string('arch', 'resnet50', 'model architecture.')
flags.DEFINE_integer('workers', 0, 'number of worker.')
flags.DEFINE_integer('epochs', 100, 'number of epochs.')
flags.DEFINE_integer('start_epoch', 0, 'manual epoch number (useful on restarts).')
flags.DEFINE_integer('batch_size', 8, 'mini-batch size (default:8).')
flags.DEFINE_float('lr', 0.05, 'initial (base) learning rate.')
flags.DEFINE_float('momentum', 0.9, 'momentum of SGD solver.')
flags.DEFINE_float('weight_decay', 1e-4, 'weight decay (default: 1e-4).')
flags.DEFINE_integer('print_freq', 10, 'print frequency (default: 10).')
flags.DEFINE_string('resume', './checkpoint/', 'path to latest checkpoint (default: none).')
flags.DEFINE_integer('seed', None, 'seed for initializing training.')
flags.DEFINE_integer('gpu', 0, 'GPU id to use.')
flags.DEFINE_string('checkpoint_dir', './checkpoint', 'check point directory.')
# simsiam specific configs:
flags.DEFINE_integer('dim', 2048, 'feature dimension (default: 2048).')
flags.DEFINE_integer('pred_dim', 512, 'hidden dimension of the predictor (default: 512).')
flags.DEFINE_bool('fix_pred_lr', False, 'Fix learning rate for the predictor')
flags.DEFINE_string('tb_dir', 'tb_log', 'tensorboard log directory')


def main(argv):
    gpu = FLAGS.gpu
    seed = FLAGS.seed
    if seed is not None:
        random.seed(seed)
        torch.manual_seed(seed)
        cudnn.deterministic = True
        warnings.warn('You have chosen to seed training. '
                      'This will turn on the CUDNN deterministic setting, '
                      'which can slow down your training considerably! '
                      'You may see unexpected behavior when restarting '
                      'from checkpoints.')
    if gpu is not None:
        warnings.warn('You have chosen a specific GPU. This will completely '
                      'disable data parallelism.')
    main_worker(gpu)


def main_worker(gpu):
    arch = FLAGS.arch
    dim = FLAGS.dim
    pred_dim = FLAGS.pred_dim
    lr = FLAGS.lr
    batch_size = FLAGS.batch_size
    momentum = FLAGS.momentum
    weight_decay = FLAGS.weight_decay
    resume = FLAGS.resume
    start_epoch = FLAGS.start_epoch
    workers = FLAGS.workers
    epochs = FLAGS.epochs
    print_freq = FLAGS.print_freq
    checkpoint_dir = FLAGS.checkpoint_dir
    fix_pred_lr = FLAGS.fix_pred_lr
    log_dir = FLAGS.tb_dir

    _mkdirs(checkpoint_dir)
    _mkdirs(log_dir)

    # create model
    print("=> creating model '{}'".format(arch))
    model = simsiam.builder.SimSiam(
        models.__dict__[arch],
        dim, pred_dim)

    # infer learning rate before changing batch size
    init_lr = lr * batch_size / 256
    
    #import pdb; pdb.set_trace()
    num_gpus = os.environ['CUDA_VISIBLE_DEVICES'].split(',').__len__()
    os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(f'{i}' for i in range(num_gpus))

    torch.cuda.set_device(gpu)
    model = model.cuda(gpu)

    # define loss function (criterion) and optimizer
    criterion = nn.CosineSimilarity(dim=1).cuda(gpu)

    if fix_pred_lr:
        # no DistributedDataParallel wrapper here, so access the model directly
        optim_params = [{'params': model.encoder.parameters(), 'fix_lr': False},
                        {'params': model.predictor.parameters(), 'fix_lr': True}]
    else:
        optim_params = model.parameters()

    optimizer = torch.optim.SGD(optim_params, init_lr,
                                momentum=momentum,
                                weight_decay=weight_decay)

    # optionally resume from a checkpoint
    if resume:
        if os.path.isfile(resume):
            print("=> loading checkpoint '{}'".format(resume))
            if gpu is None:
                checkpoint = torch.load(resume)
            else:
                # Map model to be loaded to specified single gpu.
                loc = 'cuda:{}'.format(gpu)
                checkpoint = torch.load(resume, map_location=loc)
            start_epoch = checkpoint['epoch']
            model.load_state_dict(checkpoint['state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(resume))

    cudnn.benchmark = True

    # Data loading code
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    # MoCo v2's aug: similar to SimCLR https://arxiv.org/abs/2002.05709
    augmentation = [
        transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
        transforms.RandomApply([
            transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)  # not strengthened
        ], p=0.8),
        transforms.RandomGrayscale(p=0.2),
        transforms.RandomApply([simsiam.loader.GaussianBlur([.1, 2.])], p=0.5),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize
    ]
    
    train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                           download=False, transform=simsiam.loader.TwoCropsTransform(transforms.Compose(augmentation)))

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, num_workers=workers,
        pin_memory=True)

    # define writer
    writer = SummaryWriter(log_dir)

    for epoch in range(start_epoch, epochs):
        adjust_learning_rate(optimizer, init_lr, epoch, epochs)

        # train for one epoch
        train(train_loader, model, criterion, optimizer, epoch, gpu, print_freq, writer)

        save_checkpoint({
            'epoch': epoch + 1,
            'arch': arch,
            'state_dict': model.state_dict(),
            'optimizer' : optimizer.state_dict(),
        }, is_best=False, filename='checkpoint_{:04d}.pth.tar'.format(epoch))

    torch.save(model.state_dict(),
                os.path.join(checkpoint_dir, 'latest.pth'))


def train(train_loader, model, criterion, optimizer, epoch, gpu, print_freq, writer):
    batch_time = AverageMeter('Time', ':6.3f')
    data_time = AverageMeter('Data', ':6.3f')
    losses = AverageMeter('Loss', ':.4f')
    progress = ProgressMeter(
        len(train_loader),
        [batch_time, data_time, losses],
        prefix="Epoch: [{}]".format(epoch))

    # switch to train mode
    model.train()
    end = time.time()
    for i, (images, _) in enumerate(train_loader, start=epoch * len(train_loader)):
        # measure data loading time
        data_time.update(time.time() - end)

        images[0] = images[0].cuda(gpu, non_blocking=True)
        images[1] = images[1].cuda(gpu, non_blocking=True)

        # compute output and loss
        p1, p2, z1, z2 = model(x1=images[0], x2=images[1])

        # compute output and loss
        loss = -(criterion(p1, z2).mean() + criterion(p2, z1).mean()) * 0.5
        losses.update(loss.item(), images[0].size(0))

        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % print_freq == 0:
            progress.display(i)
        
        writer.add_scalar("train_loss", loss.item(), i)


def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, filename)
    if is_best:
        shutil.copyfile(filename, 'model_best.pth.tar')

class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)

class ProgressMeter(object):
    def __init__(self, num_batches, meters, prefix=""):
        self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
        self.meters = meters
        self.prefix = prefix

    def display(self, batch):
        entries = [self.prefix + self.batch_fmtstr.format(batch)]
        entries += [str(meter) for meter in self.meters]
        print('\t'.join(entries))

    def _get_batch_fmtstr(self, num_batches):
        num_digits = len(str(num_batches))
        fmt = '{:' + str(num_digits) + 'd}'
        return '[' + fmt + '/' + fmt.format(num_batches) + ']'

def adjust_learning_rate(optimizer, init_lr, epoch, epochs):
    """Decay the learning rate based on schedule"""
    cur_lr = init_lr * 0.5 * (1. + math.cos(math.pi * epoch / epochs))
    for param_group in optimizer.param_groups:
        if 'fix_lr' in param_group and param_group['fix_lr']:
            param_group['lr'] = init_lr
        else:
            param_group['lr'] = cur_lr

if __name__ == '__main__':
    app.run(main)
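The adjust_learning_rate function above follows a cosine decay schedule. As a quick standalone sanity check of the formula (init_lr=0.05 and epochs=100 are example values, not values taken from this post):

```python
import math

def cosine_lr(init_lr, epoch, epochs):
    # same cosine decay formula as in adjust_learning_rate above
    return init_lr * 0.5 * (1. + math.cos(math.pi * epoch / epochs))

epochs = 100
print(cosine_lr(0.05, 0, epochs))             # 0.05: full learning rate at the start
print(round(cosine_lr(0.05, 50, epochs), 4))  # 0.025: half the rate at the midpoint
print(cosine_lr(0.05, 100, epochs))           # 0.0: decayed to zero at the end
```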

3.3 Preparing the config file

Create the config file that is read at run time:

--arch=efficientnet_b0
--gpu=0
--batch_size=8
--print_freq=10
--pred_dim=256

--arch : base model to use
--gpu : GPU to use
--batch_size : batch size
--print_freq : console output interval
--pred_dim : output size of the predictor
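The script reads this file through absl's --flagfile mechanism (the app.run(main) entry point above comes from absl). Purely as an illustration of the one-flag-per-line file format, the standard library's argparse supports the same idea via fromfile_prefix_chars; this sketch is not the project's actual flag definitions:

```python
import argparse

# argparse can load "--key=value" lines from a file via "@file",
# similar in spirit to absl's --flagfile
parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
parser.add_argument('--arch', default='efficientnet_b0')   # base model to use
parser.add_argument('--gpu', type=int, default=0)          # GPU index
parser.add_argument('--batch_size', type=int, default=8)
parser.add_argument('--print_freq', type=int, default=10)
parser.add_argument('--pred_dim', type=int, default=256)   # predictor output size

# the same lines as the config file above, passed directly here for the demo
args = parser.parse_args(['--arch=efficientnet_b0', '--batch_size=8'])
print(args.arch, args.pred_dim)  # efficientnet_b0 256
```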

4. Running the training

Run it with the following command:

CUDA_VISIBLE_DEVICES=0 python3 main_simsiam_single.py --flagfile path/to/configfile/ --tb_dir path/to/log/directory/

CUDA_VISIBLE_DEVICES : device number to use
--flagfile : path to the config file
--tb_dir : output directory for the tensorboardX logs

Thoughts

This time I made it work by forcibly rewriting the code, but I would like to make it support multiple models properly...

Building a PyTorch EfficientNet training environment with Jetson Xavier NX and docker

I previously set up a PyTorch environment on a Jetson Xavier NX using docker.
technoxs-stacker.hatenablog.com

In that environment, the torchvision version is 0.7.0.
EfficientNet requires torchvision 0.11 or later, so torchvision needs to be upgraded.

pytorch.org

This post is a memorandum on building an environment that supports EfficientNet.

Contents

What this article covers

How to build a PyTorch EfficientNet training environment with Jetson Xavier NX and docker

1. Environment

Jetson Xavier NX
ubuntu18.04
docker
python3.x

2. Setup steps

2.1. Pull the docker image with the following command

sudo docker pull nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.10-py3

2.2. Start a docker container

sudo docker run -it --rm --runtime nvidia -v path/to/workspace/directory --network host -p port_num:port_num nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.10-py3

-v path/to/workspace/directory : directory to mount
-p port_num:port_num : port number to expose

2.3. Check the torchvision version

python3 -c "import torchvision;print(torchvision.__version__);"

If the torchvision version is 0.11 or later, you are all set.

* The check command is based on the following reference:
qiita.com
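To check programmatically whether the installed torchvision can provide EfficientNet, comparing the major/minor version against 0.11 is enough. A minimal sketch (the naive split-based parsing is an assumption of plain "X.Y.Z" version strings and ignores suffixes such as local build tags):

```python
def supports_efficientnet(torchvision_version: str) -> bool:
    """EfficientNet models were added in torchvision 0.11."""
    major, minor = (int(x) for x in torchvision_version.split('.')[:2])
    return (major, minor) >= (0, 11)

print(supports_efficientnet('0.7.0'))   # False: the version in the old l4t-pytorch image
print(supports_efficientnet('0.11.1'))  # True: efficientnet_b0 etc. are available
```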

3. Reference

catalog.ngc.nvidia.com

Thoughts

Now that I can train EfficientNet with PyTorch, I am thinking of adapting the base model of the SimSiam representation learning I did earlier to EfficientNet.

My previous SimSiam post is here:
technoxs-stacker.hatenablog.com

how to use key bindings with streamlit

I wondered whether streamlit could be operated from the keyboard. After looking into it, I found that components.html can do the job, so here is a memo.
The code improves on one I created before.
technoxs-stacker.hatenablog.com

contents

abstract

how to use key bindings in streamlit

1.requirement

Jetson Xavier NX
ubuntu18.04
docker
python3.x

2.code

Embed HTML using components.html
-> the embedded HTML contains a script that detects keystrokes
-> each detected keystroke clicks a button, which triggers its callback function

import streamlit as st
import configparser
import argparse
import streamlit.components.v1 as components

def z_callback():
    st.write('z button!!!')

def a_callback():
    st.write('a button!!!')

z_col, a_col, _ = st.sidebar.columns([1, 1, 3])

with z_col:
    st.button('Z', on_click=z_callback)

with a_col:
    st.button('A', on_click=a_callback)

components.html(
    """
<script>
const doc = window.parent.document;
const buttons = Array.from(doc.querySelectorAll('button[kind=primary]'));
const z_button = buttons.find(el => el.innerText === 'Z');
const a_button = buttons.find(el => el.innerText === 'A');
doc.addEventListener('keydown', function(e) {
    switch (e.keyCode) {
        case 90: // (90 = z)
            z_button.click();
            break;
        case 65: // (65 = a)
            a_button.click();
            break;
    }
});
</script>
""",
    height=0,
    width=0,
)

def main(args):
    # arg parse
    config_file_path = args.config

    config_ini = configparser.ConfigParser()
    config_ini.read(config_file_path, encoding='utf-8')
    # parse config
    mode = config_ini['COMMON']['mode']
    st.markdown("# debug")
    st.write(mode)
    

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--config')
    args = parser.parse_args()
    main(args)

reference

github.com

https://web-designer.cman.jp/javascript_ref/keyboard/keycod

Command options for using configparser with streamlit

This is a memorandum of my attempt to use configparser with streamlit.

contents

abstract

how to use configparser in streamlit

1. command option

streamlit run st_config_test.py -- --config ./debug.ini
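The bare -- tells streamlit to stop consuming options itself and forward everything after it to the script via sys.argv. A minimal sketch of the script side (the [COMMON]/mode layout matches the debug.ini used in the previous post; here the file contents are inlined so the snippet is self-contained):

```python
import argparse
import configparser

# the options after "--" arrive in sys.argv; parsed here from a literal list
parser = argparse.ArgumentParser()
parser.add_argument('--config')
args = parser.parse_args(['--config', './debug.ini'])

# stand-in for the contents of debug.ini
sample_ini = "[COMMON]\nmode = debug\n"
config_ini = configparser.ConfigParser()
config_ini.read_string(sample_ini)

mode = config_ini['COMMON']['mode']
print(mode)  # in the streamlit app this would be shown with st.write(mode)
```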

reference

discuss.streamlit.io