References

  1. http://blog.csdn.net/SMF0504/article/details/78695908 Data handling in deep-learning training: raw sample collection and data augmentation
  2. http://blog.csdn.net/u010555688/article/details/60757932 A summary of data augmentation methods
  3. https://zhuanlan.zhihu.com/p/23249000 Hikvision Research Institute's experience from the ImageNet 2016 competition
  4. http://blog.csdn.net/u014297722/article/details/54601660 Simple examples of shell arrays, dictionaries, source and split
  5. http://blog.csdn.net/victoriaw/article/details/53863565 Caffe (2): image mean files
  6. http://www.cnblogs.com/denny402/p/5102328.html Computing the mean of image data
  7. https://www.zhihu.com/question/32673260 How does batch size affect learning in deep learning?
  8. http://blog.csdn.net/zilanpotou182/article/details/76165241 A roundup of batch_size questions in deep learning
  9. http://blog.csdn.net/sweet0heart/article/details/53042390 Reading Caffe code (4): solver_param

Efficient Depthwise Implementation in Caffe

Preparing the Training Dataset

Data Augmentation
Generating train.txt and val.txt

For this training task, train.txt is generated from labels.txt together with a fixed directory layout.

Each line of labels.txt has the format

classification label, model year, model, brand, manufacturer, vehicle subcategory, vehicle major category, orientation

The training-set directories are organized as

orientation/vehicle major category/vehicle subcategory/manufacturer/brand/model/model year

The combination orientation + brand + model + model year uniquely determines the classification label of a training image.
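
For illustration only, with made-up values (the real labels and paths are not part of this document), one entry could look like this:

# A line of labels.txt (label,year,model,brand,manufacturer,subcategory,category,orientation):
136,2016,A4L,Audi,FAW-VW,sedan,car,front

# The matching image path, assuming the images sit one level below the directory
# where the script is run (a hypothetical trainset/ folder):
trainset/front/car/sedan/FAW-VW/Audi/A4L/2016/000001.jpg

# Identifier built by the script below (orientation_brand_model_year):
front_Audi_A4L_2016  ->  classification label 136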

#!/bin/bash

declare -A labelDictionary
declare -A sampleNumberDictionary
declare -A validationNumberDictionary

# Collect all image files and save the file list to a text file.
rm -f train.txt
rm -f val.txt
find . -name '*.jpg' | cut -d '/' -f2- >> train.txt

# Read the labels file into dictionaries keyed by the vehicle identifier.
while read line
do 
    OLD_IFS="$IFS"
    IFS=","
    vehicleInfo=($line)
    IFS="$OLD_IFS"
    vehicleIdentifier=${vehicleInfo[7]}_${vehicleInfo[3]}_${vehicleInfo[2]}_${vehicleInfo[1]}
    labelDictionary+=([$vehicleIdentifier]=${vehicleInfo[0]})
    sampleNumberDictionary+=([$vehicleIdentifier]=0)
    validationNumberDictionary+=([$vehicleIdentifier]=0)
done < labels.txt

# Count the number of samples per class.
while read line
do 
    OLD_IFS="$IFS"
    IFS="/"
    vehicleInfo=($line)
    IFS="$OLD_IFS"
    # Path fields 1, 5, 6, 7 are orientation, brand, model and model year
    # (field 0 is the top-level training-set directory, see the example above).
    vehicleIdentifier=${vehicleInfo[1]}_${vehicleInfo[5]}_${vehicleInfo[6]}_${vehicleInfo[7]}
    sampleNumberDictionary[$vehicleIdentifier]=$((${sampleNumberDictionary[$vehicleIdentifier]}+1))
done < train.txt

# Append the classification label to each line and split the list into training / validation sets.
touch train_tmp.txt
while read line
do 
    OLD_IFS="$IFS"
    IFS="/"
    vehicleInfo=($line)
    IFS="$OLD_IFS"
    vehicleIdentifier=${vehicleInfo[1]}_${vehicleInfo[5]}_${vehicleInfo[6]}_${vehicleInfo[7]}
    space=" "

    # At most roughly 1/6 of each class is moved into the validation set.
    validationMaxNumber=$(((${sampleNumberDictionary[$vehicleIdentifier]}+3)/6))
    if [ $(($RANDOM%96)) -lt 32 ] \
        && [ $((${validationNumberDictionary[$vehicleIdentifier]})) -lt $(($validationMaxNumber)) ]
    then
        echo ${line}${space}${labelDictionary[$vehicleIdentifier]} >> val_tmp.txt
        validationNumberDictionary[$vehicleIdentifier]=$((${validationNumberDictionary[$vehicleIdentifier]}+1))
    else
        echo ${line}${space}${labelDictionary[$vehicleIdentifier]} >> train_tmp.txt
    fi
done < train.txt 

rm -f train.txt
mv train_tmp.txt train.txt
mv val_tmp.txt val.txt
Generating the LMDB image data files
#!/bin/bash
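
# NOTE: the commands below are only a sketch of how the LMDBs can be built with Caffe's
# convert_imageset tool; the image root, the validation LMDB name and the resize values
# are assumptions. train.txt / val.txt produced above already use the
# "relative/path.jpg label" line format that convert_imageset expects.

CAFFE_ROOT=/opt/xyl/caffe
DATA_ROOT=/media/hwzt/本地磁盘1/caffe/examples/vehicleBrandClassification/    # assumed image root
OUTPUT=/media/hwzt/本地磁盘1/caffe/examples/vehicleBrandClassification

$CAFFE_ROOT/build/tools/convert_imageset \
    --resize_height=224 --resize_width=224 \
    --shuffle \
    --backend=lmdb \
    "$DATA_ROOT" train.txt \
    "$OUTPUT"/vehicleBrandClassification_train_lmdb

$CAFFE_ROOT/build/tools/convert_imageset \
    --resize_height=224 --resize_width=224 \
    --shuffle \
    --backend=lmdb \
    "$DATA_ROOT" val.txt \
    "$OUTPUT"/vehicleBrandClassification_val_lmdb    # assumed name for the validation LMDB
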
Generating the image mean file
cd /opt/xyl/caffe
sudo ./build/tools/compute_image_mean \
    /media/hwzt/本地磁盘1/caffe/examples/vehicleBrandClassification/vehicleBrandClassification_train_lmdb \
    mean.binaryproto
# Note: mean.binaryproto here can only be given as a bare file name; a path must not be prepended.

# Recorded mean values for lmdb_gt200
mean_value channel [0]: 92.5277
mean_value channel [1]: 91.0916
mean_value channel [2]: 89.3041
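
The three lines above appear to be the log that compute_image_mean prints at the end of its run. If only the mean file has been kept, the per-channel values can be recovered from mean.binaryproto with a few lines of Python; the snippet below is just a sketch and assumes pycaffe is importable and mean.binaryproto sits in the current directory:

import caffe

# Parse the binaryproto written by compute_image_mean.
blob = caffe.proto.caffe_pb2.BlobProto()
with open('mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())

# Convert to a (C, H, W) array and average over the spatial dimensions;
# the channel order follows the LMDB (BGR when it was built with convert_imageset).
mean = caffe.io.blobproto_to_array(blob)[0]
print(mean.mean(axis=(1, 2)))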

Model definition files


Training hyperparameters

Several of the more important hyperparameters are listed below and discussed individually afterwards; a minimal solver.prototxt sketch wiring them together follows the list:

  • type: the optimization algorithm
  • base_lr: the base learning rate of the network
  • lr_policy: the learning-rate decay policy
  • weight_decay: weight decay
  • iter_size: the effective batch size during training is batch_size * iter_size
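
As mentioned above, here is a minimal solver.prototxt sketch that ties these fields together; all values are placeholders for illustration, not the settings actually used for this task:

net: "mobilenet_train_val.prototxt"   # assumed name of the model definition file
type: "SGD"                           # optimization algorithm
base_lr: 0.045                        # base learning rate
lr_policy: "step"                     # learning-rate decay policy
gamma: 0.1
stepsize: 100000
momentum: 0.9
weight_decay: 0.00004                 # weight decay
iter_size: 4                          # effective batch size = batch_size (in the data layer) * iter_size
max_iter: 500000
snapshot: 10000
snapshot_prefix: "snapshots/mobilenet"
solver_mode: GPU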

mobilenet_v1 training parameters

Issue #1 of shicai's mobilenet-caffe mentions, regarding training parameters, that MobileNet is trained much like ResNet and points to https://github.com/facebook/fb.resnet.torch, where the following settings can be found:

th main.lua -depth 50 -batchSize 256 -nGPU 4 -nThreads 8 -shareGradInput true -data [imagenet-folder]

---------- Optimization options ----------------------
   cmd:option('-LR',              0.1,   'initial learning rate')
   cmd:option('-momentum',        0.9,   'momentum')
   cmd:option('-weightDecay',     1e-4,  'weight decay')

Section 3.2 of the MobileNet paper mentions that the RMSprop optimizer was used for training.

The ImageNet training parameters shown in https://github.com/Zehaos/MobileNet are:

tf.app.flags.DEFINE_string('optimizer', 'rmsprop')
tf.app.flags.DEFINE_float('weight_decay', 0.00004, 'The weight decay on the model weights.')
tf.app.flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')

https://github.com/Zehaos/MobileNet/issues/13 suggests using a large batch size (larger than 128) and a small weight_decay when training MobileNet, and mentions an input data scale of 0.017.
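
For reference, in Caffe such an input scale is normally applied in the data layer's transform_param together with the mean values recorded earlier; the layer below is only a sketch under those assumptions (crop size, mirroring, batch size and the LMDB path are likewise assumed):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    # Caffe applies (pixel - mean_value) * scale to each channel.
    scale: 0.017
    mean_value: 92.5277
    mean_value: 91.0916
    mean_value: 89.3041
    mirror: true
    crop_size: 224
  }
  data_param {
    source: "vehicleBrandClassification_train_lmdb"
    batch_size: 128
    backend: LMDB
  }
}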

The ImageNet training parameters shown in https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1_train.py are:

flags.DEFINE_integer('batch_size', 64, 'Batch size')
def get_learning_rate():
  if FLAGS.fine_tune_checkpoint:
    # If we are fine tuning a checkpoint we need to start at a lower learning
    # rate since we are farther along on training.
    return 1e-4
  else:
    return 0.045

python train_image_classifier.py \
  --max_number_of_steps=1000000 \
  --batch_size=64 \
  --optimizer=rmsprop \
  --rmsprop_decay=0.9 \
  --opt_epsilon=1.0 \
  --learning_rate=0.1 \
  --learning_rate_decay_factor=0.1 \
  --momentum=0.9 \
  --num_epochs_per_decay=30.0 \
  --weight_decay=0.0

Training settings for MobileNet from http://blog.csdn.net/qq_25220145/article/details/71436512:

batch_size = 128
num_classes = 2626
opt = keras.optimizers.rmsprop(lr=0.0004, decay=1e-6)

mobilenet_v2 training parameters

https://github.com/suzhenghang/MobileNetv2

Training details for ImageNet2012:

type: "SGD"
lr_policy: "poly"
base_lr: 0.045
power: 1
momentum: 0.9
weight_decay: 0.00004

https://github.com/austingg/MobileNet-v2-caffe

Don't forget to set the weight decay to 4e-5.

https://github.com/ShuangXieIrene/mobilenet-v2

The model was trained with SGD+Momentum optimizer.

https://github.com/xiaochus/MobileNetV2

device: Tesla K80
dataset: cifar-100
optimizer: Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)  
batch_size: 128

https://github.com/miraclewkf/MobileNetV2-PyTorch

python train.py --batch-size 256 --gpus 0,1,2,3
parser.add_argument('--lr', type=float, default=0.045)
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model.parameters(), lr=args.lr, momentum=0.9, weight_decay=0.00004)

# Decay LR by a factor of 0.98 every epoch
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=1, gamma=0.98)

https://www.ctolib.com/tonylins-pytorch-mobilenet-v2.html

I tried to train the model with RMSprop from scratch as described in the paper, but it does not seem to work. I am currently training the model with SGD and keeping the other hyper-parameters the same (except that I use batch size 256).

https://discuss.gluon.ai/t/topic/5295 on the implementation and training of MobileNet V2:

The best top-1 accuracy obtained for MobileNet v2 1.0 is only 63.1, far below the 71.6 reported in the paper.

https://github.com/tensor-yu/cascaded_mobilenet-v2

http://blog.csdn.net/u011995719/article/details/79435615

base_lr:  0.001
momentum: 0.9
weight_decay: 0.0004

type: "Adam"

lr_policy: "multistep"
#gamma: 0.9
gamma: 0.1
stepvalue: 80000  
stepvalue: 100000

Running the training

When training MobileNet v1/v2 with Caffe, the loss failed to converge.
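
The runs were launched with Caffe's standard command-line training interface, roughly as follows (a sketch; the solver file name and GPU id are assumptions):

cd /opt/xyl/caffe
./build/tools/caffe train --solver=mobilenet_solver.prototxt --gpu=0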

It has been largely confirmed that training misbehaves when Caffe runs in GPU mode while CPU mode works correctly; this is very likely caused by the CUDA 9 / cuDNN combination in use.

Model training has therefore been switched to MXNet.
