References

  1. https://github.com/Maratyszcza/NNPACK
  2. https://caffe2.ai/docs/mobile-integration.html#null__performance-considerations The official Caffe2 documentation's description of NNPACK acceleration
  3. Comparison of GEMM-based convolution and FFT-based convolution?
  4. https://www.cc.gatech.edu/grads/m/mdukhan3/ Home page of NNPACK's creator
  5. BLAS for Deep Learning: covers the theoretical basis of NNPACK
  6. "Not so fast, FFT": Winograd. A blog post from Intel's AI group on acceleration in the Neon framework
  7. How to use NNPACK or EIGEN for conv process in C++ codes? An issue about how to enable NNPACK in Caffe2
  8. Feature request: Support for Group convolution / Depthwise convolution
  9. http://mxnet-bing.readthedocs.io/en/latest/how_to/nnpack.html Discussion of NNPACK in the official MXNet documentation

Introduction to NNPACK

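NNPACK is a software package for accelerating neural network computation. It provides high-performance convolution layer implementations for multi-core CPUs, built on the Fourier transform and the Winograd transform. The performance measurements below were taken on a Core i7 6700K: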

| Library       | Caffe          | NNPACK  | NNPACK    | NNPACK                |
|---------------|----------------|---------|-----------|-----------------------|
| Algorithm     | im2col + sgemm | FFT-8x8 | FFT-16x16 | Winograd F(6x6, 3x3)  |
| AlexNet:conv2 | 315 ms         | 129 ms  | 86 ms     | N/A                   |
| AlexNet:conv3 | 182 ms         | 87 ms   | 44 ms     | 70 ms                 |
| AlexNet:conv4 | 264 ms         | 109 ms  | 56 ms     | 89 ms                 |
| AlexNet:conv5 | 177 ms         | 77 ms   | 40 ms     | 64 ms                 |
| VGG-A:conv1   | 255 ms         | 303 ms  | 260 ms    | 404 ms                |
| VGG-A:conv2   | 902 ms         | 369 ms  | 267 ms    | 372 ms                |
| VGG-A:conv3.1 | 566 ms         | 308 ms  | 185 ms    | 279 ms                |
| VGG-A:conv3.2 | 1091 ms        | 517 ms  | 309 ms    | 463 ms                |
| VGG-A:conv4.1 | 432 ms         | 228 ms  | 149 ms    | 188 ms                |
| VGG-A:conv4.2 | 842 ms         | 402 ms  | 264 ms    | 329 ms                |
| VGG-A:conv5   | 292 ms         | 141 ms  | 83 ms     | 114 ms                |
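To give a feel for what the Winograd column refers to, here is a minimal pure-Python sketch of the smallest one-dimensional case, F(2, 3): two outputs of a 3-tap filter computed with 4 data multiplications instead of 6. This is an illustration only and not NNPACK's code; NNPACK's F(6x6, 3x3) variant is a larger two-dimensional version of the same idea. The function names `direct_f23` and `winograd_f23` are made up for this example.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs, computed directly (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2, 3) (4 multiplications on the data)."""
    # Filter transform: depends only on the filter, so it can be precomputed once.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by element-wise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: combine the four products into the two outputs.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    taps = [1.0, 0.0, -1.0]
    print(direct_f23(data, taps))
    print(winograd_f23(data, taps))
```

Both functions should print the same pair of values. The saving comes from moving work into the filter transform, which is paid once per filter rather than once per output tile.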

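| Algorithm / kernel size | 2        | 3        | 5         | 10        |
|-------------------------|----------|----------|-----------|-----------|
| nnpack                  | 6.69 ms  | 7.38 ms  | 9.71 ms   | 26.44 ms  |
| im2col_sgemm            | 37.83 ms | 86.95 ms | 236.91 ms | 929.66 ms |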
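The im2col + sgemm baseline in both tables lowers convolution to a matrix multiplication: every kernel-sized patch of the input is unrolled into a row, and the rows are multiplied by the flattened kernel. The sketch below shows the idea in pure Python for a single channel, stride 1, and no padding. It is not Caffe's implementation, and the names `im2col`, `conv2d_im2col`, and `conv2d_direct` are hypothetical.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2D image (list of rows) into one flat row."""
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(k) for dj in range(k)]
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


def conv2d_im2col(image, kernel):
    """Convolution (cross-correlation, as in deep learning) as a matrix-vector product."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - k + 1
    # One dot product per patch row: this is the "gemm" step for a single filter.
    flat = [sum(p * q for p, q in zip(patch, flat_kernel)) for patch in im2col(image, k)]
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]


def conv2d_direct(image, kernel):
    """Reference implementation with explicit nested loops."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ker = [[1, 0], [0, 1]]
    print(conv2d_im2col(img, ker))
    print(conv2d_direct(img, ker))
```

The patch matrix grows with the square of the kernel size, so the unrolled data and the work per output both increase quickly as kernels get larger, which is consistent with the im2col_sgemm row in the table above.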

The following sections describe how to use NNPACK in mainstream deep learning frameworks.

Caffe2
