References
- https://github.com/Maratyszcza/NNPACK
- https://caffe2.ai/docs/mobile-integration.html#null__performance-considerations Caffe2's official documentation on NNPACK acceleration
- gemm-based convolution vs. fft-based convolution: how do they compare?
- https://www.cc.gatech.edu/grads/m/mdukhan3/ homepage of the NNPACK author
- BLAS for Deep Learning, which covers the theoretical foundations of NNPACK
- "Not so fast, FFT": Winograd, an Intel AI blog post on accelerating the neon framework
- How to use NNPACK or EIGEN for conv process in C++ codes? an issue on enabling NNPACK in Caffe2
- Feature request: Support for Group convolution / Depthwise convolution
- http://mxnet-bing.readthedocs.io/en/latest/how_to/nnpack.html MXNet's official documentation on NNPACK
Introduction to NNPACK
NNPACK is an acceleration package for neural network computation that provides high-performance convolution-layer implementations for multi-core CPUs. Its implementations are based on the fast Fourier transform and the Winograd transform. The tables below list benchmark results (test platform: Intel Core i7-6700K); a minimal sketch of the underlying C API follows the tables.
Library | Caffe | NNPACK | NNPACK | NNPACK
---|---|---|---|---
Algorithm | im2col + sgemm | FFT-8x8 | FFT-16x16 | Winograd F(6x6, 3x3)
AlexNet:conv2 | 315 ms | 129 ms | 86 ms | N/A
AlexNet:conv3 | 182 ms | 87 ms | 44 ms | 70 ms
AlexNet:conv4 | 264 ms | 109 ms | 56 ms | 89 ms
AlexNet:conv5 | 177 ms | 77 ms | 40 ms | 64 ms
VGG-A:conv1 | 255 ms | 303 ms | 260 ms | 404 ms
VGG-A:conv2 | 902 ms | 369 ms | 267 ms | 372 ms
VGG-A:conv3.1 | 566 ms | 308 ms | 185 ms | 279 ms
VGG-A:conv3.2 | 1091 ms | 517 ms | 309 ms | 463 ms
VGG-A:conv4.1 | 432 ms | 228 ms | 149 ms | 188 ms
VGG-A:conv4.2 | 842 ms | 402 ms | 264 ms | 329 ms
VGG-A:conv5 | 292 ms | 141 ms | 83 ms | 114 ms
The second table compares NNPACK with im2col + sgemm across different convolution kernel sizes:
Algorithm / kernel size | 2 | 3 | 5 | 10
---|---|---|---|---
NNPACK | 6.69 ms | 7.38 ms | 9.71 ms | 26.44 ms
im2col_sgemm | 37.83 ms | 86.95 ms | 236.91 ms | 929.66 ms
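For reference, below is a minimal sketch of calling NNPACK's low-level C API directly, with a layer shape chosen to roughly resemble the AlexNet conv3 case benchmarked above (256 input channels, 384 output channels, 13x13 input, 3x3 kernel, padding 1; batch size 1 for brevity). The argument list follows the early `nnp_convolution_output` signature from the NNPACK repository; later revisions add workspace and activation parameters, so check the `nnpack.h` of your checkout before building. This is an illustrative sketch, not code from any of the frameworks discussed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <nnpack.h>

int main(void) {
    /* nnp_initialize() also checks whether the CPU has the SIMD
       features NNPACK requires (e.g. AVX2 on x86-64). */
    if (nnp_initialize() != nnp_status_success) {
        fprintf(stderr, "NNPACK is not supported on this CPU\n");
        return 1;
    }

    /* Illustrative shape, roughly AlexNet conv3 (batch size 1 for brevity). */
    const size_t batch_size = 1;
    const size_t input_channels = 256;
    const size_t output_channels = 384;
    const struct nnp_size input_size = { .width = 13, .height = 13 };
    const struct nnp_size kernel_size = { .width = 3, .height = 3 };
    const struct nnp_padding input_padding = { .top = 1, .right = 1, .bottom = 1, .left = 1 };
    /* A 3x3 kernel with padding 1 keeps the spatial size at 13x13. */

    float *input  = calloc(batch_size * input_channels * 13 * 13, sizeof(float));
    float *kernel = calloc(output_channels * input_channels * 3 * 3, sizeof(float));
    float *bias   = calloc(output_channels, sizeof(float));
    float *output = calloc(batch_size * output_channels * 13 * 13, sizeof(float));

    /* Pick one of the algorithms benchmarked above:
       nnp_convolution_algorithm_ft8x8, nnp_convolution_algorithm_ft16x16,
       nnp_convolution_algorithm_wt8x8 (Winograd F(6x6, 3x3)),
       or nnp_convolution_algorithm_auto. */
    enum nnp_status status = nnp_convolution_output(
        nnp_convolution_algorithm_ft16x16,
        batch_size, input_channels, output_channels,
        input_size, input_padding, kernel_size,
        input, kernel, bias, output,
        NULL,   /* pthreadpool_t: NULL runs on the calling thread;
                   create one with pthreadpool_create() to use all cores */
        NULL);  /* optional struct nnp_profile* for per-stage timings */

    printf("nnp_convolution_output returned status %d\n", (int) status);

    free(input); free(kernel); free(bias); free(output);
    nnp_deinitialize();
    return 0;
}
```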
The following sections describe how to use NNPACK in the mainstream deep learning frameworks.