EIE: Efficient Inference Engine on Compressed Deep Neural Network
Introduction
- a DRAM access costs far more energy than an arithmetic operation
- conv layers reuse each weight many times, so DRAM access matters less for them
- Deep Compression prunes and quantizes the weights into a sparse, compact form that fits on-chip, enabling deployment on mobile
Contribution
- an accelerator for sparse, weight-sharing neural networks
- an accelerator that exploits the dynamic sparsity of activations to skip computation
- a scheme for both distributed storage and distributed computation that parallelizes a sparsified layer across multiple PEs (processing elements), sketched below
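A minimal Python sketch of this scheme (function and variable names are mine, not the paper's): rows of the weight matrix are interleaved across PEs, and only non-zero input activations are broadcast, so the loop exploits both the static sparsity of weights and the dynamic sparsity of activations.

```python
def sparse_matvec_distributed(W, a, n_pe=4):
    """Sketch of the EIE parallelization scheme: rows of W are
    interleaved across n_pe processing elements (row i lives on
    PE i % n_pe), and only non-zero activations are broadcast,
    so entire columns are skipped when the input is sparse."""
    n_rows = len(W)
    # per-PE output slices (distributed storage of the result)
    out = [[0.0] * len(range(pe, n_rows, n_pe)) for pe in range(n_pe)]

    for j, aj in enumerate(a):
        if aj == 0:
            continue  # dynamic activation sparsity: skip the column
        # broadcast (j, aj) to all PEs; each works on its own rows
        for pe in range(n_pe):
            for local, i in enumerate(range(pe, n_rows, n_pe)):
                if W[i][j] != 0:  # static weight sparsity
                    out[pe][local] += W[i][j] * aj

    # gather the interleaved slices back into one output vector
    b = [0.0] * n_rows
    for pe in range(n_pe):
        for local, i in enumerate(range(pe, n_rows, n_pe)):
            b[i] = out[pe][local]
    return b
```

Interleaving rows, rather than giving each PE a contiguous block, tends to spread the non-zeros evenly across PEs, which is the load-balancing argument the paper makes.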
Motivation
- FC layers are used widely, and memory access is usually their bottleneck (the weight matrix has no reuse within a single inference)
- network compression offers a solution to the memory bottleneck
- previous SpMV accelerators can exploit only the static sparsity of weights, not the dynamic sparsity of activations
- batching FC inputs would amortize weight fetches, but it is unsuitable for latency-sensitive real-time inference
Model
Variation of CSC
v : the non-zero weight values
z : number of zeros before each entry, i.e. the offset from the previous non-zero in the same column (or from the top of the column for its first entry)
q : pointer to the first non-zero element of each column, so column j spans v[q_j : q_{j+1}] (an encoder sketch follows)
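A small encoder sketch in Python for this format, assuming the paper's 4-bit relative index (so a run of more than 15 zeros is broken up with a zero-valued filler entry); the names encode_csc_eie and max_jump are mine:

```python
def encode_csc_eie(W, max_jump=15):
    """Encode a dense matrix, column by column, into the CSC variant:
    v - non-zero values (plus zero fillers for long runs of zeros)
    z - zeros preceding each entry, relative to the previous one
    q - column pointers: column j's entries are v[q[j]:q[j+1]]"""
    v, z, q = [], [], [0]
    n_rows, n_cols = len(W), len(W[0])
    for j in range(n_cols):
        zeros = 0  # zeros seen since the last stored entry
        for i in range(n_rows):
            if W[i][j] == 0:
                zeros += 1
                if zeros > max_jump:
                    # z is 4 bits in the paper, so a run longer than
                    # 15 zeros needs an explicit zero filler entry
                    v.append(0)
                    z.append(max_jump)
                    zeros = 0
            else:
                v.append(W[i][j])
                z.append(zeros)
                zeros = 0
        q.append(len(v))
    return v, z, q

# example: two columns with non-zeros at (2,0), (0,1), (3,1)
W = [[0, 3],
     [0, 0],
     [5, 0],
     [0, 9]]
print(encode_csc_eie(W))  # ([5, 3, 9], [2, 0, 2], [0, 1, 3])
```

This sketch only shows the index arithmetic; in the real design v and z are 4-bit fields held in on-chip SRAM.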
a co-design of algorithm and hardware