EIE: Efficient Inference Engine on Compressed Deep Neural Network
Introduction
- a DRAM access costs far more energy than an arithmetic operation
- conv layers reuse each weight many times, so DRAM access matters less for them
- Deep Compression prunes and quantizes the weights into a sparse, compact form that fits on-chip, enabling deployment on mobile
Contribution
- an accelerator for sparse, weight-sharing neural networks
- an accelerator that exploits the dynamic sparsity of activations to skip computation
- a scheme for both distributed storage and distributed computation that parallelizes a sparsified layer across multiple PEs (processing elements), sketched below
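A minimal Python sketch of this scheme (function and variable names are mine, not the paper's): rows of the weight matrix are interleaved across PEs, and only non-zero input activations are broadcast, so the loop exploits both the static sparsity of weights and the dynamic sparsity of activations.

```python
def sparse_matvec_distributed(W, a, n_pe=4):
    """Sketch of the EIE parallelization scheme: rows of W are
    interleaved across n_pe processing elements (row i lives on
    PE i % n_pe), and only non-zero activations are broadcast,
    so entire columns are skipped when the input is sparse."""
    n_rows = len(W)
    # per-PE output slices (distributed storage of the result)
    out = [[0.0] * len(range(pe, n_rows, n_pe)) for pe in range(n_pe)]

    for j, aj in enumerate(a):
        if aj == 0:
            continue  # dynamic activation sparsity: skip the column
        # broadcast (j, aj) to all PEs; each works on its own rows
        for pe in range(n_pe):
            for local, i in enumerate(range(pe, n_rows, n_pe)):
                if W[i][j] != 0:  # static weight sparsity
                    out[pe][local] += W[i][j] * aj

    # gather the interleaved slices back into one output vector
    b = [0.0] * n_rows
    for pe in range(n_pe):
        for local, i in enumerate(range(pe, n_rows, n_pe)):
            b[i] = out[pe][local]
    return b
```

Interleaving rows, rather than giving each PE a contiguous block, tends to spread the non-zeros evenly across PEs, which is the load-balancing argument the paper makes.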
Motivation
- FC layers are used widely, and memory access is usually their bottleneck (the weight matrix has no reuse within a single inference)
- network compression offers a solution to the memory bottleneck
- previous SpMV accelerators can exploit only the static sparsity of weights, not the dynamic sparsity of activations
- batching FC inputs would amortize weight fetches, but it is unsuitable for latency-sensitive real-time inference
Model
Variation of CSC
v : the non-zero weight values
z : number of zeros before each entry, i.e. the offset from the previous non-zero in the same column (or from the top of the column for its first entry)
q : pointer to the first non-zero element of each column, so column j spans v[q_j : q_{j+1}] (an encoder sketch follows)
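A small encoder sketch in Python for this format, assuming the paper's 4-bit relative index (so a run of more than 15 zeros is broken up with a zero-valued filler entry); the names encode_csc_eie and max_jump are mine:

```python
def encode_csc_eie(W, max_jump=15):
    """Encode a dense matrix, column by column, into the CSC variant:
    v - non-zero values (plus zero fillers for long runs of zeros)
    z - zeros preceding each entry, relative to the previous one
    q - column pointers: column j's entries are v[q[j]:q[j+1]]"""
    v, z, q = [], [], [0]
    n_rows, n_cols = len(W), len(W[0])
    for j in range(n_cols):
        zeros = 0  # zeros seen since the last stored entry
        for i in range(n_rows):
            if W[i][j] == 0:
                zeros += 1
                if zeros > max_jump:
                    # z is 4 bits in the paper, so a run longer than
                    # 15 zeros needs an explicit zero filler entry
                    v.append(0)
                    z.append(max_jump)
                    zeros = 0
            else:
                v.append(W[i][j])
                z.append(zeros)
                zeros = 0
        q.append(len(v))
    return v, z, q

# example: two columns with non-zeros at (2,0), (0,1), (3,1)
W = [[0, 3],
     [0, 0],
     [5, 0],
     [0, 9]]
print(encode_csc_eie(W))  # ([5, 3, 9], [2, 0, 2], [0, 1, 3])
```

This sketch only shows the index arithmetic; in the real design v and z are 4-bit fields held in on-chip SRAM.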
a co-design of algorithm and hardware