EIE: Efficient Inference Engine on Compressed Deep Neural Network

Introduction

  • DRAM access costs far more energy than basic arithmetic operations
  • Convolutional layers have a lot of weight reuse, so DRAM access is less of a bottleneck for them
  • Deep Compression makes the weights sparse and small enough to fit in on-chip SRAM, enabling deployment on mobile devices

Contribution

  • an accelerator for sparse, weight-sharing neural networks
  • an accelerator that exploits the dynamic sparsity of activations to skip computation
  • a scheme for both distributed storage and distributed computation that parallelizes a sparsified layer across multiple PEs (processing elements); a sketch follows this list
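
A minimal Python sketch of the row-interleaved partitioning and the activation-sparsity-aware SpMV. The dense-row representation and all function/variable names here are my illustration; the paper actually stores each PE's slice in its compressed CSC variant.

```python
def distribute_rows(matrix, num_pes):
    """Interleave matrix rows across PEs: row i goes to PE (i % num_pes)."""
    slices = [[] for _ in range(num_pes)]
    for i, row in enumerate(matrix):
        slices[i % num_pes].append((i, row))
    return slices


def spmv_distributed(slices, x):
    """Each PE scans only the non-zero activations (dynamic sparsity) and
    multiplies them against the non-zero weights of its own rows (static
    sparsity), writing only the output entries it owns."""
    n_rows = sum(len(s) for s in slices)
    y = [0] * n_rows
    nonzero_acts = [(j, a) for j, a in enumerate(x) if a != 0]
    for pe_rows in slices:          # conceptually parallel: one PE per slice
        for i, row in pe_rows:
            acc = 0
            for j, a in nonzero_acts:
                if row[j] != 0:
                    acc += row[j] * a
            y[i] = acc
    return y


W = [[0, 2, 0, 0],
     [0, 0, 0, 3],
     [1, 0, 0, 0],
     [0, 0, 4, 0]]
x = [0, 5, 0, 1]
print(spmv_distributed(distribute_rows(W, num_pes=2), x))  # [10, 3, 0, 0]
```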

Motivation

  • FC layers are used widely, and memory access is usually the bottleneck (with batch size 1 the weight matrix is not reused)
  • network compression offers a solution to the memory bottleneck
  • previous SpMV accelerators can only exploit the static sparsity of weights
  • batching FC layers to amortize weight fetches is not suitable for latency-sensitive inference

Model

Variation of CSC (compressed sparse column) format

v: the non-zero weight values

z: the number of zeros before each non-zero value, i.e. its offset from the previous non-zero in the same column (or from the start of the column for the first element)

q: pointer to the first non-zero element of each column (the index into v and z where that column starts)
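
A small Python sketch of this encoding under the notes' naming (v, z, q). The real hardware stores z in 4 bits and inserts padding entries in v when a run of zeros exceeds 15; that detail is omitted here.

```python
def encode_csc(matrix):
    """Encode a column-major sparse matrix into the (v, z, q) variant:
    v = non-zero values, z = zeros preceding each non-zero within its column,
    q = start index of each column in v/z (q[j+1] - q[j] non-zeros in column j)."""
    v, z, q = [], [], [0]
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    for j in range(n_cols):
        zeros = 0
        for i in range(n_rows):
            x = matrix[i][j]
            if x == 0:
                zeros += 1
            else:
                v.append(x)
                z.append(zeros)
                zeros = 0
        q.append(len(v))  # column j + 1 starts here
    return v, z, q


W = [[0, 3],
     [0, 0],
     [2, 0],
     [0, 1]]
print(encode_csc(W))  # ([2, 3, 1], [2, 0, 2], [0, 1, 3])
```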

The design is a co-design of the compression algorithm and the hardware architecture.