SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks
Problem
Memory optimization for training a single DNN on a single GPU
Method
- Liveness Analysis: only keep tensors that will still be used in the future; a tensor is freed right after its last use in the execution order (first sketch below).
- Unified Tensor Pool: manages GPU-CPU data movement with a Least Recently Used (LRU) tensor replacement policy plus a tensor cache; only tensors from computation-intensive layers such as conv are offloaded to CPU (second sketch below).
- Cost-Aware Re-computation: as long as usage stays below peak memory, nothing is recomputed; otherwise everything except conv is freed and recomputed (pool, activation, ...) (third sketch below).
- Selective Convolution Algorithm: picks the convolution method based on the gap between current usage and peak memory (e.g., FFT-based convolution needs extra workspace) (fourth sketch below).
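A minimal sketch of the liveness idea, assuming a static execution schedule where each layer declares the tensors it reads and writes. The schedule format, tensor names, and toy network are all hypothetical, not the paper's actual implementation:

```python
def last_use_step(schedule):
    """schedule: list of (layer_name, tensors_read, tensors_written)."""
    last = {}
    for step, (_, reads, writes) in enumerate(schedule):
        for t in reads | writes:
            last[t] = step                 # later uses overwrite earlier ones
    return last

def run_with_liveness(schedule):
    last = last_use_step(schedule)
    resident = set()
    for step, (layer, reads, writes) in enumerate(schedule):
        resident |= writes                 # allocate this layer's outputs
        # free every resident tensor whose last use is this step
        dead = {t for t in resident if last[t] == step}
        resident -= dead
        print(f"{layer}: live={sorted(resident)} freed={sorted(dead)}")

# toy 3-layer network: conv1 -> relu1 -> fc1
# (the raw "input" tensor is assumed to be managed outside the pool)
run_with_liveness([
    ("conv1", {"input"},     {"conv1_out"}),
    ("relu1", {"conv1_out"}, {"relu1_out"}),
    ("fc1",   {"relu1_out"}, {"logits"}),
])
```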
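A hedged sketch of the Unified Tensor Pool's LRU behavior. The `TensorCache` class, the byte budget, and the host-side dict standing in for CPU memory are illustrative stand-ins, not the paper's data structures; per the paper's policy, only conv-layer tensors would be registered here at all:

```python
from collections import OrderedDict

class TensorCache:
    """LRU cache over a fixed GPU byte budget; evicted tensors are
    'offloaded' to a host-side dict instead of being freed."""
    def __init__(self, gpu_budget):
        self.gpu_budget = gpu_budget
        self.gpu = OrderedDict()           # name -> nbytes, in LRU order
        self.cpu = {}                      # tensors offloaded to host memory

    def _used(self):
        return sum(self.gpu.values())

    def access(self, name, nbytes):
        if name in self.gpu:               # cache hit: mark as most recent
            self.gpu.move_to_end(name)
            return "hit"
        if name in self.cpu:               # page back from host memory
            nbytes = self.cpu.pop(name)
        # evict least-recently-used tensors until the new one fits
        # (a single tensor larger than the budget still gets inserted)
        while self._used() + nbytes > self.gpu_budget and self.gpu:
            victim, vbytes = self.gpu.popitem(last=False)
            self.cpu[victim] = vbytes      # models a GPU -> CPU transfer
        self.gpu[name] = nbytes
        return "miss"

cache = TensorCache(gpu_budget=100)
cache.access("conv1_out", 60)
cache.access("conv2_out", 60)              # evicts conv1_out to CPU
print(cache.cpu)                           # {'conv1_out': 60}
print(cache.access("conv1_out", 60))       # "miss": paged back, evicts conv2_out
```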
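One possible reading of the cost-aware rule as a greedy forward pass: keep every output while the running footprint stays under the peak budget, and once it would overflow, drop only cheap layers (pool, activation, ...) for recomputation during backprop, never conv. The layer list, byte counts, and the `CHEAP` set are made up for illustration:

```python
CHEAP = {"pool", "relu", "lrn", "bn"}      # cheap to recompute; conv is not

def plan_recompute(layers, peak_budget):
    """layers: list of (name, kind, out_bytes) in forward order.
    Returns the set of layer outputs to drop and recompute in backward."""
    drop, resident = set(), 0
    for name, kind, out_bytes in layers:
        if resident + out_bytes <= peak_budget or kind not in CHEAP:
            resident += out_bytes          # keep it (conv is always kept)
        else:
            drop.add(name)                 # free now, recompute in backprop
    return drop

layers = [("conv1", "conv", 400), ("relu1", "relu", 400),
          ("pool1", "pool", 200), ("conv2", "conv", 400)]
print(plan_recompute(layers, peak_budget=900))   # -> {'pool1'}
```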
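A sketch of the selective-convolution choice, assuming a cuDNN-style list of candidate algorithms with benchmarked runtime and workspace size; the algorithm names and numbers below are illustrative, not measured:

```python
def pick_conv_algo(algos, free_bytes):
    """algos: list of (name, est_time_ms, workspace_bytes).
    Pick the fastest algorithm whose extra workspace fits in the
    currently free GPU memory (the gap below peak); if nothing fits,
    fall back to the algorithm with the smallest workspace."""
    feasible = [a for a in algos if a[2] <= free_bytes]
    if not feasible:
        return min(algos, key=lambda a: a[2])[0]
    return min(feasible, key=lambda a: a[1])[0]

# FFT is fastest here but needs a large scratch buffer, so with only
# 128 MB of headroom the cheaper winograd variant wins
algos = [("implicit_gemm", 9.0, 0),
         ("winograd",      5.0,  64 << 20),
         ("fft",           4.0, 512 << 20)]
print(pick_conv_algo(algos, free_bytes=128 << 20))   # -> "winograd"
```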