SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks
Problem
Memory optimization for training a single DNN on a single GPU
Method
- Liveness Analysis: only keep tensors that will still be used in the future; a tensor is freed right after its last use in the execution order (first sketch below).
- Unified Tensor Pool: manages GPU-CPU data movement with a Least Recently Used (LRU) tensor replacement policy plus a tensor cache; only tensors from computation-intensive layers such as conv are offloaded to CPU (second sketch below).
- Cost-Aware Re-computation: as long as usage stays below peak memory, nothing is recomputed; otherwise everything except conv is freed and recomputed (pool, activation, ...) (third sketch below).
- Selective Convolution Algorithm: picks the convolution method based on the gap between current usage and peak memory (e.g., FFT-based convolution needs extra workspace) (fourth sketch below).
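A minimal sketch of the liveness idea, assuming a static execution schedule where each layer declares the tensors it reads and writes. The schedule format, tensor names, and toy network are all hypothetical, not the paper's actual implementation:

```python
def last_use_step(schedule):
    """schedule: list of (layer_name, tensors_read, tensors_written)."""
    last = {}
    for step, (_, reads, writes) in enumerate(schedule):
        for t in reads | writes:
            last[t] = step                 # later uses overwrite earlier ones
    return last

def run_with_liveness(schedule):
    last = last_use_step(schedule)
    resident = set()
    for step, (layer, reads, writes) in enumerate(schedule):
        resident |= writes                 # allocate this layer's outputs
        # free every resident tensor whose last use is this step
        dead = {t for t in resident if last[t] == step}
        resident -= dead
        print(f"{layer}: live={sorted(resident)} freed={sorted(dead)}")

# toy 3-layer network: conv1 -> relu1 -> fc1
# (the raw "input" tensor is assumed to be managed outside the pool)
run_with_liveness([
    ("conv1", {"input"},     {"conv1_out"}),
    ("relu1", {"conv1_out"}, {"relu1_out"}),
    ("fc1",   {"relu1_out"}, {"logits"}),
])
```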
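A hedged sketch of the Unified Tensor Pool's LRU behavior. The `TensorCache` class, the byte budget, and the host-side dict standing in for CPU memory are illustrative stand-ins, not the paper's data structures; per the paper's policy, only conv-layer tensors would be registered here at all:

```python
from collections import OrderedDict

class TensorCache:
    """LRU cache over a fixed GPU byte budget; evicted tensors are
    'offloaded' to a host-side dict instead of being freed."""
    def __init__(self, gpu_budget):
        self.gpu_budget = gpu_budget
        self.gpu = OrderedDict()           # name -> nbytes, in LRU order
        self.cpu = {}                      # tensors offloaded to host memory

    def _used(self):
        return sum(self.gpu.values())

    def access(self, name, nbytes):
        if name in self.gpu:               # cache hit: mark as most recent
            self.gpu.move_to_end(name)
            return "hit"
        if name in self.cpu:               # page back from host memory
            nbytes = self.cpu.pop(name)
        # evict least-recently-used tensors until the new one fits
        # (a single tensor larger than the budget still gets inserted)
        while self._used() + nbytes > self.gpu_budget and self.gpu:
            victim, vbytes = self.gpu.popitem(last=False)
            self.cpu[victim] = vbytes      # models a GPU -> CPU transfer
        self.gpu[name] = nbytes
        return "miss"

cache = TensorCache(gpu_budget=100)
cache.access("conv1_out", 60)
cache.access("conv2_out", 60)              # evicts conv1_out to CPU
print(cache.cpu)                           # {'conv1_out': 60}
print(cache.access("conv1_out", 60))       # "miss": paged back, evicts conv2_out
```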
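One possible reading of the cost-aware rule as a greedy forward pass: keep every output while the running footprint stays under the peak budget, and once it would overflow, drop only cheap layers (pool, activation, ...) for recomputation during backprop, never conv. The layer list, byte counts, and the `CHEAP` set are made up for illustration:

```python
CHEAP = {"pool", "relu", "lrn", "bn"}      # cheap to recompute; conv is not

def plan_recompute(layers, peak_budget):
    """layers: list of (name, kind, out_bytes) in forward order.
    Returns the set of layer outputs to drop and recompute in backward."""
    drop, resident = set(), 0
    for name, kind, out_bytes in layers:
        if resident + out_bytes <= peak_budget or kind not in CHEAP:
            resident += out_bytes          # keep it (conv is always kept)
        else:
            drop.add(name)                 # free now, recompute in backprop
    return drop

layers = [("conv1", "conv", 400), ("relu1", "relu", 400),
          ("pool1", "pool", 200), ("conv2", "conv", 400)]
print(plan_recompute(layers, peak_budget=900))   # -> {'pool1'}
```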
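A sketch of the selective-convolution choice, assuming a cuDNN-style list of candidate algorithms with benchmarked runtime and workspace size; the algorithm names and numbers below are illustrative, not measured:

```python
def pick_conv_algo(algos, free_bytes):
    """algos: list of (name, est_time_ms, workspace_bytes).
    Pick the fastest algorithm whose extra workspace fits in the
    currently free GPU memory (the gap below peak); if nothing fits,
    fall back to the algorithm with the smallest workspace."""
    feasible = [a for a in algos if a[2] <= free_bytes]
    if not feasible:
        return min(algos, key=lambda a: a[2])[0]
    return min(feasible, key=lambda a: a[1])[0]

# FFT is fastest here but needs a large scratch buffer, so with only
# 128 MB of headroom the cheaper winograd variant wins
algos = [("implicit_gemm", 9.0, 0),
         ("winograd",      5.0,  64 << 20),
         ("fft",           4.0, 512 << 20)]
print(pick_conv_algo(algos, free_bytes=128 << 20))   # -> "winograd"
```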