Runtime requirements for LLM Compressor
The following are typical runtimes for each LLM Compressor algorithm based on runs using Meta-Llama-3-8B-Instruct on a NVIDIA A100 Tensor Core GPU.
| Algorithm | Estimated Time |
|---|---|
| RTN (QuantizationModifier) Weights only (no activation quant) | ~ 1 minutes |
| RTN (QuantizationModifier) Weights and activations | ~ 20 minutes |
| GPTQ (weights only) | ~ 30 minutes |
| AWQ (weights only) | ~ 30 minutes |