Skip to content

Runtime requirements for LLM Compressor

The following are typical runtimes for each LLM Compressor algorithm based on runs using Meta-Llama-3-8B-Instruct on a NVIDIA A100 Tensor Core GPU.

Algorithm Estimated Time
RTN (QuantizationModifier)
Weights only (no activation quant)
~ 1 minutes
RTN (QuantizationModifier)
Weights and activations
~ 20 minutes
GPTQ (weights only) ~ 30 minutes
AWQ (weights only) ~ 30 minutes