LLM Compressor Docs
Home
Why use LLM Compressor?
Compressing your model, step-by-step
Choosing your model
Choosing the right compression scheme
Choosing the right compression algorithm
Choosing a dataset
Compressing your model
Deploying with vLLM
Getting started
Installing LLM Compressor
Key Models
Llama 4
FP8 Example
Qwen3
FP8 Example
Kimi-K2
FP8 Example
Mistral Large 3
FP8 Example
Guides
Compression Schemes
Sequential Onloading
Model Loading
Distributed Oneshot
Saving a Model
Observers
Memory Requirements
Runtime Performance
Examples
AutoRound Quantization
AWQ Quantization
Big Model Quantization with Sequential Onloading
Disk offloading
Model-free Quantization
Multimodal Audio Model Quantization
Multimodal Vision-Language Quantization
KV Cache Quantization
Non-uniform Quantization
fp4 Quantization with NVFP4
int4 Weight Quantization
fp8 Weight and Activation Quantization
int8 Weight and Activation Quantization
Quantizing Mixture of Experts (MoE) models
2:4 Sparsity with FP8 Quantization
Applying Transforms to Improve Quantization Accuracy
Experimental
Attention Quantization in LLM Compressor
Mistral-format model compression (experimental)
MXFP4 Quantization
Developer
LLM Compressor Code of Conduct
Contributing to LLM Compressor
API Reference
llmcompressor
logger
sentinel
args
dataset_arguments
model_arguments
recipe_arguments
utils
core
helpers
lifecycle
model_layer
session
session_functions
state
events
event
datasets
utils
entrypoints
oneshot
utils
model_free
helpers
lifecycle
microscale
model_utils
process
reindex_fused_weights
save_utils
validate
metrics
logger
utils
frequency_manager
modeling
deepseek_v3
fuse
glm4_moe
gpt_oss
granite4
llama4
moe_context
qwen3_moe
qwen3_next_moe
qwen3_vl_moe
modifiers
factory
interface
modifier
autoround
base
awq
base
mappings
experimental
logarithmic_equalization
base
obcq
sgpt_base
pruning
helpers
constant
base
magnitude
base
sparsegpt
base
sgpt_base
sgpt_sparsify
utils
pytorch
layer_mask
mask_factory
wanda
base
wanda_sparsify
quantization
calibration
group_size_validation
gptq
base
gptq_quantize
quantization
base
mixin
smoothquant
base
utils
transform
quip
base
smoothquant
base
utils
spinquant
base
mappings
norm_mappings
utils
constants
helpers
hooks
pytorch_helpers
observers
base
helpers
min_max
moving_base
mse
pipelines
cache
registry
basic
pipeline
data_free
pipeline
independent
pipeline
sequential
ast_helpers
helpers
pipeline
transformers_helpers
ast_utils
auto_wrapper
control_flow_analyzer
name_analyzer
pytorch
model_load
helpers
utils
helpers
sparsification
sparsification_info
configs
helpers
module_sparsification_info
recipe
metadata
recipe
utils
transformers
compression
compressed_tensors_utils
helpers
sparsity_metadata_config
data
base
c4
cnn_dailymail
custom
data_helpers
evolcodealpaca
flickr_30k
gsm8k
open_platypus
peoples_speech
ultrachat_200k
wikitext
tracing
debug
utils
helpers
preprocessing_functions
utils
dev
dist
helpers
metric_logging
transformers
pytorch
module
utils
FAQ
llmcompressor.entrypoints.model_free.lifecycle