llmcompressor.modeling
Model preparation and fusion utilities for compression workflows.
Provides tools for preparing models for compression including layer fusion, module preparation, and model structure optimization. Handles pre-compression transformations and architectural modifications needed for efficient compression.
Modules:
- deepseek_v3
- fuse
- glm4_moe
- gpt_oss
- granite4
- llama4
- moe_context – Simplified interface for MoE model calibration.
- qwen3_moe
- qwen3_next_moe
- qwen3_vl_moe
Functions:
- center_embeddings – Shift each embedding to have a mean of zero.
- fuse_norm_linears – Fuse the scaling operation of a norm layer into subsequent linear layers.
center_embeddings
Shift each embedding to have a mean of zero
Parameters:
- embedding (Module) – embedding module containing embeddings to center
Source code in llmcompressor/modeling/fuse.py
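The centering step can be sketched in plain PyTorch. This is a minimal illustration, not the library's implementation: the name `center_embeddings_sketch` is hypothetical, and it assumes that "each embedding" means each row of the weight matrix, so the mean is taken over the hidden dimension.

```python
import torch


def center_embeddings_sketch(embedding: torch.nn.Embedding) -> None:
    """Hypothetical sketch: subtract each row's mean from the embedding weight."""
    with torch.no_grad():
        # Compute in float64 to limit rounding error, then cast back.
        weight = embedding.weight.to(torch.float64)
        centered = weight - weight.mean(dim=-1, keepdim=True)
        embedding.weight.copy_(centered.to(embedding.weight.dtype))


torch.manual_seed(0)
emb = torch.nn.Embedding(num_embeddings=8, embedding_dim=16)
center_embeddings_sketch(emb)

# After centering, every embedding vector averages to (approximately) zero.
row_means = emb.weight.mean(dim=-1)
```

The sketch updates the weight in place and ignores device placement and offloading, which a real compression workflow may need to handle.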
fuse_norm_linears
Fuse the scaling operation of a norm layer into subsequent linear layers. This is useful for ensuring transform invariance between norm and linear layers.
Note that unitary transforms (rotations) commute with normalization, but not with scaling, which is why the scale is moved out of the norm layer and into the linear weights.
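The commutation claim can be checked numerically. The snippet below is an illustrative sketch under stated assumptions: `rms_normalize` is a hypothetical helper implementing an unweighted RMS normalization, and the rotation is a random orthogonal matrix obtained from a QR decomposition.

```python
import torch


def rms_normalize(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical helper: RMS normalization without a learned scale."""
    return x / x.pow(2).mean(dim=-1, keepdim=True).add(eps).sqrt()


torch.manual_seed(0)
dim = 16
x = torch.randn(4, dim, dtype=torch.float64)

# Random orthogonal matrix: the Q factor of a QR decomposition.
rotation, _ = torch.linalg.qr(torch.randn(dim, dim, dtype=torch.float64))
# Per-channel scale, standing in for a norm layer's weight.
scale = torch.rand(dim, dtype=torch.float64) + 0.5

# Rotating before or after normalization gives the same result,
# because an orthogonal matrix preserves the norm of each row.
commutes_without_scale = torch.allclose(
    rms_normalize(x @ rotation), rms_normalize(x) @ rotation
)

# With a per-channel scale applied after normalization, the order matters.
commutes_with_scale = torch.allclose(
    rms_normalize(x @ rotation) * scale, (rms_normalize(x) * scale) @ rotation
)
```

Here `commutes_without_scale` is expected to hold while `commutes_with_scale` is not, which matches the note above.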
Parameters:
- norm (Module) – norm layer whose weight will be fused into subsequent linears
- linears (Iterable[Linear]) – linear layers which directly follow the norm layer
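The fusion itself can be sketched as follows. This is a hedged illustration rather than the library's code: `TinyRMSNorm` and `fuse_norm_linears_sketch` are hypothetical names, and the sketch assumes the norm layer has a per-channel `weight` and no bias, and that the linears consume the norm's output directly.

```python
from typing import Iterable

import torch


class TinyRMSNorm(torch.nn.Module):
    """Hypothetical minimal RMSNorm with a learned per-channel scale."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return x / rms * self.weight


def fuse_norm_linears_sketch(
    norm: torch.nn.Module, linears: Iterable[torch.nn.Linear]
) -> None:
    """Hypothetical sketch: fold norm.weight into each linear, then reset it."""
    with torch.no_grad():
        scale = norm.weight.to(torch.float64)
        for linear in linears:
            # Linear computes x @ W.T, so scaling input channel j is the
            # same as scaling column j of W (broadcast over the last dim).
            fused = linear.weight.to(torch.float64) * scale
            linear.weight.copy_(fused.to(linear.weight.dtype))
        # The norm layer now only normalizes; its scale lives in the linears.
        norm.weight.fill_(1.0)


torch.manual_seed(0)
norm = TinyRMSNorm(16)
with torch.no_grad():
    norm.weight.uniform_(0.5, 1.5)
q_proj = torch.nn.Linear(16, 32)
k_proj = torch.nn.Linear(16, 32)
x = torch.randn(4, 16)

with torch.no_grad():
    before = [proj(norm(x)) for proj in (q_proj, k_proj)]
    fuse_norm_linears_sketch(norm, [q_proj, k_proj])
    after = [proj(norm(x)) for proj in (q_proj, k_proj)]
```

The outputs in `before` and `after` should agree up to floating-point error, since the combined norm-then-linear function is unchanged. Every linear that reads the norm's output has to be included in `linears`; one left out would see an unscaled input after the norm weight is reset to ones.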