llmcompressor.modeling.llama4
Classes:
- SequentialLlama4TextMoe – Calibration version of Llama4TextMoe that unpacks experts for sequential processing.
SequentialLlama4TextMoe
SequentialLlama4TextMoe(
    original: Llama4TextMoe,
    config: Llama4Config,
    calibrate_all_experts: bool = True,
)
Bases: MoECalibrationModule
Calibration version of Llama4TextMoe that unpacks experts for sequential processing.
This module:
1. Unpacks the packed expert weights (3D -> 2D) for calibration
2. Optionally sends all tokens to all experts during calibration
3. Stays in unpacked form (permanent) for vLLM compatibility
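The first two behaviors can be pictured with a small toy sketch. This is not the library's implementation: plain Python lists stand in for tensors, all function names (`unpack_experts`, `matvec`, `run_experts`) are hypothetical, and the sketch assumes top-1 routing and that only the routed expert's result reaches the output, neither of which is stated above.

```python
# Toy illustration only: nested lists stand in for tensors, and every
# name below is hypothetical rather than part of llmcompressor.


def unpack_experts(packed):
    """Split a packed [num_experts][d_in][d_out] weight into one 2D matrix per expert."""
    return [[row[:] for row in expert] for expert in packed]


def matvec(token, weight):
    """Compute token @ weight for a 1D token and a 2D [d_in][d_out] weight."""
    d_out = len(weight[0])
    return [sum(token[i] * weight[i][j] for i in range(len(token))) for j in range(d_out)]


def run_experts(tokens, experts, routes, calibrate_all_experts=True):
    """Return per-token outputs and how many tokens each expert processed.

    routes[t] is the index of the expert assigned to token t (top-1 routing
    is an assumption made to keep the sketch small).
    """
    outputs = [None] * len(tokens)
    seen = [0] * len(experts)
    for e, weight in enumerate(experts):
        for t, token in enumerate(tokens):
            routed = routes[t] == e
            if not (routed or calibrate_all_experts):
                continue  # routed-only mode: skip tokens not assigned here
            result = matvec(token, weight)  # the expert observes this token
            seen[e] += 1
            if routed:
                outputs[t] = result  # assumed: only routed results are kept
    return outputs, seen


packed = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # expert 1: doubles the input
]
experts = unpack_experts(packed)  # a list of two separate 2D matrices
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
routes = [0, 1, 1]

out_all, seen_all = run_experts(tokens, experts, routes, calibrate_all_experts=True)
out_routed, seen_routed = run_experts(tokens, experts, routes, calibrate_all_experts=False)
```

In this sketch both modes produce the same outputs, but with `calibrate_all_experts=True` each expert processes all three tokens, whereas the routed-only mode leaves expert 0 with a single token. That difference in how much data each expert observes is the motivation for the option during calibration.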