llmcompressor.modeling.moe_context
Simplified interface for MoE model calibration.
MoE (Mixture of Experts) models route tokens to different expert networks. During calibration for quantization/compression, we need to ensure ALL experts see data, not just the ones selected by the router. This module provides the infrastructure to temporarily modify MoE modules for proper calibration.
Key components:

- MoECalibrationModule: Abstract base class for calibration modules
- moe_calibration_context: Context manager that applies calibration to a model
Classes:

- MoECalibrationModule – Abstract base class for MoE calibration modules.
Functions:

- moe_calibration_context – Context manager that applies MoE calibration to a model.
MoECalibrationModule
Bases: ABC, Module, RegistryMixin
Abstract base class for MoE calibration modules.
Calibration modules replace original MoE modules during the calibration phase to ensure all experts receive data for proper quantization statistics.
Subclasses must:

1. Implement __init__() with the signature (self, original, config, calibrate_all_experts=True)
2. Set is_permanent to indicate whether the module should stay in its calibration form
3. Optionally implement restore() if is_permanent=False
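The subclass contract above can be sketched as follows. This is an illustrative example, not code from the library: the base class here is a self-contained stand-in (the real MoECalibrationModule also inherits torch.nn.Module and RegistryMixin), and the wrapped MoE block's `experts` attribute is invented for the sketch.

```python
from abc import ABC


# Stand-in base class so the sketch is self-contained; the real
# MoECalibrationModule also inherits torch.nn.Module and RegistryMixin.
class MoECalibrationModule(ABC):
    is_permanent: bool = False

    def restore(self):
        # Default no-op for permanent modules.
        return self


# Hypothetical subclass wrapping an imaginary MoE block that exposes an
# `experts` list; the attribute names below are invented for illustration.
class ExampleMoECalibration(MoECalibrationModule):
    is_permanent = False  # restore the original structure after calibration

    def __init__(self, original, config, calibrate_all_experts=True):
        self.original = original
        self.config = config
        self.calibrate_all_experts = calibrate_all_experts

    def forward(self, hidden_states):
        if self.calibrate_all_experts:
            # Push every token through every expert so all experts
            # accumulate calibration statistics, not just the routed ones.
            for expert in self.original.experts:
                expert(hidden_states)
        # Return the normally-routed output so model behavior is unchanged.
        return self.original(hidden_states)

    def restore(self):
        return self.original
```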
Methods:

- restore – Restore the original module structure.
restore
Restore the original module structure.
Only needed if is_permanent=False. For permanent modules, this is a no-op.
Returns: The original module (or self if permanent)
Source code in llmcompressor/modeling/moe_context.py
moe_calibration_context
Context manager that applies MoE calibration to a model.
This scans all modules in the model and replaces any MoE modules with their calibration equivalents. After the context exits, non-permanent modules are restored to their original form.
The model is modified in-place, so the same model object should be used within the context.
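The scan-replace-restore behavior described above can be sketched with a plain context manager. This is not the library's actual implementation: the wrapper class and the dict-of-submodules model are invented stand-ins for the real registry lookup and torch module replacement.

```python
from contextlib import contextmanager


# Invented stand-in for a registered MoECalibrationModule subclass.
class FakeCalibrationWrapper:
    is_permanent = False

    def __init__(self, original, config, calibrate_all_experts=True):
        self.original = original

    def restore(self):
        return self.original


@contextmanager
def sketch_moe_calibration_context(model, calibrate_all_experts=True):
    # `model` here is a plain dict of name -> submodule, standing in for
    # iterating a torch model's named modules.
    replaced = []
    for name, module in list(model.items()):
        if getattr(module, "is_moe", False):  # stand-in for a registry lookup
            wrapper = FakeCalibrationWrapper(module, None, calibrate_all_experts)
            model[name] = wrapper  # in-place replacement
            replaced.append((name, wrapper))
    try:
        yield model
    finally:
        # On exit, non-permanent wrappers are swapped back to the originals.
        for name, wrapper in replaced:
            if not wrapper.is_permanent:
                model[name] = wrapper.restore()
```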
Args:

- model: The model to apply MoE calibration to (modified in-place)
- calibrate_all_experts: If True, all experts see all tokens during calibration. If False, use normal routing (useful for some techniques)
Example:

    with moe_calibration_context(model):
        # Run calibration - all experts will see data
        for batch in dataloader:
            model(**batch)
    # Model is now restored (unless permanent)