llmcompressor.modeling.qwen3_vl_moe

Classes:

CalibrateQwen3VLMoeTextSparseMoeBlock

CalibrateQwen3VLMoeTextSparseMoeBlock(
    original: Qwen3VLMoeTextSparseMoeBlock,
    config: Qwen3VLMoeConfig,
    calibrate_all_experts: bool,
)

Bases: MoECalibrationModule

Calibration version of Qwen3VLMoeTextSparseMoeBlock that sends all tokens to all experts.

Source code in llmcompressor/modeling/qwen3_vl_moe.py
def __init__(
    self,
    original: "Qwen3VLMoeTextSparseMoeBlock",
    config: "Qwen3VLMoeConfig",
    calibrate_all_experts: bool,
):
    super().__init__()
    text_config: "Qwen3VLMoeTextConfig" = config.get_text_config()

    self.hidden_size = text_config.hidden_size
    self.num_experts = text_config.num_experts
    self.top_k = original.top_k
    # Note: gate was changed to be a Linear layer in transformers==4.57.0
    # https://github.com/JJJYmmm/transformers/commit/f5dea1c694af8c994c769170813a8702332119ee
    self.gate = original.gate
    self.calibrate_all_experts = calibrate_all_experts
    self.experts = SequentialQwen3VLMoeTextExperts(text_config, original.experts)
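
The source above only shows `__init__`; the "all tokens to all experts" behavior lives in the forward pass. The sketch below is a hypothetical, self-contained illustration of that idea (toy `nn.Linear` experts stand in for the real `SequentialQwen3VLMoeTextExperts`, and the routing details are simplified): when `calibrate_all_experts` is set, every expert runs on the full token stream so calibration observers see the complete activation distribution, while zeroed non-top-k routing weights cancel the extra computation and keep the output identical to normal top-k routing.

```python
import torch
import torch.nn as nn


class ToyCalibrateMoeBlock(nn.Module):
    """Hypothetical sketch of an all-experts calibration MoE block."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int,
                 calibrate_all_experts: bool):
        super().__init__()
        self.top_k = top_k
        self.calibrate_all_experts = calibrate_all_experts
        # Linear gate, matching the transformers>=4.57.0 convention noted above
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Toy experts; the real module wraps the original expert weights
        self.experts = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size, bias=False)
            for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_tokens, hidden_size)
        router_logits = self.gate(hidden_states)
        routing_weights = torch.softmax(router_logits, dim=-1)
        _, topk_ids = routing_weights.topk(self.top_k, dim=-1)

        # Keep only the top-k routing weights per token; zero out the rest.
        mask = torch.zeros_like(routing_weights)
        mask.scatter_(-1, topk_ids, 1.0)
        weights = routing_weights * mask

        output = torch.zeros_like(hidden_states)
        for expert_id, expert in enumerate(self.experts):
            if self.calibrate_all_experts:
                # Calibration mode: the expert sees *every* token, so any
                # observer hooks record the full activation distribution.
                expert_out = expert(hidden_states)
            else:
                # Normal mode: only tokens routed to this expert pass through.
                token_mask = weights[:, expert_id] > 0
                expert_out = torch.zeros_like(hidden_states)
                expert_out[token_mask] = expert(hidden_states[token_mask])
            # Zeroed weights cancel the extra expert outputs, so both modes
            # produce the same top-k mixture.
            output = output + weights[:, expert_id].unsqueeze(-1) * expert_out
        return output
```

The key invariant is that enabling `calibrate_all_experts` changes which tokens each expert *computes on*, not the block's output, so calibrating every expert does not perturb the model being measured.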