MXFP4 Quantization
vLLM currently supports MXFP4A16 quantization, i.e. weight-only quantization in which weights are stored in MXFP4 while activations remain in 16-bit precision. Examples for this can be found here.
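As background on what the weight-only format stores: MXFP4 packs weights as 4-bit E2M1 values with one shared power-of-two (E8M0) scale per 32-element block. The sketch below is an illustrative quantize/dequantize round trip in plain Python, assuming the OCP Microscaling spec's recommended scale choice (`2^(floor(log2(amax)) - 2)`, where 2 is the maximum E2M1 exponent); it is not vLLM's or LLM Compressor's actual kernel code.

```python
import math

# Representable E2M1 (FP4) magnitudes per the OCP Microscaling (MX) spec.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32  # MX block size: one shared E8M0 scale per 32 elements


def quantize_block(block):
    """Quantize one block to MXFP4: shared power-of-two scale + E2M1 elements."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Shared scale is a power of two: 2^(floor(log2(amax)) - emax_elem),
    # with emax_elem = 2 for E2M1 (the element format's largest exponent).
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    quantized = []
    for v in block:
        mag = min(abs(v) / scale, 6.0)  # clamp to the E2M1 range [0, 6]
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))  # round to nearest
        quantized.append(math.copysign(q, v))
    return scale, quantized


def dequantize_block(scale, quantized):
    return [scale * q for q in quantized]


# Round-trip a toy 32-element weight block.
weights = [0.013 * (i - 16) for i in range(BLOCK)]
scale, q = quantize_block(weights)
recon = dequantize_block(scale, q)
```

Because the shared scale is restricted to a power of two, it can be stored as a single 8-bit exponent (E8M0) per block, which is what keeps the format's overhead low.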
However, you can still generate MXFP4 models (with quantized activations) through LLM Compressor. These models use fully dynamic activation quantization, and that pathway has not yet been enabled for compressed-tensors models in vLLM.