
Getting Started

This section will guide you through the process of installing the library, compressing your first model, and deploying it with vLLM for faster, more efficient inference.

LLM Compressor makes it straightforward to optimize large language models for deployment, offering a range of quantization techniques that let you balance model quality, inference performance, and resource efficiency.

Quick Start Guides

Follow the guides below to get started with LLM Compressor and optimize your models for production deployment.

  • Why LLM Compressor?


    Learn about the benefits of model optimization and how LLM Compressor helps reduce costs and improve performance.

    Why LLM Compressor

  • Installation


    Learn how to install LLM Compressor using pip or from source.

    Installation Guide

  • Compress Your Model


    Learn how to compress your model using different algorithms and formats with a step-by-step walkthrough; a minimal compression sketch follows this list.

    Compression Guide

  • Deploy with vLLM


    Deploy your compressed model for efficient inference using vLLM; a serving sketch also follows this list.

    Deployment Guide

  • FAQ


    Browse answers to the most frequently asked questions about LLM Compressor.

    FAQ
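
As a taste of the workflow covered in the Compression Guide, here is a minimal sketch of one-shot FP8 quantization. The model identifier, scheme, and output directory are illustrative assumptions, and the import path for oneshot has moved between releases, so treat the guide itself as the source of truth:

    from llmcompressor import oneshot  # older releases: llmcompressor.transformers
    from llmcompressor.modifiers.quantization import QuantizationModifier

    # Quantize all Linear layers to FP8 with dynamic activation scales,
    # keeping the lm_head in full precision. FP8_DYNAMIC is data-free,
    # so no calibration dataset is required.
    recipe = QuantizationModifier(
        targets="Linear",
        scheme="FP8_DYNAMIC",
        ignore=["lm_head"],
    )

    oneshot(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example model; any HF causal LM
        recipe=recipe,
        output_dir="TinyLlama-1.1B-Chat-v1.0-FP8",   # example output location
    )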
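And a matching sketch of serving the result with vLLM's offline Python API, as covered in the Deployment Guide; the checkpoint directory carries over from the sketch above and is likewise an assumption:

    from vllm import LLM, SamplingParams

    # vLLM detects the compressed checkpoint format from the saved model config.
    llm = LLM("TinyLlama-1.1B-Chat-v1.0-FP8")

    outputs = llm.generate(
        ["Hello, my name is"],
        SamplingParams(max_tokens=32),
    )
    print(outputs[0].outputs[0].text)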