llmcompressor.observers.mse

Classes:

MovingAverageMSEObserver – Compute quantization parameters by finding the optimal min/max values which minimize the mean quantization error.

MemorylessMSEObserver

MemorylessMSEObserver(*args, **kwargs)

Bases: Observer

Compute quantization parameters by finding the min/max values that minimize the mean quantization error. The error is the mean squared error when norm = 2; the default norm set in __init__ is 2.4.

mse_quant_error(x) := mean((x - fake_quant(x))**2)
global_scale <- argmin over (min_vals, max_vals, global_scale) of mse_quant_error(x)
scale, zp <- argmin over (min_vals, max_vals) of mse_quant_error(x, global_scale)

Parameters:

base_name – str used to name the observer attribute

args – quantization args used to calibrate and quantize the observed value

module – optional module with attached quantization parameters. This argument is required to utilize existing qparams such as global_scale or g_idx

**observer_kwargs – keyword arguments for observer initialization:

maxshrink: maximum shrink amount (in “grid steps”). The number of search steps is int(maxshrink * grid)

patience: number of consecutive search steps without improvement before early stopping

grid: resolution of the shrink search. Larger values give finer granularity in shrink factors

norm: exponent used when computing the error. norm = 2 approximates MSE

global_scale: precomputed global scale to use for quantization. Ignored if optimize_global_scale is True

optimize_global_scale: If True, recompute global_scale from the candidate min/max during each step of the search
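The shrink search controlled by maxshrink, patience, grid, and norm can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes each step scales the observed min/max by a factor p = 1 - i / grid, uses a plain asymmetric uniform quantizer over [0, 2**num_bits - 1] as fake_quant, and averages |x - fake_quant(x)|**norm as the error. The helper names fake_quant, quant_error, and mse_search are hypothetical.

```python
import math


def fake_quant(values, min_val, max_val, num_bits=8):
    """Hypothetical asymmetric uniform quantize-dequantize (assumed scheme)."""
    qmax = 2 ** num_bits - 1
    # Assumption: the range is widened to include zero, a common convention.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / qmax
    if scale == 0:
        return list(values)
    zero_point = round(-min_val / scale)
    out = []
    for v in values:
        # Round to the integer grid, clamp to [0, qmax], then dequantize.
        q = min(max(round(v / scale) + zero_point, 0), qmax)
        out.append((q - zero_point) * scale)
    return out


def quant_error(values, min_val, max_val, norm=2.4, num_bits=8):
    """Mean of |x - fake_quant(x)| ** norm; norm = 2 gives the MSE."""
    deq = fake_quant(values, min_val, max_val, num_bits)
    return sum(abs(v - d) ** norm for v, d in zip(values, deq)) / len(values)


def mse_search(values, maxshrink=0.20, patience=5, grid=100.0, norm=2.4, num_bits=8):
    """Hypothetical shrink search over candidate (min, max) ranges."""
    abs_min, abs_max = min(values), max(values)
    best_err = math.inf
    best = (abs_min, abs_max)
    stale = 0  # consecutive steps without improvement
    for i in range(int(maxshrink * grid)):
        p = 1 - i / grid  # assumed shrink factor for step i (p = 1 at i = 0)
        cand_min, cand_max = p * abs_min, p * abs_max
        err = quant_error(values, cand_min, cand_max, norm, num_bits)
        if err < best_err:
            best_err, best = err, (cand_min, cand_max)
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping
    return best, best_err


data = [i / 100.0 for i in range(-100, 101)] + [4.0]  # one large outlier
print(mse_search(data, num_bits=4))
```

Because the first candidate (p = 1) is the unshrunk range, the returned error is never worse than plain min/max calibration under this error measure, and patience bounds how many extra candidates are evaluated once the error stops improving.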

Source code in llmcompressor/observers/mse.py

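def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # Search hyperparameters are read from the quantization args' observer_kwargs,
    # falling back to the defaults below when a key is absent
    observer_kwargs = self.args.observer_kwargs
    self.maxshrink = observer_kwargs.get("maxshrink", 0.20)  # search runs int(maxshrink * grid) steps
    self.patience = observer_kwargs.get("patience", 5)  # non-improving steps before early stop
    self.grid = observer_kwargs.get("grid", 100.0)  # resolution of the shrink search
    self.norm = observer_kwargs.get("norm", 2.4)  # error exponent (2 gives squared error)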
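The defaults read in __init__ determine the size of the search. The hypothetical helper below mirrors those per-key .get() fallbacks; shrink_factors lists the candidate factors they imply, assuming each step uses p = 1 - i / grid. Under that assumption the defaults give 20 candidate ranges, from the full range down to 0.81 of it. Neither helper is part of llmcompressor.

```python
# Defaults copied from the __init__ shown above.
MSE_OBSERVER_DEFAULTS = {"maxshrink": 0.20, "patience": 5, "grid": 100.0, "norm": 2.4}


def resolve_mse_kwargs(observer_kwargs=None):
    """Hypothetical helper: apply the same per-key fallbacks as __init__."""
    observer_kwargs = observer_kwargs or {}
    return {key: observer_kwargs.get(key, default)
            for key, default in MSE_OBSERVER_DEFAULTS.items()}


def shrink_factors(maxshrink, grid):
    """Assumed candidate factors: one per search step, p = 1 - i / grid."""
    return [1 - i / grid for i in range(int(maxshrink * grid))]


cfg = resolve_mse_kwargs({"norm": 2.0})  # override only the error exponent
factors = shrink_factors(cfg["maxshrink"], cfg["grid"])
print(cfg, len(factors), factors[0], factors[-1])
```

Raising grid makes the candidate factors finer, while the span of the search stays fixed by maxshrink; the step count is their product.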

MovingAverageMSEObserver

MovingAverageMSEObserver(*args, **kwargs)

Bases: MovingAverageObserverBase

Compute quantization parameters by finding the min/max values that minimize the mean quantization error. The error is the mean squared error when norm = 2; the default norm set in __init__ is 2.4.

mse_quant_error(x) := mean((x - fake_quant(x))**2)
global_scale <- argmin over (min_vals, max_vals, global_scale) of mse_quant_error(x)
scale, zp <- argmin over (min_vals, max_vals) of mse_quant_error(x, global_scale)
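This page does not describe how MovingAverageObserverBase combines statistics across calibration batches. Assuming, from the class names, that MemorylessMSEObserver uses only the current batch while MovingAverageMSEObserver keeps an exponential moving average of the min/max chosen by the search on each batch, the update could look like this minimal sketch. MovingMinMax and averaging_constant are hypothetical names, not the library's API.

```python
class MovingMinMax:
    """Hypothetical running min/max tracker using an exponential moving average."""

    def __init__(self, averaging_constant=0.01):
        # Assumed weight given to each new batch's statistics.
        self.averaging_constant = averaging_constant
        self.min_val = None
        self.max_val = None

    def update(self, batch_min, batch_max):
        """Fold one batch's (searched) min/max into the running values."""
        if self.min_val is None:
            # The first observation initializes the running statistics.
            self.min_val, self.max_val = batch_min, batch_max
        else:
            c = self.averaging_constant
            self.min_val += c * (batch_min - self.min_val)
            self.max_val += c * (batch_max - self.max_val)
        return self.min_val, self.max_val


tracker = MovingMinMax(averaging_constant=0.5)
for batch_range in [(-1.0, 1.0), (-3.0, 3.0), (-2.0, 2.0)]:
    print(tracker.update(*batch_range))
```

A small averaging constant makes the running range robust to a single unusual batch, at the cost of adapting slowly; a memoryless observer has no such smoothing.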

Parameters:

base_name – str used to name the observer attribute

args – quantization args used to calibrate and quantize the observed value

module – optional module with attached quantization parameters. This argument is required to utilize existing qparams such as global_scale or g_idx

**observer_kwargs – keyword arguments for observer initialization:

maxshrink: maximum shrink amount (in “grid steps”). The number of search steps is int(maxshrink * grid)

patience: number of consecutive search steps without improvement before early stopping

grid: resolution of the shrink search. Larger values give finer granularity in shrink factors

norm: exponent used when computing the error. norm = 2 approximates MSE

global_scale: precomputed global scale to use for quantization. Ignored if optimize_global_scale is True

optimize_global_scale: If True, recompute global_scale from the candidate min/max during each step of the search

Source code in llmcompressor/observers/mse.py

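def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # Search hyperparameters are read from the quantization args' observer_kwargs,
    # falling back to the defaults below when a key is absent
    observer_kwargs = self.args.observer_kwargs
    self.maxshrink = observer_kwargs.get("maxshrink", 0.20)  # search runs int(maxshrink * grid) steps
    self.patience = observer_kwargs.get("patience", 5)  # non-improving steps before early stop
    self.grid = observer_kwargs.get("grid", 100.0)  # resolution of the shrink search
    self.norm = observer_kwargs.get("norm", 2.4)  # error exponent (2 gives squared error)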
