llmcompressor.observers.mse
Classes:
- MovingAverageMSEObserver: Compute quantization parameters by finding the optimal min/max values which minimize the mean of quantization error squared.
MemorylessMSEObserver
Bases: Observer
Compute quantization parameters by finding the optimal min/max values which minimize the mean of quantization error squared.
mse_quant_error := mean((x - fake_quant(x))**2)
global_scale <- min[min_vals, max_vals, global_scale](mse_quant_error(x))
scale, zp <- min[min_vals, max_vals](mse_quant_error(x, global_scale))
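The search described by the pseudocode above can be sketched in plain Python for the case without a global scale. This is a minimal per-tensor sketch under stated assumptions, not the library's implementation: it assumes asymmetric uniform integer quantization with `num_bits` bits, tries candidate ranges by scaling the observed min/max by the factors 1, 1 - 1/grid, 1 - 2/grid, and so on, and uses the hypothetical helpers `fake_quant`, `quant_error`, and `mse_min_max`.

```python
import math
from typing import List, Tuple


def fake_quant(x: List[float], min_val: float, max_val: float, num_bits: int = 8) -> List[float]:
    """Asymmetric uniform fake quantization (assumed scheme): quantize, clamp, dequantize."""
    qmin, qmax = 0, 2**num_bits - 1
    # Widen the range so that zero stays exactly representable.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    if scale == 0.0:
        return list(x)  # degenerate all-zero range: nothing to quantize
    zero_point = round(qmin - min_val / scale)
    out = []
    for v in x:
        q = min(max(round(v / scale) + zero_point, qmin), qmax)
        out.append((q - zero_point) * scale)
    return out


def quant_error(x: List[float], min_val: float, max_val: float,
                num_bits: int = 8, norm: float = 2.0) -> float:
    """mean(|x - fake_quant(x)| ** norm); with norm = 2 this is the MSE."""
    xq = fake_quant(x, min_val, max_val, num_bits)
    return sum(abs(a - b) ** norm for a, b in zip(x, xq)) / len(x)


def mse_min_max(x: List[float], num_bits: int = 8, maxshrink: float = 0.2,
                grid: float = 100, patience: int = 5, norm: float = 2.0) -> Tuple[float, float]:
    """Shrink the observed range step by step and keep the lowest-error candidate."""
    abs_min, abs_max = min(x), max(x)
    best_err = math.inf
    best = (abs_min, abs_max)
    no_improve = 0
    for i in range(int(maxshrink * grid)):
        p = 1 - i / grid  # shrink factor for this step (p = 1 is the full range)
        lo, hi = p * abs_min, p * abs_max
        err = quant_error(x, lo, hi, num_bits, norm)
        if err < best_err:
            best_err, best, no_improve = err, (lo, hi), 0
        else:
            no_improve += 1
            if no_improve >= patience:
                break  # early stopping after `patience` steps without improvement
    return best
```

Because the first candidate is the unshrunk range and a candidate only replaces the current best when its error is strictly lower, the returned range never has a higher error than the plain observed min/max.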
Parameters:
- base_name: str used to name the observer attribute.
- args: quantization args used to calibrate and quantize the observed value.
- module: optional module with attached quantization parameters. This argument is required to utilize existing qparams such as global_scale or g_idx.
- **observer_kwargs: keyword arguments for observer initialization:
  - maxshrink: maximum shrink amount (in “grid steps”). The number of search steps is int(maxshrink * grid).
  - patience: number of consecutive search steps without improvement before early stopping.
  - grid: resolution of the shrink search. Larger values give finer granularity in shrink factors.
  - norm: exponent used when computing the error. norm = 2 approximates MSE.
  - global_scale: precomputed global scale to use for quantization. Ignored if optimize_global_scale is True.
  - optimize_global_scale: If True, recompute global_scale from the candidate min/max during each step of the search.
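To make the interaction between maxshrink, grid, and patience concrete, here is a small sketch. It assumes, consistent with the descriptions above, that step i of the search uses the shrink factor 1 - i/grid; the helper names and the example values (maxshrink=0.2, grid=100, patience=2) are illustrative and are not claimed to be library defaults.

```python
from typing import List


def candidate_shrink_factors(maxshrink: float, grid: float) -> List[float]:
    """Shrink factors tried by the search: 1, 1 - 1/grid, 1 - 2/grid, ..."""
    return [1 - i / grid for i in range(int(maxshrink * grid))]


def steps_evaluated(errors: List[float], patience: int) -> int:
    """Count how many candidates are evaluated before early stopping triggers."""
    best = float("inf")
    no_improve = 0
    for step, err in enumerate(errors, start=1):
        if err < best:
            best, no_improve = err, 0  # improvement resets the patience counter
        else:
            no_improve += 1
            if no_improve >= patience:
                return step
    return len(errors)  # search ran to completion
```

With maxshrink=0.2 and grid=100, the search tries 20 factors from 1.00 down to 0.81; doubling grid with the same maxshrink tries twice as many, more finely spaced factors over roughly the same range.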
Source code in llmcompressor/observers/mse.py
MovingAverageMSEObserver
Bases: MovingAverageObserverBase
Compute quantization parameters by finding the optimal min/max values which minimize the mean of quantization error squared.
mse_quant_error := mean((x - fake_quant(x))**2)
global_scale <- min[min_vals, max_vals, global_scale](mse_quant_error(x))
scale, zp <- min[min_vals, max_vals](mse_quant_error(x, global_scale))
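MovingAverageMSEObserver differs from MemorylessMSEObserver in its base class, MovingAverageObserverBase. Assuming that base smooths per-batch statistics with an exponential moving average, the running min/max could be updated as sketched below. The class `RunningMinMax` and the `averaging_constant` weight are hypothetical names for illustration; in the MSE variant, batch_min and batch_max would be the MSE-optimal values found for the current batch.

```python
from typing import Optional, Tuple


class RunningMinMax:
    """Exponential moving average of per-batch min/max (illustrative sketch)."""

    def __init__(self, averaging_constant: float = 0.01):
        # Weight given to the newest batch; hypothetical name and default.
        self.averaging_constant = averaging_constant
        self.min_val: Optional[float] = None
        self.max_val: Optional[float] = None

    def update(self, batch_min: float, batch_max: float) -> Tuple[float, float]:
        if self.min_val is None or self.max_val is None:
            # The first batch initializes the running statistics directly.
            self.min_val, self.max_val = batch_min, batch_max
        else:
            c = self.averaging_constant
            self.min_val = (1 - c) * self.min_val + c * batch_min
            self.max_val = (1 - c) * self.max_val + c * batch_max
        return self.min_val, self.max_val
```

A small averaging_constant makes the running range change slowly, so a single batch with outliers has limited influence on the final quantization parameters.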
Parameters:
- base_name: str used to name the observer attribute.
- args: quantization args used to calibrate and quantize the observed value.
- module: optional module with attached quantization parameters. This argument is required to utilize existing qparams such as global_scale or g_idx.
- **observer_kwargs: keyword arguments for observer initialization:
  - maxshrink: maximum shrink amount (in “grid steps”). The number of search steps is int(maxshrink * grid).
  - patience: number of consecutive search steps without improvement before early stopping.
  - grid: resolution of the shrink search. Larger values give finer granularity in shrink factors.
  - norm: exponent used when computing the error. norm = 2 approximates MSE.
  - global_scale: precomputed global scale to use for quantization. Ignored if optimize_global_scale is True.
  - optimize_global_scale: If True, recompute global_scale from the candidate min/max during each step of the search.