llmcompressor.observers.base
Classes:
- Observer – Base class for observers which compute quantization parameters given observations
Observer
Observer(
base_name: str,
args: QuantizationArgs,
module: Optional[Module] = None,
**observer_kwargs,
)
Bases: InternalModule, RegistryMixin
Base class for observers which compute quantization parameters given observations of weights, activations, or attention states.
Example:
module = ...
args = ...  # QuantizationArgs for the observed weight
observer = Observer.load_from_registry(args.observer, base_name="weight", args=args, module=module)
module.global_scale = observer.get_global_scale(module.weight)
scales, zero_points = observer(module.weight)
Parameters:
- base_name (str) – string used to name the observer attribute
- args (QuantizationArgs) – quantization args used to calibrate and quantize the observed value
- module (Optional[Module], default: None) – optional module with attached quantization parameters. This argument is required to utilize existing qparams such as global_scale or g_idx
- **observer_kwargs – keyword arguments for observer initialization
Methods:
- forward – Calculate updated scales and zero points from observed value
- get_global_min_max – Calculate min and max values from observed value for the purposes of global scale calculation
- get_global_scale – Calculate updated global scale from observed value
- get_min_max – Calculate min and max values from observed value
Source code in llmcompressor/observers/base.py
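The method list splits into concrete methods (forward, get_global_scale) and abstract ones (get_min_max, get_global_min_max), so a subclass only decides how min and max are tracked. The pure-Python sketch below mimics that template-method split and the name-based lookup of load_from_registry. SketchObserver, MemorylessMinMaxSketch, REGISTRY and the symmetric 8-bit formula are hypothetical stand-ins rather than llmcompressor code, and flat lists stand in for tensors. The class example calls observer(...) directly to run forward; the sketch mirrors that with __call__.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple

# Hypothetical registry standing in for RegistryMixin's name-based lookup
REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    def wrap(cls: type) -> type:
        REGISTRY[name] = cls
        return cls
    return wrap


class SketchObserver(ABC):
    def __init__(self, base_name: str, num_bits: int = 8):
        self.base_name = base_name  # e.g. "weight"
        self.num_bits = num_bits

    @classmethod
    def load_from_registry(cls, name: str, **kwargs) -> "SketchObserver":
        # Look up the subclass by name and construct it with the given kwargs
        return REGISTRY[name](**kwargs)

    @abstractmethod
    def get_min_max(self, observed: List[float]) -> Tuple[float, float]:
        """Subclasses decide how min and max are tracked."""

    def forward(self, observed: List[float]) -> Tuple[float, int]:
        # Shared by every subclass: turn min/max into (scale, zero_point)
        min_val, max_val = self.get_min_max(observed)
        qmax = 2 ** (self.num_bits - 1) - 1  # 127 for 8 bits
        amax = max(abs(min_val), abs(max_val))
        scale = amax / qmax if amax > 0 else 1.0
        return scale, 0  # symmetric scheme: zero point is always 0

    def __call__(self, observed: List[float]) -> Tuple[float, int]:
        return self.forward(observed)


@register("memoryless_minmax_sketch")
class MemorylessMinMaxSketch(SketchObserver):
    def get_min_max(self, observed: List[float]) -> Tuple[float, float]:
        # Uses only the current observation; no running statistics
        return min(observed), max(observed)


observer = SketchObserver.load_from_registry("memoryless_minmax_sketch", base_name="weight")
scale, zero_point = observer([-2.54, 0.5, 1.27])
```

Because the abstract methods are the only extension point, a new calibration strategy (running averages, MSE search) would change get_min_max alone while reusing the shared forward.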
forward
Calculate updated scales and zero points from observed value (weight, activation, or attention state).
Parameters:
- observed (Tensor) – value being observed
Returns:
- ScaleZpTuple – calibrated scale and zero point
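This page does not spell out how forward maps min/max onto a scale and zero point. A common choice is the standard asymmetric affine mapping sketched below in pure Python, assuming a hypothetical unsigned 8-bit grid. The helpers minmax_to_qparams and fake_quantize are illustrative names, not llmcompressor functions, and the library's actual rounding, dtype, and integer-range conventions may differ.

```python
from typing import Tuple


def minmax_to_qparams(min_val: float, max_val: float, num_bits: int = 8) -> Tuple[float, int]:
    """Asymmetric affine qparams from an observed range (unsigned integer grid)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range so that 0.0 is exactly representable
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    if scale == 0.0:
        return 1.0, 0  # degenerate all-zero observation
    zero_point = qmin - round(min_val / scale)
    zero_point = max(qmin, min(qmax, zero_point))  # clamp into the integer grid
    return scale, zero_point


def fake_quantize(x: float, scale: float, zero_point: int, num_bits: int = 8) -> float:
    # Quantize to the clamped integer grid, then map back to float
    q = max(0, min(2 ** num_bits - 1, round(x / scale) + zero_point))
    return (q - zero_point) * scale


scale, zp = minmax_to_qparams(-1.0, 3.0)
```

Including 0.0 in the range is what lets fake_quantize(0.0, scale, zp) round-trip exactly, which matters for padding and sparse weights.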
get_global_min_max abstractmethod
Calculate min and max values from observed value for the purposes of global scale calculation
Parameters:
- observed (Tensor) – value of shape (num_observations, 1, group_size)
Returns:
- MinMaxTuple – minimum value and maximum value, each of shape (1,)
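With the observed value shaped (num_observations, 1, group_size), a memoryless global min/max reduces over every element and leaves one value per statistic, shape (1,). The sketch below uses nested Python lists in place of tensors; global_min_max is a hypothetical helper, and whether a concrete subclass also keeps running statistics across calls is not specified on this page.

```python
from typing import List, Tuple

Observed = List[List[List[float]]]  # (num_observations, 1, group_size)


def global_min_max(observed: Observed) -> Tuple[List[float], List[float]]:
    """Reduce every element to one min and one max, each returned with shape (1,)."""
    values = [v for obs in observed for row in obs for v in row]
    return [min(values)], [max(values)]


observed = [
    [[0.5, -1.5, 2.0, 0.0]],   # observation 0, group_size = 4
    [[-3.0, 1.0, 0.25, 4.5]],  # observation 1
]
min_vals, max_vals = global_min_max(observed)
```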
get_global_scale
Calculate updated global scale from observed value (weight, activation, or attention state).
Parameters:
- observed (Tensor) – value being observed
Returns:
- Tensor – calibrated global scale
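A global scale is typically derived from the global min/max. The sketch below assumes an NVFP4-style convention in which the largest magnitude (amax) is mapped onto the product of the FP8 E4M3 maximum (448) and the FP4 E2M1 maximum (6); this formula is an assumption for illustration, not a statement of what get_global_scale computes for every scheme. global_scale_from_min_max and its zero guard are hypothetical.

```python
FP8_E4M3_MAX = 448.0  # largest finite float8 e4m3 value
FP4_E2M1_MAX = 6.0    # largest float4 e2m1 value


def global_scale_from_min_max(min_val: float, max_val: float) -> float:
    # Include 0 in the range, then take the largest magnitude (amax)
    amax = max(abs(min(min_val, 0.0)), abs(max(max_val, 0.0)))
    if amax == 0.0:
        return 1.0  # hypothetical guard for an all-zero observation
    # Assumed convention: scale amax onto the combined FP8 x FP4 range
    return FP8_E4M3_MAX * FP4_E2M1_MAX / amax


gs = global_scale_from_min_max(-3.0, 4.5)
```

A larger observed magnitude yields a smaller global scale, so the per-group scales stored in FP8 stay within their representable range.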
get_min_max abstractmethod
Calculate min and max values from observed value
Parameters:
- observed (Tensor) – value of shape (num_observations, *qparam_shape, group_size)
Returns:
- MinMaxTuple – minimum value and maximum value, each of shape (*qparam_shape,)
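The shapes above imply a reduction over the first (num_observations) and last (group_size) dimensions, keeping one min and one max per quantization parameter. The sketch assumes a one-dimensional qparam_shape of (num_groups,) and uses nested lists in place of tensors; per_group_min_max is a hypothetical helper, and the exact layout llmcompressor uses for each quantization strategy is not shown on this page.

```python
from typing import List, Tuple

# observed[n][g][k]: observation n, quantization group g, element k within the group
Observed = List[List[List[float]]]


def per_group_min_max(observed: Observed) -> Tuple[List[float], List[float]]:
    """Reduce over num_observations and group_size, keeping one entry per group."""
    num_groups = len(observed[0])
    min_vals, max_vals = [], []
    for g in range(num_groups):
        values = [v for obs in observed for v in obs[g]]
        min_vals.append(min(values))
        max_vals.append(max(values))
    return min_vals, max_vals


observed = [
    [[1.0, -2.0], [0.5, 0.25]],  # observation 0: 2 groups of size 2
    [[3.0, 0.0], [-1.0, 0.75]],  # observation 1
]
min_vals, max_vals = per_group_min_max(observed)
```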