llmcompressor.observers.base

Classes:

  • Observer

    Base class for observers which compute quantization parameters given observations

Observer

Observer(
    base_name: str,
    args: QuantizationArgs,
    module: Optional[Module] = None,
    **observer_kwargs,
)

Bases: InternalModule, RegistryMixin

Base class for observers which compute quantization parameters given observations of weights, activations, or attention states.

Example:

module = ...
observer = Observer.load_from_registry("minmax", base_name="weight", args=...)
module.global_scale = observer.get_global_scale(module.weight)
scales, zero_points = observer(module.weight)

Parameters:

  • base_name

    (str) –

    str used to name the observer attribute

  • args

    (QuantizationArgs) –

    quantization args used to calibrate and quantize the observed value

  • module

    (Optional[Module], default: None) –

    optional module with attached quantization parameters. This argument is required to utilize existing qparams such as global_scale or g_idx

  • **observer_kwargs

    keyword arguments for observer initialization

Methods:

  • forward

    Calculate updated scales and zero points from observed value

  • get_global_min_max

    Calculate min and max values from observed value for the purposes of global scale calculation

  • get_global_scale

    Calculate updated global scale from observed value

  • get_min_max

    Calculate min and max values from observed value

Source code in llmcompressor/observers/base.py
def __init__(
    self,
    base_name: str,
    args: QuantizationArgs,
    module: Optional[torch.nn.Module] = None,
    **observer_kwargs,
):
    super().__init__()
    # hold a weakref.ref to avoid a reference cycle with the owning module
    self.module = ref(module) if module is not None else None
    self.base_name = base_name
    self.args = args

    # populate observer kwargs
    self.args.observer_kwargs = self.args.observer_kwargs or {}
    self.args.observer_kwargs.update(observer_kwargs)

forward

forward(observed: Tensor) -> ScaleZpTuple

Calculate updated scales and zero points from observed value (weight, activation, or attention state).

Parameters:

  • observed

    (Tensor) –

    value being observed

Returns:

  • ScaleZpTuple

    calibrated scale and zero point

Source code in llmcompressor/observers/base.py
@torch.no_grad
def forward(self, observed: torch.Tensor) -> ScaleZpTuple:
    """
    Calculate updated scales and zero points from observed value
    (weight, activation, or attention state).

    :param observed: value being observed
    :return: calibrated scale and zero point
    """
    scales, zero_points, _min, _max = self._forward_with_minmax(observed)
    return (scales, zero_points)
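As a rough sketch of how a calibrated min/max pair typically becomes a scale and zero point: this is generic affine quantization in pure Python, not necessarily llmcompressor's exact implementation (which delegates to its quantization-args helpers), and `compute_qparams` is a hypothetical name.

```python
def compute_qparams(min_val: float, max_val: float,
                    qmin: int = -128, qmax: int = 127) -> tuple:
    """Map an observed [min_val, max_val] range onto the integer grid
    [qmin, qmax] (generic affine quantization sketch)."""
    # the representable range must include zero so that 0.0 quantizes exactly
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    if scale == 0.0:
        scale = 1e-8  # guard against an all-zero observation
    zero_point = round(qmin - min_val / scale)
    return scale, zero_point
```

For example, an observed range of [-1.0, 1.0] maps onto int8 with scale 2/255 and zero point 0.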

get_global_min_max abstractmethod

get_global_min_max(observed: Tensor) -> MinMaxTuple

Calculate min and max values from observed value for the purposes of global scale calculation

Parameters:

  • observed

    (Tensor) –

    value of shape (num_observations, 1, group_size)

Returns:

  • MinMaxTuple

    minimum value and maximum value whose shapes are (1, )

Source code in llmcompressor/observers/base.py
@abstractmethod
def get_global_min_max(self, observed: torch.Tensor) -> MinMaxTuple:
    """
    Calculate min and max values from observed value for the purposes of
    global scale calculation

    :param observed: value of shape (num_observations, 1, group_size)
    :return: minimum value and maximum value whose shapes are (1, )
    """
    raise NotImplementedError()
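In plain terms, a global min/max is a single reduction over every observed element. A minimal pure-Python stand-in for the `(num_observations, 1, group_size)` → `(1,)` contract (a hypothetical helper, not llmcompressor's implementation, which operates on torch tensors):

```python
def global_min_max(observed):
    # observed: nested lists shaped (num_observations, 1, group_size);
    # reduce over every element down to single-element min and max
    flat = [v for obs in observed for group in obs for v in group]
    return [min(flat)], [max(flat)]
```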

get_global_scale

get_global_scale(observed: Tensor) -> torch.Tensor

Calculate updated global scale from observed value (weight, activation, or attention state).

Parameters:

  • observed

    (Tensor) –

    value being observed

Returns:

  • Tensor

    calibrated global parameter

Source code in llmcompressor/observers/base.py
@torch.no_grad
def get_global_scale(self, observed: torch.Tensor) -> torch.Tensor:
    """
    Calculate updated global scale from observed value
    (weight, activation, or attention state).

    :param observed: value being observed
    :return: calibrated global parameter
    """
    global_scale, _min, _max = self._get_global_scale_with_minmax(observed)
    return global_scale

get_min_max abstractmethod

get_min_max(observed: Tensor) -> MinMaxTuple

Calculate min and max values from observed value

Parameters:

  • observed

    (Tensor) –

    value of shape (num_observations, *qparam_shape, group_size)

Returns:

  • MinMaxTuple

    minimum value and maximum value whose shapes are (*qparam_shape, )

Source code in llmcompressor/observers/base.py
@abstractmethod
def get_min_max(self, observed: torch.Tensor) -> MinMaxTuple:
    """
    Calculate min and max values from observed value

    :param observed: value of shape (num_observations, *qparam_shape, group_size)
    :return: minimum value and maximum value whose shapes are (*qparam_shape, )
    """
    raise NotImplementedError()
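The per-parameter contract can likewise be sketched in pure Python: reduce over the observation and group dimensions while keeping one min/max per quantization parameter. `per_param_min_max` is a hypothetical helper shown for illustration; concrete subclasses implement this over torch tensors.

```python
def per_param_min_max(observed):
    # observed: nested lists shaped (num_observations, num_qparams, group_size);
    # reduce over observations and group elements, keeping the qparam dimension
    num_params = len(observed[0])
    mins = [min(min(obs[p]) for obs in observed) for p in range(num_params)]
    maxs = [max(max(obs[p]) for obs in observed) for p in range(num_params)]
    return mins, maxs
```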