llmcompressor.utils.metric_logging

Utility functions for metrics logging and GPU memory monitoring.

This module provides functions for tracking GPU memory usage, measuring model layer sizes, and comprehensive logging during compression workflows. Supports both NVIDIA and AMD GPU monitoring with detailed memory statistics and performance metrics.

Classes:

CompressionLogger

CompressionLogger(module: Module)

Logs metrics related to the compression algorithm.

Parameters:

  • start_tick

    time when the algorithm started

  • losses

    loss resulting from the algorithm

  • gpu_type

    device manufacturer (e.g. NVIDIA, AMD)

  • visible_ids

    list of device IDs visible to the current process

Source code in llmcompressor/utils/metric_logging.py
def __init__(self, module: torch.nn.Module):
    self.module = module
    self.start_tick = None
    self.loss = None
    self.gpu_type = GPUType.amd if torch.version.hip else GPUType.nv

    # Parse the appropriate env var for the visible devices to monitor.
    # If the env var is unset or unparsable, visible_ids stays empty,
    # which means all devices are monitored.
    self.visible_ids = []
    visible_devices_env_var = (
        "CUDA_VISIBLE_DEVICES"
        if self.gpu_type == GPUType.nv
        else "AMD_VISIBLE_DEVICES"
    )
    visible_devices_str = os.environ.get(visible_devices_env_var, "")
    try:
        self.visible_ids = list(
            map(
                int,
                visible_devices_str.lstrip("[").rstrip("]").split(","),
            )
        )
    except Exception:
        logger.bind(log_once=True).warning(
            f"Could not parse {visible_devices_env_var}. "
            "All devices will be monitored"
        )