llmcompressor.utils.metric_logging

Utility functions for metrics logging and GPU memory monitoring.

This module provides functions for tracking GPU memory usage, measuring model layer sizes, and comprehensive logging during compression workflows. Supports both NVIDIA and AMD GPU monitoring with detailed memory statistics and performance metrics.

Classes:

CompressionLogger

CompressionLogger(module: Module)

Logs metrics related to the compression algorithm.

Parameters:

  • start_tick

    time when the algorithm started

  • losses

    loss resulting from the algorithm

  • gpu_type

    device manufacturer (e.g. NVIDIA, AMD)

  • visible_ids

    list of device IDs visible to the current process

Source code in llmcompressor/utils/metric_logging.py
def __init__(self, module: torch.nn.Module):
    self.module = module
    self.start_tick = None
    self.loss = None
    self.gpu_type = GPUType.amd if torch.version.hip else GPUType.nv

    # Parse the appropriate env var for the visible devices to monitor.
    # If the env var is unset or unparsable, visible_ids stays empty,
    # which means all devices are monitored.
    self.visible_ids = []
    visible_devices_env_var = (
        "CUDA_VISIBLE_DEVICES"
        if self.gpu_type == GPUType.nv
        else "AMD_VISIBLE_DEVICES"
    )
    visible_devices_str = os.environ.get(visible_devices_env_var, "")
    try:
        self.visible_ids = list(
            map(
                int,
                visible_devices_str.lstrip("[").rstrip("]").split(","),
            )
        )
    except Exception:
        logger.bind(log_once=True).warning(
            f"Could not parse {visible_devices_env_var}. "
            "All devices will be monitored"
        )