llmcompressor.pipelines.cache
Classes:

- IntermediateValue – Dataclass which recursively defines offloaded values and which device to onload to
- IntermediatesCache – Cache which stores intermediate values (activations) produced by batched, sequential execution of models
- OverrideEqMode – When using a torch.Tensor as a key in a dictionary, the equality check must return a single value instead of a torch.Tensor of bool values
IntermediateValue dataclass
Dataclass which recursively defines offloaded values and which device to onload to
Parameters:

- value (Tensor | 'IntermediateValue' | Any) – either an offloaded Tensor, a primitive value, or a recursable value (e.g. a dataclass instance or tuple)
- device (torch.device | None) – if the value is a Tensor, then the device to onload the tensor to, otherwise None
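A minimal construction sketch, assuming the two dataclass fields documented above are set by keyword (the tensor and device values here are arbitrary):

```python
import torch
from llmcompressor.pipelines.cache import IntermediateValue

# A tensor value records the device it should be onloaded back to
v = IntermediateValue(value=torch.zeros(2, 3), device=torch.device("cpu"))

# Non-tensor (primitive) values carry no onload device
n = IntermediateValue(value=42, device=None)
```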
IntermediatesCache
```python
IntermediatesCache(
    batch_intermediates: list[IntermediateValues] | None = None,
    offload_device: device | None = "cpu",
)
```
Cache which stores intermediate values (activations) produced by batched, sequential execution of models. Values are offloaded to the offload_device when stored in the cache and onloaded to their original device when fetched from the cache. If offload_device is None, values will not be offloaded at all.
Currently supports nested offloading of dataclass instances and tuples.

Construct using the empty and from_dataloader class methods, as in the sketch below.
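A minimal usage sketch. CPU tensors are used here so the example runs anywhere; with CUDA tensors, update would offload to CPU and fetch would onload back to the original CUDA device:

```python
import torch
from llmcompressor.pipelines.cache import IntermediatesCache

# Cache sized for two batches; stored tensors are offloaded to CPU
cache = IntermediatesCache.empty(num_batches=2, offload_device=torch.device("cpu"))

# Store one batch of intermediate activations (illustrative values)
hidden = torch.randn(4, 8)
cache.update(batch_index=0, values={"hidden_states": hidden})

# Fetch returns the values onloaded to their original device
batch = cache.fetch(batch_index=0, input_names=["hidden_states"])
assert torch.equal(batch["hidden_states"], hidden)
```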
Methods:

- append – Append new values to the cache. The new values will be assigned the next available batch index
- delete – Delete values from the cache
- empty – Construct an empty cache
- fetch – Fetch values belonging to a batch
- from_dataloader – Initialize a cache with data from the provided dataloader
- size – Returns the memory used by cached values, keyed by device, in bytes
- update – Update/put values belonging to a batch
append
Append new values to the cache. The new values will be assigned the next available batch index
Parameters:

- values (dict[str, Any]) – dictionary mapping keys to values used for update
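A short sketch, assuming an empty cache can be grown one batch at a time (num_batches=0 here is an assumption made for illustration):

```python
import torch
from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=0, offload_device=torch.device("cpu"))

# Each append is assigned the next available batch index: 0, then 1
cache.append({"hidden_states": torch.zeros(2, 4)})
cache.append({"hidden_states": torch.ones(2, 4)})
```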
delete
Delete values from the cache
Parameters:

- batch_index (int) – index of batch whose values will be deleted
- consumed_names (list[str] | None, default: None) – list of keys whose values will be deleted, defaults to removing all keys
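Continuing the append sketch above, deletion can be scoped to specific keys or to a whole batch:

```python
# Drop only the "hidden_states" entry for batch 0; other keys would remain
cache.delete(batch_index=0, consumed_names=["hidden_states"])

# Drop everything stored for batch 1
cache.delete(batch_index=1)
```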
empty classmethod
Construct an empty cache
Parameters:

- num_batches (int) – the expected number of batches to be stored
- offload_device (torch.device) – device to offload values to
fetch
Fetch values belonging to a batch
Parameters:

- batch_index (int) – index of batch whose values are being fetched
- input_names (list[str] | None, default: None) – list of keys whose values are being fetched

Returns:

- dict[str, Any] – dictionary mapping keys to onloaded values
from_dataloader classmethod
```python
from_dataloader(
    dataloader: DataLoader,
    model_device: device = torch.device("cpu"),
    offload_device: device | None = torch.device("cpu"),
)
```
Initialize a cache with data from the provided dataloader
This method iterates through all batches in the dataloader and offloads them to the specified device. For faster cache preparation, consider:

- Increasing batch_size to reduce the number of iterations
- Using num_workers > 0 in the DataLoader for parallel loading (e.g. the calibration DataLoader from format_calibration_data uses dataloader_num_workers; when > 0, pin_memory and prefetch_factor are also set where applicable, which speeds both cache build and calibration)
- Ensuring data preprocessing is done before creating the dataloader
Parameters:

- dataloader (DataLoader) – dataloader which generates values to be cached
- model_device (torch.device, default: torch.device('cpu')) – device which values will be onloaded to when fetched
- offload_device (torch.device | None, default: torch.device('cpu')) – device to offload values to
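A hedged sketch of initializing a cache from a dataloader of dictionary-style batches (the dataset below is illustrative, not a real calibration set):

```python
import torch
from torch.utils.data import DataLoader
from llmcompressor.pipelines.cache import IntermediatesCache

# Illustrative calibration data: dict-style samples, batched by DataLoader
samples = [{"input_ids": torch.randint(0, 100, (8,))} for _ in range(4)]
dataloader = DataLoader(samples, batch_size=2)

# One cached batch per dataloader batch, offloaded to CPU
cache = IntermediatesCache.from_dataloader(
    dataloader,
    model_device=torch.device("cpu"),
    offload_device=torch.device("cpu"),
)
first_batch = cache.fetch(0)  # e.g. {"input_ids": tensor of shape (2, 8)}
```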
size
Returns the memory used by cached values, keyed by device, in bytes
Returns:

- dict[torch.device, int] – dictionary mapping torch device to number of bytes in cache
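For example, per-device memory usage can be logged like this (a sketch continuing any of the cache examples above):

```python
for dev, num_bytes in cache.size().items():
    print(f"{dev}: {num_bytes / 1e9:.2f} GB")
```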
update
Update/put values belonging to a batch
Parameters:

- batch_index (int) – index of batch whose values will be updated
- values (dict[str, Any]) – dictionary mapping keys to values used for update
OverrideEqMode
Bases: TorchDispatchMode
When using a torch.Tensor as a key in a dictionary, the equality check must return a single value instead of a torch.Tensor of bool values. Use this override context in such cases to swap out the torch.eq equality check for a check on object identity (id).
```python
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([1, 2, 3])
>>> a == b
tensor([True, True, True])
>>> with OverrideEqMode():
...     a == b
tensor(True)
```
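A sketch of the motivating use case. Tensor hashes are id-based, but on a hash-bucket collision a dict must fall back to ==, which ordinarily returns an elementwise bool tensor and raises "Boolean value of Tensor ... is ambiguous"; inside the context, that comparison collapses to a single value:

```python
import torch
from llmcompressor.pipelines.cache import OverrideEqMode

key = torch.tensor([1, 2, 3])

# Build and query a tensor-keyed dict inside the override context
with OverrideEqMode():
    table = {key: "cached value"}
    print(table[key])  # key comparisons behave like identity checks
```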