llmcompressor.utils.helpers
General utility helper functions. Common functions for interfacing with Python primitives and directories/files.
Functions:

- `DisableQuantization` – Disable quantization during forward passes after applying a quantization config.
- `calibration_forward_context` – Context in which all calibration forward passes should occur.
- `disable_cache` – Temporarily disable the key-value cache for transformer models.
- `disable_hf_kernels` – Disable HF Hub kernel replacements of module forward methods in transformers>=4.50.0, which could bypass hooks.
- `disable_lm_head` – Disable the lm_head of a model by moving it to the meta device.
- `eval_context` – Disable PyTorch training mode for the given module.
- `import_from_path` – Import the module and the name of the function/class separated by `:`.
- `is_package_available` – A helper function to check if a package is available.
DisableQuantization
Disable quantization during forward passes after applying a quantization config.
Source code in llmcompressor/utils/helpers.py
calibration_forward_context
Context in which all calibration forward passes should occur.
- Remove gradient calculations
- Disable the KV cache
- Disable train mode and enable eval mode
- Disable hf kernels which could bypass hooks
- Disable lm head (input and weights can still be calibrated, output will be meta)
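The behaviors above can be sketched as a stack of small context managers, one per concern. This is a pure-Python illustration only: `TinyModel`, `_disable_cache`, `_eval_mode`, and `calibration_context_sketch` are hypothetical stand-ins, not the library's API.

```python
import contextlib


class TinyModel:
    """Minimal stand-in for a transformer model (illustrative only)."""

    def __init__(self):
        self.training = True
        self.use_cache = True


@contextlib.contextmanager
def _disable_cache(model):
    # Mirrors "Disable the KV cache": flip the flag, restore it on exit
    previous, model.use_cache = model.use_cache, False
    try:
        yield
    finally:
        model.use_cache = previous


@contextlib.contextmanager
def _eval_mode(model):
    # Mirrors "Disable train mode and enable eval mode"
    previous, model.training = model.training, False
    try:
        yield
    finally:
        model.training = previous


@contextlib.contextmanager
def calibration_context_sketch(model):
    # ExitStack enters each per-concern context and unwinds them in
    # reverse order when the calibration block finishes
    with contextlib.ExitStack() as stack:
        stack.enter_context(_disable_cache(model))
        stack.enter_context(_eval_mode(model))
        yield model


model = TinyModel()
with calibration_context_sketch(model):
    assert not model.training and not model.use_cache
assert model.training and model.use_cache  # state restored on exit
```

Composing independent contexts this way keeps each restore path isolated: an exception inside the `with` block still restores every flag.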
disable_cache
Temporarily disable the key-value cache for transformer models. Used to prevent excess memory use in one-shot cases where the model only performs the prefill phase and not the generation phase.
Example:

```python
model = AutoModel.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
input = torch.randint(0, 32, size=(1, 32))
with disable_cache(model):
    output = model(input)
```
disable_hf_kernels
In transformers>=4.50.0, some module forward methods may be replaced by calls to HF Hub kernels. This has the potential to bypass hooks added by LLM Compressor.
disable_lm_head
Disable the lm_head of a model by moving it to the meta device. This function does not untie parameters, and it restores the model for proper loading upon exit.
eval_context
Disable PyTorch training mode for the given module.
import_from_path
Import the module and the name of the function/class separated by `:`.

Examples:

```python
path = "/path/to/file.py:func_or_class_name"
path = "/path/to/file:focn"
path = "path.to.file:focn"
```

Parameters:

- `path` (`str`) – path including the file path and object name
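A hedged sketch of how the `module:object` convention above can be resolved with the standard library's `importlib`. The name `import_from_path_sketch` is hypothetical and this is not the library's actual implementation.

```python
import importlib
import importlib.util
from pathlib import Path


def import_from_path_sketch(path: str):
    # Split on the last ":" so drive letters or earlier colons survive
    module_path, _, obj_name = path.rpartition(":")
    if module_path.endswith(".py"):
        # File-path form: load the module directly from its source file
        spec = importlib.util.spec_from_file_location(
            Path(module_path).stem, module_path
        )
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        # Dotted form: a regular import (an extensionless file path would
        # need the same file-based loading as the branch above)
        module = importlib.import_module(module_path)
    return getattr(module, obj_name)


dumps = import_from_path_sketch("json:dumps")
```

The dotted form resolves through `sys.path` like a normal import, while the file form bypasses it entirely, which is why both spellings are useful.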
is_package_available
```python
is_package_available(
    package_name: str, return_version: bool = False
) -> Union[Tuple[bool, str], bool]
```
A helper function to check if a package is available and optionally return its version. This function enforces a check that the package is available and is not just a directory/file with the same name as the package.
Inspired by: https://github.com/huggingface/transformers/blob/965cf677695dd363285831afca8cf479cf0c600c/src/transformers/utils/import_utils.py#L41
Parameters:

- `package_name` (`str`) – The package name to check for
- `return_version` (`bool`, default: `False`) – True to return the version of the package if available
Returns:

- `Union[Tuple[bool, str], bool]` – True if the package is available, False otherwise; or a tuple of `(bool, version)` if `return_version` is True
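The documented check can be sketched with the standard library, mirroring the transformers helper linked above. This is an illustration under the stated semantics, not the package's exact code: `find_spec` alone can match a stray directory with the package's name, so a metadata lookup confirms a real installed distribution.

```python
import importlib.metadata
import importlib.util
from typing import Tuple, Union


def is_package_available_sketch(
    package_name: str, return_version: bool = False
) -> Union[Tuple[bool, str], bool]:
    # Step 1: is there anything importable under this name?
    exists = importlib.util.find_spec(package_name) is not None
    version = "N/A"
    if exists:
        try:
            # Step 2: confirm it is an installed distribution, not just a
            # same-named directory/file, and fetch its version
            version = importlib.metadata.version(package_name)
        except importlib.metadata.PackageNotFoundError:
            exists = False
    return (exists, version) if return_version else exists
```

Note the two-step shape: the metadata lookup is what enforces the "not just a directory/file with the same name" guarantee described above.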