llmcompressor.pipelines
Compression pipelines for orchestrating different compression strategies.
Provides various compression pipelines including basic, sequential, independent, layer-sequential, and data-free approaches. Each pipeline coordinates different compression techniques and workflows for optimal model optimization based on specific requirements and constraints.
Modules:
- basic
- cache
- data_free
- independent
- registry
- sequential
Classes:
- BasicPipeline
- CalibrationPipeline
- DataFreePipeline
- IndependentPipeline
- SequentialPipeline
- Subgraph – Dataclass specifying an executable subgraph of a model graph
Functions:
- dispatch_for_sequential – Dispatch a model for sequential calibration using a sequential pipeline.
- get_sequential_targets – Infer sequential targets from modifiers list and dataset args
- handle_sequential_oom – Catch out-of-memory (OOM) errors and suggest changing sequential targets
- trace_subgraphs – Trace a model to produce subgraphs, where each sequential target belongs to exactly one subgraph
BasicPipeline
Bases: CalibrationPipeline
CalibrationPipeline
Bases: ABC, RegistryMixin
Methods:
- from_modifiers – Infer which calibration pipeline to use based on the available modifiers and any user specifications
from_modifiers classmethod
Infer which calibration pipeline to use based on the available modifiers and any user specifications
Parameters:
- modifiers (list[Modifier]) – modifiers to apply to model
- user (str | None, default: None) – pipeline name passed by user
Returns:
- CalibrationPipeline – CalibrationPipeline instance to be called with data (if not datafree)
Source code in llmcompressor/pipelines/registry.py
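The selection step described above can be illustrated with a small, self-contained sketch. It does not import llmcompressor: `ToyModifier`, `ToyPipeline`, the `needs_data` flag, and the selection rule are all hypothetical stand-ins chosen for illustration, and the real inference logic may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ToyModifier:
    # Hypothetical stand-in for a Modifier; `needs_data` is an assumed flag.
    name: str
    needs_data: bool


class ToyPipeline(ABC):
    # Maps a pipeline name to the class registered under that name.
    registry = {}

    @classmethod
    def register(cls, name):
        def wrap(pipeline_cls):
            cls.registry[name] = pipeline_cls
            return pipeline_cls
        return wrap

    @classmethod
    def from_modifiers(cls, modifiers, user=None):
        # Assumed rule: an explicit user choice wins; otherwise fall back to
        # a data-free pipeline only when no modifier needs calibration data.
        if user is not None:
            name = user
        elif any(m.needs_data for m in modifiers):
            name = "sequential"
        else:
            name = "datafree"
        return cls.registry[name]()

    @abstractmethod
    def __call__(self, model, dataloader=None):
        ...


@ToyPipeline.register("datafree")
class ToyDataFree(ToyPipeline):
    def __call__(self, model, dataloader=None):
        return f"compress {model} without data"


@ToyPipeline.register("sequential")
class ToySequential(ToyPipeline):
    def __call__(self, model, dataloader=None):
        return f"calibrate {model} layer by layer"


inferred = ToyPipeline.from_modifiers([ToyModifier("needs-calibration", True)])
forced = ToyPipeline.from_modifiers(
    [ToyModifier("needs-calibration", True)], user="datafree"
)
```

In this sketch `inferred` is a `ToySequential` instance, while `forced` is a `ToyDataFree` instance because the user-supplied name takes precedence.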
DataFreePipeline
Bases: CalibrationPipeline
IndependentPipeline
Bases: CalibrationPipeline
SequentialPipeline
Bases: CalibrationPipeline
Subgraph dataclass
Subgraph(
    graph: Graph,
    input_names: set[str],
    consumed_names: set[str],
    _code: PythonCode | None = None,
)
Dataclass specifying an executable subgraph of a model graph
Parameters:
- graph (Graph) – subgraph of model graph
- input_names (set[str]) – argument names of the compiled forward function
- consumed_names (set[str]) – argument names which are not used by any subsequent subgraphs and can therefore be deleted from the intermediates cache
Methods:
- forward – Execute the operations within the subgraph
forward
Execute the operations within the subgraph
Parameters:
- *args – argument inputs to subgraph forward function
- **kwargs – keyword inputs to subgraph forward function
Returns:
- dict[str, Any]
Source code in llmcompressor/pipelines/sequential/helpers.py
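The roles of `input_names` and `consumed_names` can be sketched in plain Python. This is a minimal illustration and not the library's implementation: `ToySubgraph`, its `fn` field (which stands in for the compiled graph code), and the `run` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToySubgraph:
    # Hypothetical stand-in: `fn` replaces the compiled graph code.
    fn: Callable[..., dict[str, Any]]
    input_names: set[str]
    consumed_names: set[str]

    def forward(self, **kwargs) -> dict[str, Any]:
        # Execute the subgraph and return its named outputs.
        return self.fn(**kwargs)


def run(subgraphs, cache):
    """Run subgraphs in order, freeing values no later subgraph needs."""
    for sg in subgraphs:
        inputs = {name: cache[name] for name in sg.input_names}
        cache.update(sg.forward(**inputs))
        for name in sg.consumed_names:
            del cache[name]  # safe: no subsequent subgraph reads this name
    return cache


subgraphs = [
    ToySubgraph(lambda x: {"h": x + 1}, {"x"}, {"x"}),
    ToySubgraph(lambda h: {"out": h * 2}, {"h"}, {"h"}),
]
final_cache = run(subgraphs, {"x": 3})
```

After both subgraphs run, only `out` remains in the cache, since `x` and `h` were each consumed by the subgraph that read them.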
dispatch_for_sequential
dispatch_for_sequential(
model: PreTrainedModel,
onload_device: Optional[device | str] = None,
offload_device: Optional[device | str] = None,
) -> PreTrainedModel
Dispatch a model for sequential calibration using a sequential pipeline. The model is offloaded to the CPU and dispatched to a CUDA/XPU device if one is available. Any existing hooks are removed.
Parameters:
- model (PreTrainedModel) – model to dispatch
Returns:
- PreTrainedModel – dispatched model
Source code in llmcompressor/pipelines/sequential/helpers.py
get_sequential_targets
get_sequential_targets(
    modifiers: list[Modifier],
    model: PreTrainedModel,
    args: DatasetArguments,
) -> list[str]
Infer sequential targets from modifiers list and dataset args
Parameters:
- model (PreTrainedModel) – model being calibrated
- modifiers (list[Modifier]) – list of modifiers being applied during calibration
- dataset_args – dataset arguments passed by user
Returns:
- list[str] – list of sequential targets
Source code in llmcompressor/pipelines/sequential/helpers.py
handle_sequential_oom
Catch out-of-memory (OOM) errors and suggest changing sequential targets
Source code in llmcompressor/pipelines/sequential/helpers.py
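The general pattern of catching an out-of-memory error and re-raising it with a hint might look like the sketch below. It is an assumption-laden illustration: the decorator name `suggest_on_oom`, the use of Python's built-in `MemoryError` as a stand-in for an accelerator OOM error, and the wording of the hint are all hypothetical and not taken from the library.

```python
import functools


def suggest_on_oom(func):
    """Re-raise out-of-memory errors with a hint about sequential targets."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except MemoryError as err:
            # Assumption: MemoryError stands in for the accelerator OOM error.
            raise MemoryError(
                f"{err}. Consider choosing smaller sequential targets so "
                "that less of the model is onloaded at once."
            ) from err

    return wrapper


@suggest_on_oom
def calibrate_layer():
    # Simulate a calibration step that runs out of memory.
    raise MemoryError("out of memory")


try:
    calibrate_layer()
    message = None
except MemoryError as err:
    message = str(err)
```

Here `message` carries both the original error text and the added suggestion, and the original exception stays attached as `__cause__`.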
trace_subgraphs
trace_subgraphs(
    model: PreTrainedModel,
    sample_input: dict[str, Any],
    sequential_targets: list[str],
    ignore: list[str],
) -> list[Subgraph]
Trace a model to produce subgraphs, where each sequential target belongs to exactly one subgraph and where executing each subgraph in order is equivalent to executing the original model
Parameters:
- model (PreTrainedModel) – model being traced
- sample_input (dict[str, Any]) – inputs whose values will change during execution but whose len, bool, and contains values are assumed constant across batches
- sequential_targets (list[str]) – list of patterns matching sequential targets
- ignore (list[str]) – function and method names to skip during tracing
Returns:
- list[Subgraph] – a list of Subgraphs in order of execution
Source code in llmcompressor/pipelines/sequential/helpers.py
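The two guarantees stated above (each sequential target lands in exactly one subgraph, and running the subgraphs in order matches running the whole model) can be illustrated on a toy model. This sketch assumes the model is a simple linear chain of named functions, which is far simpler than tracing a real model graph; `partition` and `run_groups` are hypothetical helpers.

```python
from typing import Callable

Layer = tuple[str, Callable[[float], float]]


def partition(layers: list[Layer], targets: set[str]) -> list[list[Layer]]:
    """Split a chain of layers so that each target closes exactly one group."""
    groups, current = [], []
    for name, fn in layers:
        current.append((name, fn))
        if name in targets:
            groups.append(current)
            current = []
    if current:  # trailing non-target layers form the final group
        groups.append(current)
    return groups


def run_groups(groups: list[list[Layer]], x: float) -> float:
    """Execute the groups in order, feeding each output to the next group."""
    for group in groups:
        for _, fn in group:
            x = fn(x)
    return x


layers = [
    ("embed", lambda x: x + 1),
    ("block0", lambda x: x * 2),
    ("block1", lambda x: x - 3),
    ("head", lambda x: x * 10),
]
groups = partition(layers, {"block0", "block1"})

# Reference result: run the unpartitioned chain directly.
expected = 2.0
for _, fn in layers:
    expected = fn(expected)

result = run_groups(groups, 2.0)
```

In this toy case `groups` holds three groups (`embed` with `block0`, then `block1`, then `head`), and `result` equals `expected`.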