Commit 245ab4e

Fixing linear_activation_tensor dynamic quant (#622)
Summary: dynamic quant was broken for generate due to no repr function

Test Plan: sh benchmarks.sh

20240806170037, tok/s= 9.54, mem/s= 63.14 GB/s, peak_mem= 8.61 GB, model_size= 6.62 GB
quant: int8dq, mod: Llama-2-7b-chat-hf, kv_quant: False, compile: True, compile_prefill: False, dtype: torch.bfloat16, device: cuda
repro: python generate.py --quantization int8dq --checkpoint_path ../../../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth --device cuda --precision torch.bfloat16 --compile --num_samples 5 --max_new_tokens 200 --top_k 200 --temperature 0.8

Reviewers:

Subscribers:

Tasks:

Tags:
1 parent febeaac commit 245ab4e

1 file changed: +3 -0 lines changed

torchao/quantization/linear_activation_quantized_tensor.py

@@ -39,6 +39,9 @@ def __init__(
         self.original_weight_tensor = original_weight_tensor
         self.input_quant_func = input_quant_func
 
+    def __repr__(self):
+        return f"LinearActivationQuantizedTensor({self.original_weight_tensor}, {self.input_quant_func})"
+
     def __tensor_flatten__(self):
         return ["original_weight_tensor"], [self.input_quant_func]
 
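
For illustration only, the added `__repr__` can be sketched in plain Python without torch. `FakeWeight`, `fake_input_quant`, and `LinearActivationQuantizedTensorSketch` are hypothetical stand-ins and are not part of torchao. The sketch only shows that the string is built from the two stored fields, so printing the wrapper never needs to read its own data. It does not reproduce the original failure in generate.

```python
# Hypothetical stand-ins; none of these names exist in torchao.
class FakeWeight:
    """Plays the role of the wrapped weight tensor."""

    def __init__(self, shape):
        self.shape = shape

    def __repr__(self):
        return f"FakeWeight(shape={self.shape})"


def fake_input_quant(x):
    """Plays the role of the activation quantization callable."""
    return x


class LinearActivationQuantizedTensorSketch:
    """Holds a weight and an input quantization function, as in the diff."""

    def __init__(self, original_weight_tensor, input_quant_func):
        self.original_weight_tensor = original_weight_tensor
        self.input_quant_func = input_quant_func

    def __repr__(self):
        # Same format string as the commit: delegate to the fields' own reprs.
        return (
            f"LinearActivationQuantizedTensor("
            f"{self.original_weight_tensor}, {self.input_quant_func})"
        )


t = LinearActivationQuantizedTensorSketch(FakeWeight((4, 8)), fake_input_quant)
text = repr(t)
print(text)
```

The function is rendered with Python's default function repr, so that part of the output includes a memory address that varies between runs.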
