On-Device ML · NOV 24
Model Quantization — What Actually Happens When You Shrink a Transformer
A detailed walkthrough of quantization data types, symmetric vs affine methods, scaling algorithms, and the trade-offs nobody tells you about until your model accuracy falls off a cliff.