Model Compression

  • Quantization

    • Storing the weights at lower numeric precision: FP16 -> FP8/INT8, or even 4 bits. Each weight is rounded onto a small integer grid and scaled back up at inference time; see the first sketch after this list.
  • Pruning

    • Zeroing out weights that contribute little, most simply the ones smallest in magnitude, leaving a sparse network; see the second sketch after this list.

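To make the quantization bullet concrete, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization. The function names (`quantize_int8`, `dequantize`) are illustrative, not any particular library's API, and real systems typically use per-channel scales and calibration rather than this bare round-and-clip scheme.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric per-tensor scheme: the largest-magnitude weight maps to +/-127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate FP32 weights; rounding error per weight is at most scale/2.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())  # small reconstruction error
```

The INT8 tensor takes a quarter of the memory of the FP32 original (half of FP16), at the cost of that bounded rounding error.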
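Likewise for pruning, a minimal sketch of global magnitude pruning: rank weights by absolute value and zero out the smallest fraction. `magnitude_prune` and its `sparsity` parameter are made-up names for illustration; practical pipelines usually prune gradually during fine-tuning rather than in one shot like this.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    # Zero out (approximately) the `sparsity` fraction of smallest-magnitude weights.
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]  # k-th smallest |w|
    return w * (np.abs(w) > threshold)

w = np.random.randn(8, 8).astype(np.float32)
pruned = magnitude_prune(w, sparsity=0.5)
print((pruned == 0).mean())  # about half of the weights are now zero
```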
Thanks for reading! If you have any questions or comments, please send me a note on Twitter.