Efficient Heterogeneous Large Language Model Decoding with Model-...
[31] Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via speculative decoding, 2023.
[32] Wantong Li, Madison Manley, James Read, Ankit Kaul, Muhannad S. Baki...