Efficient Heterogeneous Large Language Model Decoding with Model-...
[11] Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng L...
[31] Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via specula...