
Mitigating the Capacity Gap in Knowledge Distillation via an Iterative Tutoring Approach

Project Details

Abstract

Large language models (LLMs) have driven significant improvements in natural language understanding and generation, but their high computational demands limit their deployment in resource-constrained environments. Knowledge Distillation (KD) has emerged as a prominent technique for addressing this challenge by transferring knowledge from a large pre-trained model (the teacher) to a smaller, more efficient model (the student). The effectiveness of KD is constrained, however, by the “capacity gap” between the teacher’s and the student’s learning abilities, which can degrade the performance of the distilled model. This research addresses that limitation by introducing Tutor-Enhanced Iterative Distillation (TEID), a ternary system consisting of teacher, tutor, and student models. The architecture extends classical KD by adding an intermediate-sized tutor model that improves knowledge transfer. TEID uses a selective learning strategy that lets the student learn from either the teacher or the tutor, alongside continuous updating and re-distillation of the tutor. To achieve further compression, we propose an iterative procedure in which TEID is reapplied to the tutor and the previously produced student, yielding a new, smaller student model. We evaluate our approach on the GLUE benchmark, demonstrating that it mitigates the teacher-student capacity gap and improves both the efficiency and the performance of distilled models. This work presents a step forward in knowledge distillation for LLMs, providing a scalable solution for deploying state-of-the-art language models in resource-constrained environments.
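
The abstract does not specify how the selective learning strategy chooses between the teacher and the tutor, so the sketch below is only an illustration of one plausible distillation step. It assumes a PyTorch setup, a standard temperature-scaled KL objective, and a hypothetical per-example rule that follows whichever guide (teacher or tutor) the student can currently match more closely; the function name selective_distillation_loss and all hyperparameters are illustrative, not taken from the project.

# A minimal sketch of a selective teacher-or-tutor distillation step.
# The selection rule, temperature, and loss weighting are assumptions.
import torch
import torch.nn.functional as F


def selective_distillation_loss(student_logits, teacher_logits, tutor_logits,
                                labels, temperature=2.0, alpha=0.5):
    """Combine a hard-label loss with a per-example choice of soft targets.

    Hypothetical rule: for each example, distill from the guide (teacher or
    tutor) whose softened distribution is closest to the student's current
    output, i.e. the one with the smaller KL divergence.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    p_tutor = F.softmax(tutor_logits / temperature, dim=-1)

    # Per-example KL(guide || student), as in standard soft-target distillation.
    kl_teacher = F.kl_div(log_p_student, p_teacher, reduction="none").sum(-1)
    kl_tutor = F.kl_div(log_p_student, p_tutor, reduction="none").sum(-1)

    # Follow the smaller-gap guide for each example (illustrative criterion),
    # with the usual T^2 scaling for temperature-softened targets.
    kd_loss = torch.minimum(kl_teacher, kl_tutor).mean() * temperature ** 2

    # Supervised loss on the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * kd_loss + (1.0 - alpha) * ce_loss


if __name__ == "__main__":
    # Toy example with random logits standing in for the three models' outputs.
    batch, classes = 4, 3
    student = torch.randn(batch, classes, requires_grad=True)
    teacher = torch.randn(batch, classes)
    tutor = torch.randn(batch, classes)
    labels = torch.randint(0, classes, (batch,))
    loss = selective_distillation_loss(student, teacher, tutor, labels)
    loss.backward()
    print(float(loss))

In the full TEID procedure described above, this kind of step would be repeated after the tutor is updated and re-distilled, and then again with the tutor acting as the larger model for a new, smaller student.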