Yandex Research Team’s New Compression Methods: AQLM and PV-Tuning
The Yandex Research team is boosting the efficiency of large language models with two innovative compression methods, AQLM and PV-Tuning. This article explores the advantages and application areas of these techniques.
New Compression Methods by the Yandex Research Team
The Yandex Research team introduced two new compression methods at the International Conference on Machine Learning (ICML), held in Vienna, Austria: Additive Quantization for Language Models (AQLM) and PV-Tuning. The researchers report that, when combined, the methods can achieve an up to eightfold reduction in model size while maintaining 95% of response quality.
Key Features of AQLM and PV-Tuning
AQLM applies additive quantization, a technique traditionally used in information retrieval, to the compression of large language models (LLMs). The method preserves, and can even improve, model accuracy under extreme compression, sharply reducing memory consumption and making it practical to run LLMs on everyday devices such as home computers.
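To make the idea concrete, here is a toy sketch of additive quantization in general, not Yandex's actual AQLM implementation (which learns its codebooks through optimization rather than the greedy encoding used here): each group of weights is stored as a handful of byte-sized indices into shared codebooks and reconstructed as the sum of the indexed codewords.

```python
import numpy as np

# Toy additive quantization: a group of GROUP float weights is replaced by
# M one-byte indices, one per codebook. Storage drops from GROUP * 4 bytes
# to M bytes. Codebook values here are random; a real system learns them.
rng = np.random.default_rng(0)
GROUP, M, K = 8, 2, 256              # weights per group, codebooks, codewords each

codebooks = rng.normal(size=(M, K, GROUP)).astype(np.float32)

def encode(w):
    """Greedily pick one codeword per codebook so their sum approximates w."""
    residual = w.copy()
    codes = np.empty(M, dtype=np.uint8)
    for m in range(M):
        dists = np.linalg.norm(codebooks[m] - residual, axis=1)
        codes[m] = np.argmin(dists)
        residual -= codebooks[m][codes[m]]
    return codes

def decode(codes):
    """Reconstruct the weight group as the sum of the chosen codewords."""
    return sum(codebooks[m][codes[m]] for m in range(M))

w = rng.normal(size=GROUP).astype(np.float32)
codes = encode(w)                    # 2 bytes stored instead of 32
print("reconstruction error:", np.linalg.norm(w - decode(codes)))
```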
PV-Tuning is designed to correct the errors that arise during model compression. Combined, AQLM and PV-Tuning deliver compact models that produce high-quality responses even with limited computational resources.
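As a rough illustration of this kind of post-compression correction, the sketch below shows a generic quantization-aware fine-tuning baseline: the discrete code assignments stay fixed while the continuous codebook is trained so the dequantized layer reproduces the original layer's outputs. PV-Tuning's actual algorithm, which also updates the discrete codes, is described in the paper; all shapes and names here are illustrative.

```python
import torch

# Generic quantization-aware fine-tuning baseline (not PV-Tuning itself):
# keep each row's discrete code fixed and train the continuous codebook so
# the dequantized layer matches the original layer on calibration inputs.
torch.manual_seed(0)
OUT, IN, K = 16, 8, 4                             # layer shape, codebook size

W_ref = torch.randn(OUT, IN)                      # original uncompressed weights
codes = torch.randint(K, (OUT,))                  # fixed discrete assignments
codebook = torch.randn(K, IN, requires_grad=True) # continuous, trainable

opt = torch.optim.Adam([codebook], lr=1e-2)
x = torch.randn(256, IN)                          # calibration inputs

for step in range(200):
    W_q = codebook[codes]                         # dequantize: row -> its codeword
    loss = torch.nn.functional.mse_loss(x @ W_q.T, x @ W_ref.T)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("output-matching loss after tuning:", loss.item())
```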
Evaluation of the Methods
The effectiveness of the methods has been rigorously tested on popular open-source models such as Llama 2, Mistral, and Mixtral. Researchers evaluated the compressed models on English-language benchmarks such as WikiText2 and C4. Even when compressed eightfold, the models retained 95% of their original response quality.
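For readers who want to run this kind of measurement themselves, the sketch below computes the standard sliding-window perplexity on the WikiText-2 test set using Hugging Face transformers and datasets. The model name is only a stand-in; substitute any compressed or uncompressed causal language model.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Standard sliding-window perplexity on the WikiText-2 test split.
# "gpt2" is a stand-in; swap in any causal LM checkpoint to evaluate.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

max_len, nlls, counted = 1024, [], 0
with torch.no_grad():
    for i in range(0, ids.size(1), max_len):
        chunk = ids[:, i : i + max_len]
        if chunk.size(1) < 2:                 # need at least one next-token target
            break
        out = model(chunk, labels=chunk)      # transformers shifts labels internally
        nlls.append(out.loss * (chunk.size(1) - 1))
        counted += chunk.size(1) - 1

ppl = torch.exp(torch.stack(nlls).sum() / counted)
print(f"perplexity: {ppl.item():.2f}")
```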
Who Can Benefit from AQLM and PV-Tuning?
The Yandex Research team emphasizes that AQLM and PV-Tuning offer significant resource savings for companies developing and deploying both proprietary and open-source LLMs. For example, a Llama 2 model with 13 billion parameters can run on a single GPU after compression, cutting hardware costs by up to eight times. This puts advanced LLMs such as Llama within reach of enterprises, individual researchers, and LLM enthusiasts using everyday computers.
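The single-GPU claim follows from simple memory arithmetic, sketched below with illustrative numbers rather than Yandex's exact figures:

```python
# Back-of-the-envelope memory math behind the single-GPU claim
# (illustrative numbers, not Yandex's exact measurements).
params = 13e9                        # 13B-parameter model
fp16_gb = params * 2 / 1e9           # 2 bytes per weight at 16-bit -> ~26 GB
compressed_gb = fp16_gb / 8          # ~8x compression -> ~3.3 GB

print(f"fp16: {fp16_gb:.1f} GB, compressed: {compressed_gb:.1f} GB")
# ~26 GB calls for a data-center GPU; ~3.3 GB fits on one consumer GPU.
```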
Exploring New LLM Applications
The researchers note that AQLM and PV-Tuning enable offline deployment on devices with limited computing resources, opening up new use cases for hardware such as smartphones and smart speakers. With advanced LLMs embedded in these devices, users can access text and image generation, voice assistance, personalized recommendations, and real-time language translation without a continuous internet connection.
Application and Access
Developers and researchers worldwide can already access AQLM and PV-Tuning on GitHub. The demo materials provided by the developers offer practical guidance for training compressed LLMs for a range of applications. Popular open-source models already compressed with these methods are also available for download.
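As one example of what this access looks like in practice, the snippet below loads a pre-compressed AQLM checkpoint through Hugging Face transformers, which supports the format. The model id shown is one published example; the project's GitHub page lists the current set of checkpoints.

```python
# Loading a pre-compressed AQLM checkpoint via transformers.
# Requires: pip install aqlm[gpu] transformers accelerate torch
# The model id is one published example; check the GitHub repo for others.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"  # example checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

inputs = tok("Large language models can run on everyday hardware",
             return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```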
Highlights from ICML
The scientific paper detailing Yandex Research's AQLM compression method has been published at ICML, one of the world's most prestigious machine learning conferences. The study, prepared in collaboration with researchers from IST Austria and experts from the AI startup Neural Magic, marks significant progress in LLM compression technology.