lyogavin/airllm — GitHub Repository Preview

by @lyogavin

14.0k Stars
1.4k Forks
127 Issues
Python Language

AirLLM is a memory-optimized inference framework that runs 70B+ parameter LLMs on a single 4GB GPU without requiring quantization. It decomposes a model into layer-wise shards that are loaded and released dynamically during inference, so only one layer's weights occupy GPU memory at a time. It supports the Llama, Mistral, QWen, and ChatGLM architectures, offers optional 4-bit/8-bit block-wise compression for roughly 3x faster inference, and runs cross-platform, including on macOS, making massive models accessible on consumer hardware.
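A minimal conceptual sketch of the layer-wise idea described above (this is illustrative pseudologic, not AirLLM's actual internals; the layer "weights" are hypothetical stand-in values): only one shard is resident at a time, loaded before use and freed afterwards.

```python
def load_layer(layer_store, idx):
    """Simulate fetching one layer's weights from disk."""
    return layer_store[idx]

def run_layerwise(hidden, layer_store):
    """Run all layers while keeping only one shard in memory."""
    for idx in range(len(layer_store)):
        weight = load_layer(layer_store, idx)  # load shard
        hidden = hidden * weight + 1.0         # apply the layer
        del weight                             # free memory before the next shard
    return hidden

# Tiny demo: 4 "layers", scalar hidden state
layers = [0.5, 2.0, 1.0, 0.25]
print(run_layerwise(1.0, layers))  # -> 2.25
```

Peak memory is therefore bounded by the largest single layer plus activations, rather than by the whole model.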

git clone https://github.com/lyogavin/airllm.git

Quick Start Example

python
from airllm import AutoModel

# Run a 70B model on a 4GB GPU: layers load and unload one at a time
model = AutoModel.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    # compression='4bit',  # optional block-wise quantization for faster inference
)

input_text = ["Explain quantum computing"]
input_tokens = model.tokenizer(input_text, return_tensors="pt")

# generate() expects token IDs rather than raw text
output_ids = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=200,
)
print(model.tokenizer.decode(output_ids[0]))
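A back-of-envelope check of why this works (assumed figures: Llama-2-70B has 80 transformer layers, and fp16 weights take 2 bytes per parameter): the full model is far larger than the GPU, but a single layer shard fits comfortably in 4GB.

```python
# Rough memory estimate for layer-wise 70B inference (assumed figures)
params = 70e9
bytes_per_param = 2            # fp16
total_gb = params * bytes_per_param / 1e9
per_layer_gb = total_gb / 80   # only one shard resident at a time
print(f"full model: {total_gb:.0f} GB, one layer: {per_layer_gb:.2f} GB")
# -> full model: 140 GB, one layer: 1.75 GB
```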

Tags

#llm #inference #gpu #optimization #python #deep-learning
