AirLLM enables inference with 70B-parameter large language models on a single 4GB GPU. It uses layer-by-layer computation and other memory-efficient techniques to run massive models on consumer hardware that would normally require an expensive multi-GPU setup, making state-of-the-art models accessible to researchers and developers with limited GPU resources.
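The layer-by-layer idea can be sketched in plain Python. This is a toy illustration, not AirLLM's actual code: the function name and the "layers" here are hypothetical, standing in for transformer layers whose weights AirLLM streams from disk to the GPU one at a time so that peak memory stays around the size of a single layer.

```python
import gc

def run_layer_by_layer(layers_on_disk, hidden):
    """Sketch of layer-by-layer inference: only one layer's weights
    are resident at a time, so peak memory is roughly one layer."""
    for load_layer in layers_on_disk:
        layer = load_layer()    # load this layer's weights (disk -> GPU in AirLLM)
        hidden = layer(hidden)  # run this layer's forward pass on the activations
        del layer               # free the weights before loading the next layer
        gc.collect()
    return hidden

# Toy example: each "layer" just doubles the activation.
layers = [lambda: (lambda h: h * 2) for _ in range(4)]
print(run_layer_by_layer(layers, 1))  # -> 16
```

The trade-off is speed: weights are re-read from disk on every forward pass, so this is far slower than keeping the whole model in GPU memory.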
git clone https://github.com/0xSojalSec/airllm.git
from airllm import AutoModel

# Load a 70B model on a 4GB GPU; weights are streamed layer by layer
model = AutoModel.from_pretrained(
    "meta-llama/Llama-2-70b-hf"
)

# Tokenize the prompt, then generate
input_tokens = model.tokenizer(
    ["Explain quantum computing"],
    return_tensors="pt",
    truncation=True,
    max_length=128,
)
generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=200,
    return_dict_in_generate=True,
)

print(model.tokenizer.decode(generation_output.sequences[0]))