🚀 QuantLLM Documentation
The Ultra-Fast LLM Quantization & Export Library
Load → Quantize → Fine-tune → Export — All in One Line
Welcome to QuantLLM v2.1 (pre-release)
QuantLLM makes working with large language models simple. Load a supported model, quantize it automatically, fine-tune it on your data, and export it to GGUF, ONNX, MLX, or SafeTensors, all in a few lines of code.
from quantllm import turbo
# Load with shared export/push defaults
model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
)
# Generate text
print(model.generate("Explain quantum computing"))
# Export to GGUF for Ollama/llama.cpp
model.export()
# Push to HuggingFace with auto-generated model card
model.push("username/my-model")
📚 Documentation
Getting Started
API Reference
✨ Key Features
| Feature | Description |
|---|---|
| 🔥 TurboModel API | One unified interface for everything |
| 📦 Multi-Format Export | GGUF, ONNX, MLX, SafeTensors |
| ⚡ Auto-Optimization | Flash Attention, torch.compile, dynamic padding |
| 🎨 Beautiful UI | Orange-themed progress bars and logging |
| 🤗 Hub Integration | One-click push with auto model cards |
| 🧠 45+ Architectures | Llama, Mistral, Qwen, Phi, Gemma, and more |
🚀 Quick Examples
Load Any Model
from quantllm import turbo
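# Each call loads a different model from the Hub (shown as alternatives)
model = turbo("mistralai/Mistral-7B")
# Request a specific bit width explicitly
model = turbo("Qwen/Qwen2-7B", bits=4)
model = turbo("microsoft/phi-3-mini")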
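As a rough guide to what a setting like `bits=4` saves, weight storage scales linearly with bit width. The sketch below is a back-of-the-envelope estimate only: `estimate_weight_gb` is a hypothetical helper and not part of QuantLLM, and it ignores activations, the KV cache, and per-block quantization overhead.

```python
def estimate_weight_gb(n_params: float, bits: int) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a given bit width."""
    if bits <= 0:
        raise ValueError("bits must be positive")
    # bits per parameter -> bytes per parameter -> gigabytes
    return n_params * bits / 8 / 1e9


# A 7B-parameter model: 16-bit weights versus 4-bit weights.
fp16_gb = estimate_weight_gb(7e9, 16)
int4_gb = estimate_weight_gb(7e9, 4)
print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
```

Under these assumptions a 7B model drops from about 14 GB to about 3.5 GB of weights, which is why 4-bit loading can fit on a GPU with 6 GB of VRAM.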
Export to Any Format
model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
)
model.export()
model.export("onnx", "./model-onnx/")
model.export("mlx", "./model-mlx/", quantization="4bit")
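The calls above suggest that `export()` with no arguments falls back to the defaults passed in `config`, while explicit arguments take precedence. A minimal sketch of that precedence rule follows. `resolve_export_options` is a hypothetical helper written purely for illustration, and QuantLLM's actual resolution logic may differ.

```python
from typing import Optional


def resolve_export_options(
    config: dict,
    fmt: Optional[str] = None,
    quantization: Optional[str] = None,
) -> dict:
    """Merge per-call arguments over load-time defaults (explicit values win)."""
    return {
        "format": fmt if fmt is not None else config.get("format"),
        "quantization": (
            quantization if quantization is not None else config.get("quantization")
        ),
    }


# Same defaults as the config dict used when loading the model.
defaults = {"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"}

print(resolve_export_options(defaults))
print(resolve_export_options(defaults, fmt="mlx", quantization="4bit"))
```

The first call resolves to the GGUF/Q4_K_M defaults; the second uses the explicitly supplied MLX settings.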
Fine-tune in One Line
model.finetune("training_data.json", epochs=3)
Push to HuggingFace
model.push("username/my-model")
💻 System Requirements
Python: 3.10+
PyTorch: 2.0+
GPU: NVIDIA with 6GB+ VRAM (recommended)
Platforms: Windows, Linux, macOS
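These requirements can be checked before installing. The script below is a small sketch that uses only the standard library and, if present, PyTorch's public `torch.cuda` API. The function names are illustrative rather than part of QuantLLM, and the 6 GB threshold simply mirrors the recommendation listed here.

```python
import sys

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 0)
MIN_VRAM_GB = 6


def version_tuple(version: str) -> tuple:
    """Turn a version string such as '2.1.0+cu121' into (2, 1)."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:2])


def check_environment() -> list:
    """Return human-readable warnings; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10+ is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if version_tuple(torch.__version__) < MIN_TORCH:
        problems.append("PyTorch 2.0+ is required")
    if torch.cuda.is_available():
        # total_memory is reported in bytes for the first visible device
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        if vram_gb < MIN_VRAM_GB:
            problems.append(f"GPU has {vram_gb:.1f} GB VRAM; 6 GB+ is recommended")
    else:
        problems.append("No CUDA GPU detected (a GPU is recommended, not required)")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

Running the script prints one warning per unmet requirement and nothing when the environment looks suitable.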