# 🚀 QuantLLM Documentation
The Ultra-Fast LLM Quantization & Export Library
Load → Quantize → Fine-tune → Export — All in One Line
---
## Welcome to QuantLLM v2.1 (pre-release)
QuantLLM simplifies working with large language models: load a supported model, quantize it automatically, fine-tune it on your data, and export it to the format you need, all in a few lines of code.
```python
from quantllm import turbo

# Load the model; `config` sets defaults that export() and push() reuse
model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
)

# Generate text
print(model.generate("Explain quantum computing"))

# Export to GGUF for Ollama/llama.cpp (format and quantization come from config)
model.export()

# Push to the Hugging Face Hub with an auto-generated model card
model.push("username/my-model")
```
---
## 📚 Documentation
```{toctree}
:maxdepth: 2
:caption: Getting Started
installation
quickstart
```
```{toctree}
:maxdepth: 2
:caption: User Guide
guide/loading-models
guide/generation
guide/finetuning
guide/gguf-export
guide/hub-integration
```
```{toctree}
:maxdepth: 2
:caption: API Reference
api/turbo
api/turbomodel
api/gguf
api/hub
```
---
## ✨ Key Features
| Feature | Description |
|---------|-------------|
| 🔥 **TurboModel API** | One unified interface for everything |
| 📦 **Multi-Format Export** | GGUF, ONNX, MLX, SafeTensors |
| ⚡ **Auto-Optimization** | Flash Attention, torch.compile, dynamic padding |
| 🎨 **Beautiful UI** | Orange-themed progress bars and logging |
| 🤗 **Hub Integration** | Single-call push with auto-generated model cards |
| 🧠 **45+ Architectures** | Llama, Mistral, Qwen, Phi, Gemma, and more |
---
## 🚀 Quick Examples
### Load Any Model
```python
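from quantllm import turbo

# Pass a Hugging Face model ID; each call below loads a different model
model = turbo("mistralai/Mistral-7B-v0.1")
model = turbo("Qwen/Qwen2-7B", bits=4)  # bits=4 requests 4-bit quantization
model = turbo("microsoft/Phi-3-mini-4k-instruct")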
```
### Export to Any Format
```python
from quantllm import turbo

model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
)
model.export()  # uses the format and quantization from config
model.export("onnx", "./model-onnx/")  # explicit format and output directory
model.export("mlx", "./model-mlx/", quantization="4bit")
```
### Fine-tune in One Line
```python
model.finetune("training_data.json", epochs=3)
```
### Push to Hugging Face
```python
model.push("username/my-model")
```
---
## 💻 System Requirements
- **Python**: 3.10+
- **PyTorch**: 2.0+
- **GPU**: NVIDIA GPU with 6 GB+ VRAM (recommended)
- **Platforms**: Windows, Linux, macOS
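
To check a machine against the requirements above before installing, a small script along these lines may help. It is a minimal sketch, not part of QuantLLM: the helper names `parse_major_minor` and `check_requirements` are hypothetical, it uses only the standard library plus PyTorch's public `torch.cuda` calls, and it treats "6GB" as 6 × 10⁹ bytes, which is an assumption. Because the GPU is recommended rather than required, a `False` in `gpu_recommended` is a hint, not a hard failure.

```python
import sys

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 0)
MIN_VRAM_GB = 6  # assumption: decimal gigabytes (1 GB = 10**9 bytes)


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '2.3.1+cu121'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_requirements():
    """Return a pass/fail flag for each requirement in the list above."""
    results = {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "pytorch": False,
        "gpu_recommended": False,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so the GPU cannot be probed either
        return results

    results["pytorch"] = parse_major_minor(torch.__version__) >= MIN_TORCH
    if torch.cuda.is_available():
        # total_memory is reported in bytes for the first CUDA device
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 10**9
        results["gpu_recommended"] = vram_gb >= MIN_VRAM_GB
    return results


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'not satisfied'}")
```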
---
## Indices and Tables
- {ref}`genindex`
- {ref}`modindex`
- {ref}`search`