🎓 Fine-Tuning
Train your model on custom data using LoRA (Low-Rank Adaptation).
Quick Start
from quantllm import turbo
model = turbo("meta-llama/Llama-3.2-3B")
# Fine-tune with your data
model.finetune("training_data.json", epochs=3)
# Test the result
response = model.generate("Your custom prompt")
print(response)
Data Formats
Instruction Format (Recommended)
Best for Q&A and task-oriented training:
[
{
"instruction": "What is Python?",
"output": "Python is a high-level programming language known for its simplicity and readability."
},
{
"instruction": "Explain machine learning",
"output": "Machine learning is a subset of AI that enables systems to learn from data."
}
]
Simple Text Format
For language modeling and general text:
[
{"text": "This is the first training example. It can be any text."},
{"text": "This is another example for training the model."}
]
Prompt-Completion Format
Alternative to instruction format:
[
{
"prompt": "Question: What is AI?\nAnswer:",
"completion": "AI stands for Artificial Intelligence."
}
]
HuggingFace Datasets
Load directly from HuggingFace:
# From Hub
model.finetune("tatsu-lab/alpaca", epochs=1)
# Or use datasets library
from datasets import load_dataset
dataset = load_dataset("your-dataset")
model.finetune(dataset, epochs=3)
Training Parameters
Basic Training
model.finetune(
"training_data.json",
epochs=3, # Number of training epochs
batch_size=4, # Batch size (reduce if OOM)
learning_rate=2e-4, # Learning rate
output_dir="./output", # Save directory
)
Advanced Training
model.finetune(
"training_data.json",
epochs=5,
batch_size=4,
learning_rate=2e-4,
# LoRA parameters
lora_r=16, # LoRA rank (higher = more capacity)
lora_alpha=32, # LoRA scaling (typically 2x lora_r)
lora_dropout=0.1, # Dropout for regularization
# Training options
warmup_steps=100, # Learning rate warmup
max_steps=-1, # Max steps (-1 for full epochs)
gradient_accumulation=4, # Accumulate gradients
# Output
output_dir="./finetuned",
save_steps=500, # Save checkpoint every N steps
logging_steps=10, # Log every N steps
)
LoRA Configuration
LoRA (Low-Rank Adaptation) enables efficient fine-tuning:
Parameter |
Default |
Description |
|---|---|---|
|
8 |
Rank of LoRA matrices (4, 8, 16, 32) |
|
16 |
LoRA scaling factor (typically 2×r) |
|
0.1 |
Dropout for regularization |
Choosing LoRA Rank
Rank |
Parameters |
Use Case |
|---|---|---|
4 |
Minimal |
Simple adaptations |
8 |
Low |
Default, good balance |
16 |
Medium |
More complex tasks |
32 |
High |
Maximum quality |
Rule of thumb: Higher rank = more parameters = better quality but slower training.
Training with Hub Integration
Track your training and push to HuggingFace:
from quantllm import turbo, QuantLLMHubManager
model = turbo("meta-llama/Llama-3.2-3B")
# Create hub manager
manager = QuantLLMHubManager(
repo_id="your-username/finetuned-model",
hf_token="hf_..."
)
# Train with automatic tracking
model.finetune(
"training_data.json",
epochs=3,
hub_manager=manager # Automatically tracks hyperparameters
)
# Push the result
manager.save_final_model(model)
manager.push()
After Training
Test Your Model
# Generate with fine-tuned model
response = model.generate("Your custom prompt")
print(response)
# Compare responses
original = turbo("meta-llama/Llama-3.2-3B")
print("Original:", original.generate("prompt"))
print("Fine-tuned:", model.generate("prompt"))
Export the Model
# Export to GGUF
model.export("gguf", "finetuned.Q4_K_M.gguf")
# Export to SafeTensors
model.export("safetensors", "./finetuned-model/")
# Push to HuggingFace
model.push("your-username/finetuned-model")
Save and Load
# Save locally
model.save("./my-finetuned-model/")
# Load later
from quantllm import TurboModel
model = TurboModel.from_pretrained("./my-finetuned-model/")
Tips & Best Practices
Data Quality
Clean your data — Remove duplicates, errors, and noise
Consistent format — Use the same format throughout
Balanced dataset — Mix different types of examples
Minimum 100 examples — More is generally better
Training Settings
Start small — Use few epochs and small data first
Monitor loss — Training loss should decrease steadily
Learning rate — 1e-4 to 3e-4 works for most cases
Batch size — Reduce if you run out of memory
Memory Management
# If you run out of memory:
model.finetune(
data,
batch_size=1, # Smaller batch
gradient_accumulation=8, # Accumulate gradients
)
Avoiding Overfitting
Limit epochs — 1-5 epochs is usually enough
Use dropout —
lora_dropout=0.1Validate — Test on held-out data
Early stopping — Stop when validation loss increases
Common Issues
Out of Memory
# Solution 1: Reduce batch size
model.finetune(data, batch_size=1)
# Solution 2: Use gradient accumulation
model.finetune(data, batch_size=1, gradient_accumulation=8)
# Solution 3: Use smaller LoRA rank
model.finetune(data, lora_r=4)
Training Loss Not Decreasing
# Try higher learning rate
model.finetune(data, learning_rate=3e-4)
# Or more epochs
model.finetune(data, epochs=10)
Model Outputs Garbage
Check your data format
Reduce epochs (overfitting)
Use lower learning rate
Next Steps
GGUF Export → — Export your fine-tuned model
Hub Integration → — Push to HuggingFace
API Reference → — Full API documentation