🎓 Fine-Tuning

Train your model on custom data using LoRA (Low-Rank Adaptation).

Quick Start

from quantllm import turbo

model = turbo("meta-llama/Llama-3.2-3B")

# Fine-tune with your data
model.finetune("training_data.json", epochs=3)

# Test the result
response = model.generate("Your custom prompt")
print(response)

Data Formats

Instruction Format (Recommended)

Best for Q&A and task-oriented training:

[
  {
    "instruction": "What is Python?",
    "output": "Python is a high-level programming language known for its simplicity and readability."
  },
  {
    "instruction": "Explain machine learning",
    "output": "Machine learning is a subset of AI that enables systems to learn from data."
  }
]
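If your examples are built in Python, a minimal sketch like the following writes them to a JSON file in the instruction format above. The file name `training_data.json` matches the Quick Start. Everything else is plain standard-library Python and makes no assumptions about quantllm itself.

```python
import json

# Each record uses the "instruction"/"output" keys shown above.
examples = [
    {"instruction": "What is Python?",
     "output": "Python is a high-level programming language known for its simplicity and readability."},
    {"instruction": "Explain machine learning",
     "output": "Machine learning is a subset of AI that enables systems to learn from data."},
]

# Write one JSON array, UTF-8 encoded, keeping non-ASCII characters readable.
with open("training_data.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)

# Read the file back to confirm it parses and has the expected shape.
with open("training_data.json", encoding="utf-8") as f:
    loaded = json.load(f)
print(f"Wrote {len(loaded)} examples")
```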

Simple Text Format

For language modeling and general text:

[
  {"text": "This is the first training example. It can be any text."},
  {"text": "This is another example for training the model."}
]
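One way to produce this format from an existing plain-text corpus is to emit one record per paragraph. The sketch below assumes paragraphs are separated by blank lines. The output name `text_data.json` is hypothetical.

```python
import json

# Hypothetical raw corpus; paragraphs are separated by a blank line.
raw = """This is the first training example. It can be any text.

This is another example for training the model.
"""

# One {"text": ...} record per non-empty paragraph, with surrounding whitespace trimmed.
records = [{"text": p.strip()} for p in raw.split("\n\n") if p.strip()]

with open("text_data.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```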

Prompt-Completion Format

An alternative to the instruction format:

[
  {
    "prompt": "Question: What is AI?\nAnswer:",
    "completion": "AI stands for Artificial Intelligence."
  }
]
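This page does not describe how quantllm detects which of these formats a file uses. A pre-flight check like the one below can still catch malformed or mixed records before a long training run. `detect_format` and `check_file` are hypothetical helper names. They only encode the key sets listed above.

```python
import json

# Required keys for each format described above.
FORMATS = {
    "instruction": {"instruction", "output"},
    "prompt-completion": {"prompt", "completion"},
    "text": {"text"},
}

def detect_format(record: dict) -> str:
    """Return the first format whose required keys are all non-empty strings."""
    for name, keys in FORMATS.items():
        if all(isinstance(record.get(k), str) and record[k].strip() for k in keys):
            return name
    raise ValueError(f"Unrecognized record: {record!r}")

def check_file(path: str) -> str:
    """Ensure every record in a JSON array uses one and the same format."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    formats = {detect_format(r) for r in records}
    if len(formats) != 1:
        raise ValueError(f"Mixed formats in {path}: {sorted(formats)}")
    return formats.pop()
```

For example, `check_file("training_data.json")` returns `"instruction"` for a file shaped like the first example on this page.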

HuggingFace Datasets

Load directly from HuggingFace:

# From Hub
model.finetune("tatsu-lab/alpaca", epochs=1)

# Or use datasets library
from datasets import load_dataset
dataset = load_dataset("your-dataset", split="train")  # a single split returns a Dataset, not a DatasetDict
model.finetune(dataset, epochs=3)

Training Parameters

Basic Training

model.finetune(
    "training_data.json",
    epochs=3,                    # Number of training epochs
    batch_size=4,                # Batch size (reduce if OOM)
    learning_rate=2e-4,          # Learning rate
    output_dir="./output",       # Save directory
)

Advanced Training

model.finetune(
    "training_data.json",
    epochs=5,
    batch_size=4,
    learning_rate=2e-4,
    
    # LoRA parameters
    lora_r=16,                   # LoRA rank (higher = more capacity)
    lora_alpha=32,               # LoRA scaling (typically 2x lora_r)
    lora_dropout=0.1,            # Dropout for regularization
    
    # Training options
    warmup_steps=100,            # Learning rate warmup
    max_steps=-1,                # Cap on training steps (-1 = run all epochs)
    gradient_accumulation=4,     # Effective batch = batch_size × 4 = 16
    
    # Output
    output_dir="./finetuned",
    save_steps=500,              # Save checkpoint every N steps
    logging_steps=10,            # Log every N steps
)
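To choose `warmup_steps`, `save_steps`, and `logging_steps`, it helps to know roughly how many optimizer updates a run performs. This sketch assumes the common convention (used by Hugging Face `transformers`' Trainer) that one optimizer step consumes `batch_size × gradient_accumulation` examples. Whether quantllm counts steps exactly this way is an assumption, and a partial final batch may be rounded differently. `training_plan` is a hypothetical helper and the 2,000-example dataset size is made up.

```python
import math

def training_plan(n_examples: int, epochs: int, batch_size: int,
                  gradient_accumulation: int, warmup_steps: int, save_steps: int) -> dict:
    """Approximate optimizer-step counts, rounding a partial last step up each epoch."""
    effective_batch = batch_size * gradient_accumulation
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_fraction": warmup_steps / total_steps,
        "checkpoints": total_steps // save_steps,
    }

# The "Advanced Training" settings applied to a hypothetical 2,000-example dataset.
plan = training_plan(2000, epochs=5, batch_size=4, gradient_accumulation=4,
                     warmup_steps=100, save_steps=500)
print(plan)
```

Under these assumptions the run has 125 steps per epoch and 625 in total. The 100 warmup steps therefore cover 16% of training, and `save_steps=500` yields a single intermediate checkpoint. On small datasets, a fixed warmup can take up a surprisingly large share of the run.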

LoRA Configuration

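LoRA (Low-Rank Adaptation) freezes the base model's weights and trains small low-rank matrices added alongside selected layers, so only a small fraction of the parameters is updated. The main settings are: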

Parameter	Default	Description
`lora_r`	8	Rank of the LoRA matrices (common values: 4, 8, 16, 32)
`lora_alpha`	16	Scaling factor; the LoRA update is multiplied by lora_alpha / lora_r (typically set to 2×r)
`lora_dropout`	0.1	Dropout for regularization
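For reference, the standard LoRA formulation keeps each adapted weight W_0 frozen and learns two small matrices, A and B. Here r is `lora_r` and α is `lora_alpha`. Assuming quantllm uses the standard α / r scaling (as Hugging Face PEFT does), an adapted layer computes:

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Each adapted matrix adds r(d + k) trainable parameters. Because the update is multiplied by α / r, the usual choice α = 2r keeps that factor at 2 whatever rank you pick.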

Choosing LoRA Rank

Rank	Trainable parameters	Use case
4	Minimal	Simple adaptations
8	Low	Default, good balance
16	Medium	More complex tasks
32	High	Maximum quality

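Rule of thumb: A higher rank means more trainable parameters and more capacity, but also more memory and slower training. It does not guarantee better quality; on small datasets a high rank can overfit.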
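The Parameters column can be made concrete. LoRA adds an r × d_in matrix A and a d_out × r matrix B to each adapted weight, so the trainable count grows linearly with the rank. The 3072 × 3072 projection below is a hypothetical size chosen for illustration, not read from any model config. `lora_params` is an illustrative helper.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)

# Hypothetical square projection, for illustration only.
d = 3072
full = d * d
for r in (4, 8, 16, 32):
    extra = lora_params(d, d, r)
    print(f"r={r:>2}: {extra:>7,} trainable params ({extra / full:.2%} of {full:,} frozen weights)")
```

For this matrix, the adapter ranges from about 0.26% of the frozen weights at r=4 to 2.08% at r=32. Doubling the rank doubles the adapter size.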

Training with Hub Integration

Track your training and push to HuggingFace:

from quantllm import turbo, QuantLLMHubManager

model = turbo("meta-llama/Llama-3.2-3B")

# Create hub manager
manager = QuantLLMHubManager(
    repo_id="your-username/finetuned-model",
    hf_token="hf_..."
)

# Train with automatic tracking
model.finetune(
    "training_data.json",
    epochs=3,
    hub_manager=manager  # Automatically tracks hyperparameters
)

# Push the result
manager.save_final_model(model)
manager.push()

After Training

Test Your Model

# Generate with fine-tuned model
response = model.generate("Your custom prompt")
print(response)

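# Compare against the base model on the same prompt
# (this loads a second copy of the base model into memory)
prompt = "Your custom prompt"
original = turbo("meta-llama/Llama-3.2-3B")
print("Original:", original.generate(prompt))
print("Fine-tuned:", model.generate(prompt))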
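A single prompt says little about quality, so it helps to run several held-out prompts through both models. `compare_models` is a hypothetical helper. It assumes only that each model exposes `generate(prompt)` returning a string, as in the snippets above, and it accepts any callable so it can be tried without loading a model.

```python
from typing import Callable, Iterable

def compare_models(prompts: Iterable[str],
                   original: Callable[[str], str],
                   finetuned: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Return (prompt, original_output, finetuned_output) for each prompt."""
    return [(p, original(p), finetuned(p)) for p in prompts]

# Usage with the models above (bound methods are plain callables);
# held_out_prompts is a hypothetical list of prompts not seen during training:
# for prompt, before, after in compare_models(held_out_prompts, original.generate, model.generate):
#     print(f"PROMPT: {prompt}\n  original:   {before}\n  fine-tuned: {after}\n")
```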

Export the Model

# Export to GGUF
model.export("gguf", "finetuned.Q4_K_M.gguf")

# Export to SafeTensors
model.export("safetensors", "./finetuned-model/")

# Push to HuggingFace
model.push("your-username/finetuned-model")

Save and Load

# Save locally
model.save("./my-finetuned-model/")

# Load later
from quantllm import TurboModel
model = TurboModel.from_pretrained("./my-finetuned-model/")

Tips & Best Practices

Data Quality

Clean your data — Remove duplicates, errors, and noise
Consistent format — Use the same format throughout
Balanced dataset — Mix different types of examples
Minimum 100 examples — More is generally better
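The cleaning advice above can be partly automated. This is a generic pre-processing sketch, not a quantllm feature. `clean_records` is a hypothetical helper that drops records with an empty field and duplicates that differ only in whitespace. The stored records are left unchanged, so newlines inside outputs (for example, code) survive.

```python
import json

def clean_records(records: list[dict]) -> list[dict]:
    """Drop records with an empty field and duplicates (ignoring whitespace differences)."""
    seen = set()
    cleaned = []
    for rec in records:
        # Collapse runs of whitespace only for comparison; the kept record is unchanged.
        norm = {k: " ".join(str(v).split()) for k, v in rec.items()}
        if not norm or any(not v for v in norm.values()):
            continue  # skip empty records and records with an empty field
        key = json.dumps(norm, sort_keys=True)
        if key in seen:
            continue  # skip duplicates, keeping the first occurrence
        seen.add(key)
        cleaned.append(rec)
    return cleaned

# Hypothetical raw data in the instruction format.
data = [
    {"instruction": "What is Python?", "output": "A programming language."},
    {"instruction": "What is Python? ", "output": "A programming language."},
    {"instruction": "Explain machine learning", "output": ""},
]
cleaned = clean_records(data)
if len(cleaned) < 100:
    print(f"Only {len(cleaned)} usable example(s); aim for at least 100.")
```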

Training Settings

Start small — Use few epochs and small data first
Monitor loss — Training loss should decrease steadily
Learning rate — 1e-4 to 3e-4 works for most cases
Batch size — Reduce if you run out of memory

Memory Management

# If you run out of memory:
model.finetune(
    data,
    batch_size=1,                  # Smaller per-step batch (less activation memory)
    gradient_accumulation=8,       # Effective batch = 1 × 8 = 8
)
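Activation memory grows with `batch_size`. Gradient accumulation instead trades extra time for a larger effective batch (`batch_size × gradient_accumulation`) without that memory cost. The example above gives an effective batch of 8, half that of the Advanced Training example (4 × 4 = 16). To keep the original effective batch, a sketch like this picks the accumulation factor. `accumulation_for` is a hypothetical helper.

```python
import math

def accumulation_for(target_batch: int, batch_size: int) -> int:
    """Smallest gradient_accumulation such that batch_size * accumulation >= target_batch."""
    return math.ceil(target_batch / batch_size)

# Keep the Advanced Training effective batch of 16 at smaller per-step batches.
print(accumulation_for(16, batch_size=1))  # 16
print(accumulation_for(16, batch_size=3))  # 6 (effective batch 18)
```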

Avoiding Overfitting

Limit epochs — 1-5 epochs is usually enough
Use dropout — lora_dropout=0.1
Validate — Test on held-out data
Early stopping — Stop when validation loss increases
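This page does not say whether `finetune` supports early stopping directly. The logic itself is simple, though. Below is a framework-agnostic sketch: `EarlyStopping` is a hypothetical class, and the validation losses are made-up numbers standing in for evaluations on held-out data.

```python
class EarlyStopping:
    """Signal a stop after `patience` evaluations without a meaningful drop in validation loss."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss: float) -> bool:
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss   # new best: reset the counter
            self.bad_evals = 0
        else:
            self.bad_evals += 1    # no improvement this evaluation
        return self.bad_evals >= self.patience

# Hypothetical per-epoch validation losses: improvement, then overfitting.
stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([1.20, 0.95, 0.90, 0.93, 0.97], start=1):
    if stopper.step(loss):
        print(f"Stopping after epoch {epoch}; best validation loss {stopper.best:.2f}")
        break
```

With `patience=2`, one noisy uptick does not stop training, but two evaluations in a row without improvement do.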

Common Issues

Out of Memory

# Solution 1: Reduce batch size
model.finetune(data, batch_size=1)

# Solution 2: Use gradient accumulation
model.finetune(data, batch_size=1, gradient_accumulation=8)

# Solution 3: Use a smaller LoRA rank (usually saves less than the options above)
model.finetune(data, lora_r=4)

Training Loss Not Decreasing

# Try a higher learning rate
model.finetune(data, learning_rate=3e-4)

# Or train for more epochs (watch for overfitting)
model.finetune(data, epochs=10)

Model Outputs Garbage

Check that your data matches one of the supported data formats
Reduce epochs (the model may be overfitting)
Use a lower learning rate

Next Steps

GGUF Export → — Export your fine-tuned model
Hub Integration → — Push to HuggingFace
API Reference → — Full API documentation