🎓 Fine-Tuning

Train your model on custom data using LoRA (Low-Rank Adaptation).


Quick Start

from quantllm import turbo

model = turbo("meta-llama/Llama-3.2-3B")

# Fine-tune with your data
model.finetune("training_data.json", epochs=3)

# Test the result
response = model.generate("Your custom prompt")
print(response)

Data Formats

Simple Text Format

For language modeling and general text:

[
  {"text": "This is the first training example. It can be any text."},
  {"text": "This is another example for training the model."}
]

Prompt-Completion Format

Alternative to instruction format:

[
  {
    "prompt": "Question: What is AI?\nAnswer:",
    "completion": "AI stands for Artificial Intelligence."
  }
]

HuggingFace Datasets

Load directly from HuggingFace:

# From Hub
model.finetune("tatsu-lab/alpaca", epochs=1)

# Or use datasets library
from datasets import load_dataset
dataset = load_dataset("your-dataset")
model.finetune(dataset, epochs=3)

Training Parameters

Basic Training

model.finetune(
    "training_data.json",
    epochs=3,                    # Number of training epochs
    batch_size=4,                # Batch size (reduce if OOM)
    learning_rate=2e-4,          # Learning rate
    output_dir="./output",       # Save directory
)

Advanced Training

model.finetune(
    "training_data.json",
    epochs=5,
    batch_size=4,
    learning_rate=2e-4,
    
    # LoRA parameters
    lora_r=16,                   # LoRA rank (higher = more capacity)
    lora_alpha=32,               # LoRA scaling (typically 2x lora_r)
    lora_dropout=0.1,            # Dropout for regularization
    
    # Training options
    warmup_steps=100,            # Learning rate warmup
    max_steps=-1,                # Max steps (-1 for full epochs)
    gradient_accumulation=4,     # Accumulate gradients
    
    # Output
    output_dir="./finetuned",
    save_steps=500,              # Save checkpoint every N steps
    logging_steps=10,            # Log every N steps
)

LoRA Configuration

LoRA (Low-Rank Adaptation) enables efficient fine-tuning:

Parameter

Default

Description

lora_r

8

Rank of LoRA matrices (4, 8, 16, 32)

lora_alpha

16

LoRA scaling factor (typically 2×r)

lora_dropout

0.1

Dropout for regularization

Choosing LoRA Rank

Rank

Parameters

Use Case

4

Minimal

Simple adaptations

8

Low

Default, good balance

16

Medium

More complex tasks

32

High

Maximum quality

Rule of thumb: Higher rank = more parameters = better quality but slower training.


Training with Hub Integration

Track your training and push to HuggingFace:

from quantllm import turbo, QuantLLMHubManager

model = turbo("meta-llama/Llama-3.2-3B")

# Create hub manager
manager = QuantLLMHubManager(
    repo_id="your-username/finetuned-model",
    hf_token="hf_..."
)

# Train with automatic tracking
model.finetune(
    "training_data.json",
    epochs=3,
    hub_manager=manager  # Automatically tracks hyperparameters
)

# Push the result
manager.save_final_model(model)
manager.push()

After Training

Test Your Model

# Generate with fine-tuned model
response = model.generate("Your custom prompt")
print(response)

# Compare responses
original = turbo("meta-llama/Llama-3.2-3B")
print("Original:", original.generate("prompt"))
print("Fine-tuned:", model.generate("prompt"))

Export the Model

# Export to GGUF
model.export("gguf", "finetuned.Q4_K_M.gguf")

# Export to SafeTensors
model.export("safetensors", "./finetuned-model/")

# Push to HuggingFace
model.push("your-username/finetuned-model")

Save and Load

# Save locally
model.save("./my-finetuned-model/")

# Load later
from quantllm import TurboModel
model = TurboModel.from_pretrained("./my-finetuned-model/")

Tips & Best Practices

Data Quality

  1. Clean your data — Remove duplicates, errors, and noise

  2. Consistent format — Use the same format throughout

  3. Balanced dataset — Mix different types of examples

  4. Minimum 100 examples — More is generally better

Training Settings

  1. Start small — Use few epochs and small data first

  2. Monitor loss — Training loss should decrease steadily

  3. Learning rate — 1e-4 to 3e-4 works for most cases

  4. Batch size — Reduce if you run out of memory

Memory Management

# If you run out of memory:
model.finetune(
    data,
    batch_size=1,                  # Smaller batch
    gradient_accumulation=8,       # Accumulate gradients
)

Avoiding Overfitting

  1. Limit epochs — 1-5 epochs is usually enough

  2. Use dropoutlora_dropout=0.1

  3. Validate — Test on held-out data

  4. Early stopping — Stop when validation loss increases


Common Issues

Out of Memory

# Solution 1: Reduce batch size
model.finetune(data, batch_size=1)

# Solution 2: Use gradient accumulation
model.finetune(data, batch_size=1, gradient_accumulation=8)

# Solution 3: Use smaller LoRA rank
model.finetune(data, lora_r=4)

Training Loss Not Decreasing

# Try higher learning rate
model.finetune(data, learning_rate=3e-4)

# Or more epochs
model.finetune(data, epochs=10)

Model Outputs Garbage

  • Check your data format

  • Reduce epochs (overfitting)

  • Use lower learning rate


Next Steps