# 🎓 Fine-Tuning

Train your model on custom data using LoRA (Low-Rank Adaptation).

---

## Quick Start

```python
from quantllm import turbo

model = turbo("meta-llama/Llama-3.2-3B")

# Fine-tune with your data
model.finetune("training_data.json", epochs=3)

# Test the result
response = model.generate("Your custom prompt")
print(response)
```

---

## Data Formats

### Instruction Format (Recommended)

Best for Q&A and task-oriented training:

```json
[
  {
    "instruction": "What is Python?",
    "output": "Python is a high-level programming language known for its simplicity and readability."
  },
  {
    "instruction": "Explain machine learning",
    "output": "Machine learning is a subset of AI that enables systems to learn from data."
  }
]
```

### Simple Text Format

For language modeling and general text:

```json
[
  {"text": "This is the first training example. It can be any text."},
  {"text": "This is another example for training the model."}
]
```

### Prompt-Completion Format

Alternative to instruction format:

```json
[
  {
    "prompt": "Question: What is AI?\nAnswer:",
    "completion": "AI stands for Artificial Intelligence."
  }
]
```

### HuggingFace Datasets

Load directly from HuggingFace:

```python
# From Hub
model.finetune("tatsu-lab/alpaca", epochs=1)

# Or use datasets library
from datasets import load_dataset
dataset = load_dataset("your-dataset")
model.finetune(dataset, epochs=3)
```

---

## Training Parameters

### Basic Training

```python
model.finetune(
    "training_data.json",
    epochs=3,                    # Number of training epochs
    batch_size=4,                # Batch size (reduce if OOM)
    learning_rate=2e-4,          # Learning rate
    output_dir="./output",       # Save directory
)
```

### Advanced Training

```python
model.finetune(
    "training_data.json",
    epochs=5,
    batch_size=4,
    learning_rate=2e-4,
    
    # LoRA parameters
    lora_r=16,                   # LoRA rank (higher = more capacity)
    lora_alpha=32,               # LoRA scaling (typically 2x lora_r)
    lora_dropout=0.1,            # Dropout for regularization
    
    # Training options
    warmup_steps=100,            # Learning rate warmup
    max_steps=-1,                # Max steps (-1 for full epochs)
    gradient_accumulation=4,     # Accumulate gradients
    
    # Output
    output_dir="./finetuned",
    save_steps=500,              # Save checkpoint every N steps
    logging_steps=10,            # Log every N steps
)
```

---

## LoRA Configuration

LoRA (Low-Rank Adaptation) enables efficient fine-tuning:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `lora_r` | 8 | Rank of LoRA matrices (4, 8, 16, 32) |
| `lora_alpha` | 16 | LoRA scaling factor (typically 2×r) |
| `lora_dropout` | 0.1 | Dropout for regularization |

### Choosing LoRA Rank

| Rank | Parameters | Use Case |
|------|------------|----------|
| 4 | Minimal | Simple adaptations |
| 8 | Low | **Default, good balance** |
| 16 | Medium | More complex tasks |
| 32 | High | Maximum quality |

**Rule of thumb**: Higher rank = more parameters = better quality but slower training.

---

## Training with Hub Integration

Track your training and push to HuggingFace:

```python
from quantllm import turbo, QuantLLMHubManager

model = turbo("meta-llama/Llama-3.2-3B")

# Create hub manager
manager = QuantLLMHubManager(
    repo_id="your-username/finetuned-model",
    hf_token="hf_..."
)

# Train with automatic tracking
model.finetune(
    "training_data.json",
    epochs=3,
    hub_manager=manager  # Automatically tracks hyperparameters
)

# Push the result
manager.save_final_model(model)
manager.push()
```

---

## After Training

### Test Your Model

```python
# Generate with fine-tuned model
response = model.generate("Your custom prompt")
print(response)

# Compare responses
original = turbo("meta-llama/Llama-3.2-3B")
print("Original:", original.generate("prompt"))
print("Fine-tuned:", model.generate("prompt"))
```

### Export the Model

```python
# Export to GGUF
model.export("gguf", "finetuned.Q4_K_M.gguf")

# Export to SafeTensors
model.export("safetensors", "./finetuned-model/")

# Push to HuggingFace
model.push("your-username/finetuned-model")
```

### Save and Load

```python
# Save locally
model.save("./my-finetuned-model/")

# Load later
from quantllm import TurboModel
model = TurboModel.from_pretrained("./my-finetuned-model/")
```

---

## Tips & Best Practices

### Data Quality

1. **Clean your data** — Remove duplicates, errors, and noise
2. **Consistent format** — Use the same format throughout
3. **Balanced dataset** — Mix different types of examples
4. **Minimum 100 examples** — More is generally better

### Training Settings

1. **Start small** — Use few epochs and small data first
2. **Monitor loss** — Training loss should decrease steadily
3. **Learning rate** — 1e-4 to 3e-4 works for most cases
4. **Batch size** — Reduce if you run out of memory

### Memory Management

```python
# If you run out of memory:
model.finetune(
    data,
    batch_size=1,                  # Smaller batch
    gradient_accumulation=8,       # Accumulate gradients
)
```

### Avoiding Overfitting

1. **Limit epochs** — 1-5 epochs is usually enough
2. **Use dropout** — `lora_dropout=0.1`
3. **Validate** — Test on held-out data
4. **Early stopping** — Stop when validation loss increases

---

## Common Issues

### Out of Memory

```python
# Solution 1: Reduce batch size
model.finetune(data, batch_size=1)

# Solution 2: Use gradient accumulation
model.finetune(data, batch_size=1, gradient_accumulation=8)

# Solution 3: Use smaller LoRA rank
model.finetune(data, lora_r=4)
```

### Training Loss Not Decreasing

```python
# Try higher learning rate
model.finetune(data, learning_rate=3e-4)

# Or more epochs
model.finetune(data, epochs=10)
```

### Model Outputs Garbage

- Check your data format
- Reduce epochs (overfitting)
- Use lower learning rate

---

## Next Steps

- [GGUF Export →](gguf-export.md) — Export your fine-tuned model
- [Hub Integration →](hub-integration.md) — Push to HuggingFace
- [API Reference →](../api/turbomodel.md) — Full API documentation