# 🤗 Hub Integration

Push and pull models from HuggingFace Hub with auto-generated model cards.

---

## Quick Push

The easiest way to share your model:

```python
from quantllm import turbo

model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
)

# Push with auto-generated model card
model.push(
    "your-username/my-model",
    token="hf_..."
)
```

---

## Setup

### Get Your Token

1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
2. Create a new token with "Write" permissions
3. Use it in your code or set as environment variable

```bash
# Option 1: Environment variable
export HF_TOKEN=hf_your_token_here

# Option 2: Pass directly
model.push("user/repo", token="hf_...")
```

---

## Push Methods

### Method 1: TurboModel.push() (Recommended)

```python
from quantllm import turbo

model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={
        "format": "gguf",
        "quantization": "Q4_K_M",
        "push_format": "gguf",
    },
)

# Uses shared config defaults
model.export()
model.push("your-username/my-model-gguf", license="apache-2.0")
```

### Method 2: QuantLLMHubManager (Advanced)

For more control:

```python
from quantllm import turbo, QuantLLMHubManager

model = turbo("meta-llama/Llama-3.2-3B")

# Create manager
manager = QuantLLMHubManager(
    repo_id="your-username/my-model",
    hf_token="hf_..."
)

# Track hyperparameters (for fine-tuned models)
manager.track_hyperparameters({
    "epochs": 3,
    "learning_rate": 2e-4,
    "lora_r": 16,
    "base_model": "meta-llama/Llama-3.2-3B"
})

# Save model
manager.save_final_model(model)

# Push
manager.push(commit_message="Upload fine-tuned model")
```

---

## Auto-Generated Model Cards

QuantLLM automatically generates professional model cards with:

### YAML Frontmatter

```yaml
---
license: apache-2.0
base_model: meta-llama/Llama-3.2-3B
library_name: gguf
language:
  - en
tags:
  - quantllm
  - gguf
  - llama-cpp
  - q4_k_m
---
```

### Format-Specific Usage Examples

For **GGUF** models:
```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="your-username/my-model",
    filename="model.Q4_K_M.gguf",
)
```

For **MLX** models:
```python
from mlx_lm import load, generate

model, tokenizer = load("your-username/my-model")
text = generate(model, tokenizer, prompt="Hello!")
```

For **ONNX** models:
```python
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("your-username/my-model")
```

---

## Pull Models

Load models from HuggingFace:

```python
from quantllm import turbo, TurboModel

# Load regular models
model = turbo("your-username/my-model")

# Load GGUF models
model = TurboModel.from_gguf(
    "TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf"
)
```

### List GGUF Files

```python
files = TurboModel.list_gguf_files("TheBloke/Llama-2-7B-Chat-GGUF")
print(files)
# ['llama-2-7b-chat.Q2_K.gguf', 'llama-2-7b-chat.Q4_K_M.gguf', ...]
```

---

## Private Repositories

```python
# Push to private repo
model.push(
    "your-username/private-model",
    token="hf_...",
    private=True  # Makes repository private
)
```

---

## Fine-Tuning with Hub Tracking

Automatically track training hyperparameters:

```python
from quantllm import turbo, QuantLLMHubManager

model = turbo("meta-llama/Llama-3.2-3B")
manager = QuantLLMHubManager("user/repo", "hf_token")

# Train with automatic tracking
model.finetune(
    "data.json",
    epochs=3,
    hub_manager=manager  # Tracks all hyperparameters
)

# Push with full history
manager.save_final_model(model)
manager.push()
```

---

## Commit Messages

Customize commit messages:

```python
model.push(
    "user/repo",
    commit_message="v2.0 - Improved accuracy on coding tasks"
)
```

---

## Multiple Formats

Upload multiple formats to the same repo:

```python
manager = QuantLLMHubManager("user/my-model", token)

# Export multiple formats to staging
model.export("gguf", f"{manager.staging_dir}/model.Q4_K_M.gguf", quantization="Q4_K_M")
model.export("gguf", f"{manager.staging_dir}/model.Q8_0.gguf", quantization="Q8_0")

manager.push(commit_message="Upload Q4_K_M and Q8_0 variants")
```

---

## Best Practices

1. **Use descriptive names**: `llama-3.2-3b-code-assistant-q4`
2. **Include format in name**: `-gguf`, `-onnx`, `-mlx`
3. **Add quantization**: `-q4_k_m`, `-8bit`
4. **Write good commit messages**: Describe what changed
5. **Test before pushing**: Verify the model works

---

## Troubleshooting

### Authentication Error

```python
# Make sure your token has write permissions
# Check at: huggingface.co/settings/tokens
```

### Repository Already Exists

```python
# Use exist_ok=True (default)
model.push("user/existing-repo")  # Will update existing repo
```

### Large File Issues

```bash
# Install git-lfs for large files
git lfs install
```

---

## Next Steps

- [GGUF Export →](gguf-export.md) — Learn about GGUF quantization
- [Fine-tuning →](finetuning.md) — Train your own model
- [API Reference →](../api/hub.md) — Full Hub API