🤗 Hub API

Push models to the HuggingFace Hub with auto-generated model cards.


Quick Reference

from quantllm import turbo, QuantLLMHubManager

# Method 1: TurboModel.push() (Recommended)
model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
)
model.push("user/my-model")

# Method 2: QuantLLMHubManager (Advanced)
manager = QuantLLMHubManager("user/my-model", hf_token="hf_...")
manager.save_final_model(model)
manager.push()

TurboModel.push()

The simplest way to push a model: export it in the chosen format and upload it to the Hub in one call.

def push(
    self,
    repo_id: str,
    token: Optional[str] = None,
    format: Optional[str] = None,
    quantization: Optional[str] = None,
    license: str = "apache-2.0",
    commit_message: str = "Upload model via QuantLLM",
    **kwargs
)

Parameters

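| Parameter | Type | Default | Description |
|---|---|---|---|
| repo_id | str | required | HuggingFace repo ID (user/model) |
| token | str | None | HF token (falls back to the HF_TOKEN environment variable) |
| format | str | None | Export format (uses config["push_format"] when omitted) |
| quantization | str | None | Quantization type (e.g. "Q4_K_M" for GGUF, "4bit" for MLX) |
| license | str | "apache-2.0" | License identifier for the model card |
| commit_message | str | "Upload model via QuantLLM" | Commit message for the upload |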
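The repo_id parameter must be in user/model form. A quick local check can catch typos such as a missing namespace before any export work starts. This is a minimal sketch with a hypothetical helper (is_valid_repo_id is not part of QuantLLM) that only approximates the Hub's naming rules, conservatively: exactly one slash, letters, digits, "-", "_" and "." only, a name of at most 96 characters, and no "--" or ".." sequences. The Hub still performs its own validation when the repo is created.

```python
import re

# Hypothetical pre-push check (not a QuantLLM API). It approximates the Hub's
# repo naming rules conservatively; the Hub itself has the final say.
_PART = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?$")
MAX_NAME_LENGTH = 96


def is_valid_repo_id(repo_id: str) -> bool:
    """Return True if repo_id looks like "namespace/name"."""
    parts = repo_id.split("/")
    if len(parts) != 2:
        return False  # missing namespace, or too many slashes
    namespace, name = parts
    if len(name) > MAX_NAME_LENGTH:
        return False
    for part in (namespace, name):
        # Each part starts and ends with a letter or digit and avoids "--" / "..".
        if not _PART.match(part) or "--" in part or ".." in part:
            return False
    return True


print(is_valid_repo_id("your-username/llama-3.2-3b-gguf"))  # True
print(is_valid_repo_id("llama-3.2-3b-gguf"))                # False: no namespace
```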

Supported Formats

| Format | Description |
|---|---|
| "safetensors" | HuggingFace Transformers (default) |
| "gguf" | llama.cpp, Ollama, LM Studio |
| "onnx" | ONNX Runtime, TensorRT |
| "mlx" | Apple Silicon (M1/M2/M3/M4) |
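push() falls back to config["push_format"] when format is omitted, and SafeTensors is listed as the default format. A minimal sketch of that precedence, assuming the lookup order is explicit argument, then push_format, then "safetensors" (resolve_push_format is a hypothetical helper, not QuantLLM's own implementation):

```python
from typing import Mapping, Optional

# Formats listed in the Supported Formats table.
SUPPORTED_FORMATS = ("safetensors", "gguf", "onnx", "mlx")


def resolve_push_format(fmt: Optional[str], config: Mapping[str, str]) -> str:
    """Hypothetical helper: explicit format > config["push_format"] > "safetensors"."""
    chosen = (fmt or config.get("push_format") or "safetensors").lower()
    if chosen not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {chosen!r}; expected one of {SUPPORTED_FORMATS}")
    return chosen


config = {"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"}
print(resolve_push_format(None, config))    # gguf
print(resolve_push_format("onnx", config))  # onnx
print(resolve_push_format(None, {}))        # safetensors
```

With this precedence, a model configured with push_format "gguf" needs an explicit format="safetensors" to push SafeTensors.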

Examples

from quantllm import turbo

model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
)

# Push as GGUF
model.push(
    "your-username/llama-3.2-3b-gguf"
)

# Push as ONNX
model.push(
    "your-username/llama-3.2-3b-onnx",
    format="onnx"
)

# Push as MLX
model.push(
    "your-username/llama-3.2-3b-mlx",
    format="mlx",
    quantization="4bit"
)

# Push as SafeTensors (passed explicitly, since push_format is "gguf" in the config above)
model.push("your-username/llama-3.2-3b", format="safetensors")

QuantLLMHubManager

Advanced hub management with hyperparameter tracking.

class QuantLLMHubManager:
    def __init__(
        self,
        repo_id: str,
        hf_token: Optional[str] = None,
        organization: Optional[str] = None
    )

Parameters

| Parameter | Type | Description |
|---|---|---|
| repo_id | str | HuggingFace repo ID (user/model) |
| hf_token | str | HuggingFace API token |
| organization | str | Optional organization name |

Methods

login()

Verify authentication with HuggingFace.

manager.login()

track_hyperparameters()

Track training hyperparameters for the model card.

def track_hyperparameters(self, params: Dict[str, Any])

Example:

manager.track_hyperparameters({
    "epochs": 3,
    "learning_rate": 2e-4,
    "lora_r": 16,
    "base_model": "meta-llama/Llama-3.2-3B",
})
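Tracked hyperparameters are recorded for the model card. The exact layout QuantLLM uses in the card is not documented here; purely as an illustration, a hypothetical render_hyperparameter_table helper could turn the tracked dictionary into a Markdown table:

```python
from typing import Any, Dict


def render_hyperparameter_table(params: Dict[str, Any]) -> str:
    """Hypothetical helper: render tracked hyperparameters as a Markdown table."""
    lines = ["| Hyperparameter | Value |", "| --- | --- |"]
    for key, value in params.items():
        # Escape pipes so values such as "a|b" don't break the table.
        cell = str(value).replace("|", "\\|")
        lines.append(f"| {key} | {cell} |")
    return "\n".join(lines)


print(render_hyperparameter_table({
    "epochs": 3,
    "learning_rate": 2e-4,
    "lora_r": 16,
    "base_model": "meta-llama/Llama-3.2-3B",
}))
```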

save_final_model()

Save the model (and, optionally, its tokenizer) to the manager's local staging directory in the given format.

def save_final_model(
    self,
    model,
    tokenizer=None,
    format: str = "safetensors"
)

push()

Push the contents of the staging directory to the HuggingFace Hub.

def push(self, commit_message: str = "Upload model via QuantLLM")

Complete Workflow

Fine-Tune and Push

from quantllm import turbo, QuantLLMHubManager

# Load model
model = turbo("meta-llama/Llama-3.2-3B")

# Create manager
manager = QuantLLMHubManager(
    "your-username/my-finetuned-model",
    hf_token="hf_..."
)

# Fine-tune with tracking
model.finetune(
    "data.json",
    epochs=3,
    hub_manager=manager  # Auto-tracks hyperparameters
)

# Save and push
manager.save_final_model(model)
manager.push(commit_message="Fine-tuned on custom dataset")

Export and Push

from quantllm import turbo, QuantLLMHubManager
import os

model = turbo("meta-llama/Llama-3.2-3B")
manager = QuantLLMHubManager("your-username/my-gguf", "hf_...")

# Export multiple quantizations
for quant in ["Q4_K_M", "Q5_K_M", "Q8_0"]:
    output = os.path.join(manager.staging_dir, f"model.{quant}.gguf")
    model.export("gguf", output, quantization=quant)

# Track metadata
manager.track_hyperparameters({
    "format": "gguf",
    "base_model": "meta-llama/Llama-3.2-3B",
    "quantizations": ["Q4_K_M", "Q5_K_M", "Q8_0"],
})

manager.push()
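The export loop writes files named model.<QUANT>.gguf into manager.staging_dir. Instead of repeating the quantization list by hand in track_hyperparameters(), you could derive it from the files that were actually exported. A sketch using only pathlib; staged_gguf_quantizations is a hypothetical helper that assumes the model.<QUANT>.gguf naming used in this example:

```python
from pathlib import Path
from typing import List


def staged_gguf_quantizations(staging_dir: str) -> List[str]:
    """Hypothetical helper: list quantizations from files named model.<QUANT>.gguf."""
    quants = []
    for path in sorted(Path(staging_dir).glob("model.*.gguf")):
        # "model.Q4_K_M.gguf" -> stem "model.Q4_K_M" -> "Q4_K_M"
        quants.append(path.stem.split(".", 1)[1])
    return quants
```

Usage, after the export loop has run:

quantizations = staged_gguf_quantizations(manager.staging_dir)
manager.track_hyperparameters({"format": "gguf", "quantizations": quantizations})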

Auto-Generated Model Cards

QuantLLM automatically generates a model card for each pushed model, including:

YAML Frontmatter

---
license: apache-2.0
base_model: meta-llama/Llama-3.2-3B
library_name: gguf
language:
  - en
tags:
  - quantllm
  - gguf
  - llama-cpp
  - q4_k_m
---
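The frontmatter is plain YAML between two --- lines. This stdlib-only sketch rebuilds the block above from its inputs. build_frontmatter is hypothetical and does no YAML quoting, so it assumes values without special characters; a YAML library such as PyYAML is safer for arbitrary values.

```python
from typing import List


def build_frontmatter(
    license_id: str,
    base_model: str,
    library_name: str,
    languages: List[str],
    tags: List[str],
) -> str:
    """Hypothetical helper: build model-card YAML frontmatter without a YAML library."""
    lines = [
        "---",
        f"license: {license_id}",
        f"base_model: {base_model}",
        f"library_name: {library_name}",
        "language:",
    ]
    lines += [f"  - {lang}" for lang in languages]
    lines.append("tags:")
    lines += [f"  - {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines)


quant = "Q4_K_M"
print(build_frontmatter(
    "apache-2.0",
    "meta-llama/Llama-3.2-3B",
    "gguf",
    ["en"],
    ["quantllm", "gguf", "llama-cpp", quant.lower()],  # quantization tag is lowercased
))
```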

Format-Specific Usage

For GGUF:

from llama_cpp import Llama
llm = Llama.from_pretrained(repo_id="user/model", filename="model.Q4_K_M.gguf")

For MLX:

from mlx_lm import load, generate
model, tokenizer = load("user/model")
text = generate(model, tokenizer, prompt="Hello!")

For ONNX:

from optimum.onnxruntime import ORTModelForCausalLM
model = ORTModelForCausalLM.from_pretrained("user/model")

ModelCardGenerator

Generate custom model cards.

from quantllm.hub import ModelCardGenerator, generate_model_card

# Quick function
content = generate_model_card(
    repo_id="user/my-model",
    base_model="meta-llama/Llama-3.2-3B",
    format="gguf",
    quantization="Q4_K_M",
    license="apache-2.0",
)

# Or use the class for more control
generator = ModelCardGenerator(
    repo_id="user/my-model",
    base_model="meta-llama/Llama-3.2-3B",
    format="gguf",
    quantization="Q4_K_M",
    license="apache-2.0",
    language=["en", "es"],
    tags=["finetuned", "code"],
)
content = generator.generate()

Environment Variables

| Variable | Description |
|---|---|
| HF_TOKEN | HuggingFace API token |
| HUGGINGFACE_TOKEN | Alternative token variable |
| HF_HUB_DISABLE_PROGRESS_BARS | Disable progress bars |
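When no token is passed explicitly, it is read from the environment. The precedence in this sketch (explicit argument, then HF_TOKEN, then HUGGINGFACE_TOKEN) is an assumption for illustration, and resolve_hf_token is a hypothetical helper, not part of QuantLLM:

```python
import os
from typing import Mapping, Optional

TOKEN_ENV_VARS = ("HF_TOKEN", "HUGGINGFACE_TOKEN")  # assumed lookup order


def resolve_hf_token(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: return the first non-empty token, or None."""
    if explicit:
        return explicit
    env = os.environ if env is None else env
    for name in TOKEN_ENV_VARS:
        value = env.get(name)
        if value:
            return value
    return None


print(resolve_hf_token(env={"HUGGINGFACE_TOKEN": "hf_alt"}))  # hf_alt
```

Reading the token from the environment keeps literal "hf_..." strings out of scripts and source control.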


Best Practices

  1. Use descriptive names: llama-3.2-3b-code-q4_k_m

  2. Include format suffix: -gguf, -onnx, -mlx

  3. Test before pushing: Verify the model loads and generates sensible output

  4. Use appropriate license: Match your base model's license

  5. Write good commit messages: Describe what changed
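Practices 1 and 2 can be applied mechanically. This sketch uses a hypothetical make_repo_name helper that takes the base model name, appends an optional task, the quantization, and a format suffix, then lowercases the result. The suffix is omitted for SafeTensors, matching the "your-username/llama-3.2-3b" example earlier on this page.

```python
from typing import Optional


def make_repo_name(
    base_model: str,
    quantization: Optional[str] = None,
    fmt: Optional[str] = None,
    task: Optional[str] = None,
) -> str:
    """Hypothetical helper: build a descriptive, lowercase repo name."""
    parts = [base_model.split("/")[-1]]  # drop the "meta-llama/" namespace
    if task:
        parts.append(task)
    if quantization:
        parts.append(quantization)
    if fmt and fmt.lower() != "safetensors":
        parts.append(fmt)  # format suffix: gguf, onnx, mlx
    return "-".join(parts).lower()


print(make_repo_name("meta-llama/Llama-3.2-3B", "Q4_K_M", "gguf", task="code"))
# llama-3.2-3b-code-q4_k_m-gguf
```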


See Also