🤗 Hub API
Push models to HuggingFace Hub with auto-generated model cards.
Quick Reference
from quantllm import turbo, QuantLLMHubManager
# Method 1: TurboModel.push() (Recommended)
model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
)
model.push("user/my-model")
# Method 2: QuantLLMHubManager (Advanced)
manager = QuantLLMHubManager("user/my-model", hf_token="hf_...")
manager.save_final_model(model)
manager.push()
TurboModel.push()
The simplest way to push models.
def push(
    self,
    repo_id: str,
    token: Optional[str] = None,
    format: Optional[str] = None,
    quantization: Optional[str] = None,
    license: str = "apache-2.0",
    commit_message: str = "Upload model via QuantLLM",
    **kwargs
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| repo_id | str | required | HuggingFace repo ID (user/model) |
| token | str | None | HF token (or use HF_TOKEN env) |
| format | str | None | Export format (uses …) |
| quantization | str | None | Quantization type |
| license | str | "apache-2.0" | License type |
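The repo_id parameter expects the user/model form. As a minimal sketch, a caller could check that shape before starting an upload. The helper split_repo_id below is hypothetical and written for illustration; it is not part of QuantLLM, and the library may validate repo IDs differently.

```python
from typing import Tuple


def split_repo_id(repo_id: str) -> Tuple[str, str]:
    """Split 'user/model' into (namespace, name).

    Hypothetical helper: raises ValueError unless there is exactly one
    slash with non-empty text on both sides.
    """
    parts = repo_id.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'user/model', got {repo_id!r}")
    namespace, name = parts
    return namespace, name


print(split_repo_id("your-username/llama-3.2-3b-gguf"))
```

Failing early on a malformed ID avoids running a long export only to have the upload rejected at the end.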
Supported Formats
| Format | Description |
|---|---|
| safetensors | HuggingFace Transformers (default) |
| gguf | llama.cpp, Ollama, LM Studio |
| onnx | ONNX Runtime, TensorRT |
| mlx | Apple Silicon (M1/M2/M3/M4) |
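A format string can be normalized and checked before it is passed to push(). The sketch below assumes the format identifiers are the ones used in the examples on this page (gguf, onnx, mlx, plus safetensors as the default). SUPPORTED_FORMATS and normalize_format are hypothetical names, not QuantLLM APIs.

```python
from typing import Optional

# Assumed identifiers, taken from the examples on this page.
SUPPORTED_FORMATS = {"safetensors", "gguf", "onnx", "mlx"}


def normalize_format(fmt: Optional[str] = None, default: str = "safetensors") -> str:
    """Return a lowercase format name, falling back to `default` when fmt is empty."""
    chosen = (fmt or default).strip().lower()
    if chosen not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {chosen!r}; expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    return chosen


print(normalize_format("GGUF"))
print(normalize_format(None))
```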
Examples
from quantllm import turbo
model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
)
# Push as GGUF
model.push(
    "your-username/llama-3.2-3b-gguf"
)
# Push as ONNX
model.push(
    "your-username/llama-3.2-3b-onnx",
    format="onnx"
)
# Push as MLX
model.push(
    "your-username/llama-3.2-3b-mlx",
    format="mlx",
    quantization="4bit"
)
# Push as SafeTensors (the default format; stated explicitly here
# because this model's config sets push_format to "gguf")
model.push(
    "your-username/llama-3.2-3b",
    format="safetensors"
)
QuantLLMHubManager
Advanced hub management with hyperparameter tracking.
class QuantLLMHubManager:
    def __init__(
        self,
        repo_id: str,
        hf_token: Optional[str] = None,
        organization: Optional[str] = None
    )
Parameters
| Parameter | Type | Description |
|---|---|---|
| repo_id | str | HuggingFace repo ID (user/model) |
| hf_token | str | HuggingFace API token |
| organization | str | Optional organization name |
Methods
login()
Verify authentication with HuggingFace.
manager.login()
track_hyperparameters()
Track training hyperparameters for the model card.
def track_hyperparameters(self, params: Dict[str, Any])
Example:
manager.track_hyperparameters({
    "epochs": 3,
    "learning_rate": 2e-4,
    "lora_r": 16,
    "base_model": "meta-llama/Llama-3.2-3B",
})
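To illustrate what tracking hyperparameters for a model card could produce, the sketch below renders a dictionary like the one above as a Markdown table. The function render_hyperparameters is hypothetical; how QuantLLM actually lays out tracked values in the card is not specified on this page and may differ.

```python
from typing import Any, Dict


def render_hyperparameters(params: Dict[str, Any]) -> str:
    """Render tracked hyperparameters as a two-column Markdown table."""
    lines = ["| Hyperparameter | Value |", "|---|---|"]
    for key, value in params.items():
        # Dict insertion order is preserved, so rows follow the tracking order.
        lines.append(f"| {key} | {value} |")
    return "\n".join(lines)


print(render_hyperparameters({
    "epochs": 3,
    "lora_r": 16,
    "base_model": "meta-llama/Llama-3.2-3B",
}))
```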
save_final_model()
Save model to staging directory.
def save_final_model(
    self,
    model,
    tokenizer=None,
    format: str = "safetensors"
)
push()
Push staged model to HuggingFace Hub.
def push(self, commit_message: str = "Upload model via QuantLLM")
Complete Workflow
Fine-Tune and Push
from quantllm import turbo, QuantLLMHubManager
# Load model
model = turbo("meta-llama/Llama-3.2-3B")
# Create manager
manager = QuantLLMHubManager(
    "your-username/my-finetuned-model",
    hf_token="hf_..."
)
# Fine-tune with tracking
model.finetune(
    "data.json",
    epochs=3,
    hub_manager=manager  # Auto-tracks hyperparameters
)
# Save and push
manager.save_final_model(model)
manager.push(commit_message="Fine-tuned on custom dataset")
Export and Push
from quantllm import turbo, QuantLLMHubManager
import os
model = turbo("meta-llama/Llama-3.2-3B")
manager = QuantLLMHubManager("your-username/my-gguf", hf_token="hf_...")
# Export multiple quantizations into the staging directory
for quant in ["Q4_K_M", "Q5_K_M", "Q8_0"]:
    output = os.path.join(manager.staging_dir, f"model.{quant}.gguf")
    model.export("gguf", output, quantization=quant)
# Track metadata
manager.track_hyperparameters({
    "format": "gguf",
    "base_model": "meta-llama/Llama-3.2-3B",
    "quantizations": ["Q4_K_M", "Q5_K_M", "Q8_0"],
})
manager.push()
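Before calling manager.push() in a workflow like this, it can help to confirm that every expected file was actually written. The sketch below uses only the standard library and assumes the model.{quant}.gguf naming from the loop above. Both helper functions are hypothetical, and a temporary directory stands in for manager.staging_dir.

```python
import tempfile
from pathlib import Path
from typing import List


def staged_gguf_files(staging_dir: str) -> List[str]:
    """Return the sorted names of non-empty .gguf files in staging_dir."""
    root = Path(staging_dir)
    return sorted(
        p.name for p in root.glob("*.gguf") if p.is_file() and p.stat().st_size > 0
    )


def missing_quantizations(staging_dir: str, quants: List[str]) -> List[str]:
    """List the quantizations with no non-empty model.<quant>.gguf file."""
    present = set(staged_gguf_files(staging_dir))
    return [q for q in quants if f"model.{q}.gguf" not in present]


# Demo: a temporary directory plays the role of the staging directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.Q4_K_M.gguf").write_bytes(b"stub")
    (Path(tmp) / "model.Q8_0.gguf").write_bytes(b"")  # empty file counts as missing
    print(missing_quantizations(tmp, ["Q4_K_M", "Q5_K_M", "Q8_0"]))
    # prints ['Q5_K_M', 'Q8_0']
```

A size check only catches missing or empty files; loading the exported model remains the more meaningful test.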
Auto-Generated Model Cards
QuantLLM automatically generates professional model cards with:
YAML Frontmatter
---
license: apache-2.0
base_model: meta-llama/Llama-3.2-3B
library_name: gguf
language:
- en
tags:
- quantllm
- gguf
- llama-cpp
- q4_k_m
---
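Frontmatter of this shape can be assembled from a handful of fields. The sketch below uses plain string formatting and reproduces the block above; build_frontmatter is a hypothetical helper, not QuantLLM's generator, and it does not escape special YAML characters, so a YAML library would be the safer choice for arbitrary input.

```python
from typing import List


def build_frontmatter(
    license_id: str,
    base_model: str,
    library_name: str,
    languages: List[str],
    tags: List[str],
) -> str:
    """Build a model card YAML frontmatter block as a string."""
    lines = [
        "---",
        f"license: {license_id}",
        f"base_model: {base_model}",
        f"library_name: {library_name}",
        "language:",
    ]
    lines += [f"- {lang}" for lang in languages]
    lines.append("tags:")
    lines += [f"- {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines)


print(build_frontmatter(
    "apache-2.0",
    "meta-llama/Llama-3.2-3B",
    "gguf",
    ["en"],
    ["quantllm", "gguf", "llama-cpp", "q4_k_m"],
))
```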
Format-Specific Usage
For GGUF:
from llama_cpp import Llama
llm = Llama.from_pretrained(repo_id="user/model", filename="model.Q4_K_M.gguf")
For MLX:
from mlx_lm import load, generate
model, tokenizer = load("user/model")
text = generate(model, tokenizer, prompt="Hello!")
For ONNX:
from optimum.onnxruntime import ORTModelForCausalLM
model = ORTModelForCausalLM.from_pretrained("user/model")
ModelCardGenerator
Generate custom model cards.
from quantllm.hub import ModelCardGenerator, generate_model_card
# Quick function
content = generate_model_card(
    repo_id="user/my-model",
    base_model="meta-llama/Llama-3.2-3B",
    format="gguf",
    quantization="Q4_K_M",
    license="apache-2.0",
)
# Or use the class for more control
generator = ModelCardGenerator(
    repo_id="user/my-model",
    base_model="meta-llama/Llama-3.2-3B",
    format="gguf",
    quantization="Q4_K_M",
    license="apache-2.0",
    language=["en", "es"],
    tags=["finetuned", "code"],
)
content = generator.generate()
Environment Variables
| Variable | Description |
|---|---|
| HF_TOKEN | HuggingFace API token |
|  | Alternative token variable |
|  | Disable progress bars |
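The token lookup described on this page (an explicit argument, otherwise the HF_TOKEN environment variable) can be sketched as follows. The helper resolve_token is hypothetical, and the precedence shown is an assumption; QuantLLM's own lookup may consult additional variables.

```python
import os
from typing import Mapping, Optional


def resolve_token(
    token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the explicit token if given, else HF_TOKEN from env, else None."""
    env = os.environ if env is None else env
    # Empty strings are treated the same as a missing value.
    return token or env.get("HF_TOKEN") or None


# Passing a dict instead of os.environ keeps the demo deterministic.
print(resolve_token(None, {"HF_TOKEN": "hf_from_env"}))
```

Reading the token from the environment keeps it out of source files, which matters when scripts are committed alongside the model.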
Best Practices
Use descriptive names: llama-3.2-3b-code-q4_k_m
Include format suffix: -gguf, -onnx, -mlx
Test before pushing: Verify the model works
Use appropriate license: Match your base model's license
Write good commit messages: Describe what changed
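The naming advice above can be applied mechanically. The sketch below joins a base model name, an optional purpose, a quantization label, and a format suffix into one lowercase repository name. The function suggest_repo_name and its argument order are hypothetical conventions for illustration, not a QuantLLM feature.

```python
def suggest_repo_name(base: str, fmt: str, quantization: str = "", purpose: str = "") -> str:
    """Join the non-empty parts into a lowercase, hyphen-separated repo name."""
    parts = [base, purpose, quantization, fmt]
    return "-".join(p.strip().lower() for p in parts if p and p.strip())


print(suggest_repo_name("Llama-3.2-3B", "gguf", quantization="Q4_K_M", purpose="code"))
# prints llama-3.2-3b-code-q4_k_m-gguf
```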
See Also
Hub Integration Guide: Detailed guide
TurboModel.push(): Push via TurboModel
GGUF Export: GGUF conversion details