Hub Integration
Push models to and pull them from the Hugging Face Hub, with auto-generated model cards.
Quick Push
The easiest way to share your model:
from quantllm import turbo
model = turbo(
"meta-llama/Llama-3.2-3B",
config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
)
# Push with auto-generated model card
model.push(
"your-username/my-model",
token="hf_..."
)
Setup
Get Your Token
Go to huggingface.co/settings/tokens
Create a new token with "Write" permissions
Use it in your code or set it as an environment variable
# Option 1: Environment variable (run in your shell)
export HF_TOKEN=hf_your_token_here
# Option 2: Pass directly (in Python)
model.push("user/repo", token="hf_...")
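The two options can be combined so that an explicitly passed token wins and the environment variable acts as the fallback. The following is a minimal sketch, assuming a hypothetical helper named `resolve_hf_token` (it is not part of QuantLLM) and the `HF_TOKEN` variable shown above:

```python
import os
from typing import Optional


def resolve_hf_token(explicit: Optional[str] = None) -> str:
    """Return an explicit token if given, else fall back to HF_TOKEN.

    Hypothetical helper for illustration; not part of QuantLLM.
    """
    token = explicit or os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "No Hugging Face token found: pass token=... or export HF_TOKEN"
        )
    return token
```

The result can then be handed to the push call, for example `model.push("user/repo", token=resolve_hf_token())`, which keeps the token itself out of source code.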
Push Methods
Method 1: TurboModel.push() (Recommended)
from quantllm import turbo
model = turbo(
"meta-llama/Llama-3.2-3B",
config={
"format": "gguf",
"quantization": "Q4_K_M",
"push_format": "gguf",
},
)
# Uses shared config defaults
model.export()
model.push("your-username/my-model-gguf", license="apache-2.0")
Method 2: QuantLLMHubManager (Advanced)
For more control:
from quantllm import turbo, QuantLLMHubManager
model = turbo("meta-llama/Llama-3.2-3B")
# Create manager
manager = QuantLLMHubManager(
repo_id="your-username/my-model",
hf_token="hf_..."
)
# Track hyperparameters (for fine-tuned models)
manager.track_hyperparameters({
"epochs": 3,
"learning_rate": 2e-4,
"lora_r": 16,
"base_model": "meta-llama/Llama-3.2-3B"
})
# Save model
manager.save_final_model(model)
# Push
manager.push(commit_message="Upload fine-tuned model")
Auto-Generated Model Cards
QuantLLM automatically generates professional model cards with:
YAML Frontmatter
---
license: apache-2.0
base_model: meta-llama/Llama-3.2-3B
library_name: gguf
language:
- en
tags:
- quantllm
- gguf
- llama-cpp
- q4_k_m
---
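To make the structure of that header concrete, here is a small sketch that renders the same frontmatter from a Python dict. It only illustrates the format and is not how QuantLLM builds its cards internally; `render_frontmatter` is a hypothetical name, and the function handles only plain strings and lists of strings, with no YAML escaping.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def render_frontmatter(metadata: Dict[str, Value]) -> str:
    """Render a flat metadata dict as a YAML frontmatter block.

    Illustrative only: supports plain strings and lists of strings,
    which is all the example card above needs.
    """
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # A list becomes a key line followed by one "- item" per entry
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


card_header = render_frontmatter({
    "license": "apache-2.0",
    "base_model": "meta-llama/Llama-3.2-3B",
    "library_name": "gguf",
    "language": ["en"],
    "tags": ["quantllm", "gguf", "llama-cpp", "q4_k_m"],
})
print(card_header)
```

Values containing characters that are special in YAML (colons, quotes, leading dashes) would need a real YAML serializer instead of this string formatting.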
Format-Specific Usage Examples
For GGUF models:
from llama_cpp import Llama
llm = Llama.from_pretrained(
repo_id="your-username/my-model",
filename="model.Q4_K_M.gguf",
)
For MLX models:
from mlx_lm import load, generate
model, tokenizer = load("your-username/my-model")
text = generate(model, tokenizer, prompt="Hello!")
For ONNX models:
from optimum.onnxruntime import ORTModelForCausalLM
model = ORTModelForCausalLM.from_pretrained("your-username/my-model")
Pull Models
Load models from the Hugging Face Hub:
from quantllm import turbo, TurboModel
# Load regular models
model = turbo("your-username/my-model")
# Load GGUF models
model = TurboModel.from_gguf(
"TheBloke/Llama-2-7B-Chat-GGUF",
filename="llama-2-7b-chat.Q4_K_M.gguf"
)
List GGUF Files
files = TurboModel.list_gguf_files("TheBloke/Llama-2-7B-Chat-GGUF")
print(files)
# ['llama-2-7b-chat.Q2_K.gguf', 'llama-2-7b-chat.Q4_K_M.gguf', ...]
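Once the file list is available, picking the file for a given quantization is a simple string match. The sketch below assumes filenames follow the `<name>.<QUANT>.gguf` pattern seen in the output above; `pick_gguf_file` is a hypothetical helper, not a QuantLLM function.

```python
from typing import List, Optional


def pick_gguf_file(files: List[str], quantization: str) -> Optional[str]:
    """Return the first .gguf filename containing the quantization tag.

    The match is case-insensitive; returns None when nothing matches.
    """
    tag = f".{quantization.lower()}."
    for name in files:
        lowered = name.lower()
        if lowered.endswith(".gguf") and tag in lowered:
            return name
    return None


files = ["llama-2-7b-chat.Q2_K.gguf", "llama-2-7b-chat.Q4_K_M.gguf"]
print(pick_gguf_file(files, "Q4_K_M"))  # llama-2-7b-chat.Q4_K_M.gguf
```

The returned name can be passed as the `filename` argument of `TurboModel.from_gguf`. Matching on the dotted tag avoids confusing `Q4_K` with `Q4_K_M`.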
Private Repositories
# Push to private repo
model.push(
"your-username/private-model",
token="hf_...",
private=True # Makes repository private
)
Fine-Tuning with Hub Tracking
Automatically track training hyperparameters:
from quantllm import turbo, QuantLLMHubManager
model = turbo("meta-llama/Llama-3.2-3B")
manager = QuantLLMHubManager("user/repo", "hf_token")
# Train with automatic tracking
model.finetune(
"data.json",
epochs=3,
hub_manager=manager # Tracks all hyperparameters
)
# Push with full history
manager.save_final_model(model)
manager.push()
Commit Messages
Customize commit messages:
model.push(
"user/repo",
commit_message="v2.0 - Improved accuracy on coding tasks"
)
Multiple Formats
Upload multiple formats to the same repo:
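from quantllm import turbo, QuantLLMHubManager
model = turbo("meta-llama/Llama-3.2-3B")
token = "hf_..."  # your Hugging Face token
manager = QuantLLMHubManager("user/my-model", token)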
# Export multiple formats to staging
model.export("gguf", f"{manager.staging_dir}/model.Q4_K_M.gguf", quantization="Q4_K_M")
model.export("gguf", f"{manager.staging_dir}/model.Q8_0.gguf", quantization="Q8_0")
manager.push(commit_message="Upload Q4_K_M and Q8_0 variants")
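Before calling `manager.push`, it can be worth confirming that every exported variant actually landed in the staging directory. A minimal sketch, assuming `manager.staging_dir` is a local directory path (as the f-strings above suggest); the two helper names are hypothetical:

```python
from pathlib import Path
from typing import Dict, List


def staged_gguf_sizes(staging_dir: str) -> Dict[str, int]:
    """Map each .gguf filename in staging_dir to its size in bytes."""
    return {
        path.name: path.stat().st_size
        for path in sorted(Path(staging_dir).glob("*.gguf"))
    }


def check_staged(staging_dir: str, expected: List[str]) -> None:
    """Raise FileNotFoundError if an expected file is missing or empty."""
    sizes = staged_gguf_sizes(staging_dir)
    for name in expected:
        if sizes.get(name, 0) == 0:
            raise FileNotFoundError(
                f"{name} is missing or empty in {staging_dir}"
            )
```

For the example above this would be called as `check_staged(manager.staging_dir, ["model.Q4_K_M.gguf", "model.Q8_0.gguf"])` right before the push, so a failed export is caught locally rather than after an upload.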
Best Practices
Use descriptive names: llama-3.2-3b-code-assistant-q4
Include format in name: -gguf, -onnx, -mlx
Add quantization: -q4_k_m, -8bit
Write good commit messages: Describe what changed
Test before pushing: Verify the model works
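The naming suggestions above can be applied consistently with a tiny helper. This is only a sketch: `build_repo_name` is a hypothetical function, and placing the quantization suffix before the format suffix is an arbitrary choice made here, not a QuantLLM convention.

```python
def build_repo_name(base: str, fmt: str, quantization: str = "") -> str:
    """Join a base name, optional quantization, and format with hyphens."""
    parts = [base.strip().lower()]
    if quantization:
        parts.append(quantization.strip().lower())
    parts.append(fmt.strip().lower())
    return "-".join(parts)


print(build_repo_name("llama-3.2-3b-code-assistant", "gguf", "Q4_K_M"))
# llama-3.2-3b-code-assistant-q4_k_m-gguf
```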
Troubleshooting
Authentication Error
# Make sure your token has write permissions
# Check at: huggingface.co/settings/tokens
Repository Already Exists
# Use exist_ok=True (default)
model.push("user/existing-repo") # Will update existing repo
Large File Issues
# Install git-lfs for large files
git lfs install
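Whether git-lfs is already installed can be checked from Python before attempting a large upload. A small sketch using only the standard library; `git_lfs_available` is a hypothetical helper name:

```python
import shutil


def git_lfs_available() -> bool:
    """Return True if a git-lfs executable is found on PATH."""
    return shutil.which("git-lfs") is not None


if not git_lfs_available():
    print("git-lfs not found on PATH; install it, then run: git lfs install")
```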
Next Steps
GGUF Export: Learn about GGUF quantization
Fine-tuning: Train your own model
API Reference: Full Hub API