# 💬 Text Generation

Generate text with various options and modes.

---

## Basic Generation

```python
from quantllm import turbo

model = turbo("meta-llama/Llama-3.2-3B")

response = model.generate("What is machine learning?")
print(response)
```

---

## Generation Parameters

### Temperature & Sampling

```python
response = model.generate(
    "Write a creative story about a robot.",
    max_new_tokens=200,   # Maximum tokens to generate
    temperature=0.7,      # Creativity (0.0 = deterministic, 1.0+ = creative)
    top_p=0.9,            # Nucleus sampling (higher = more diverse)
    top_k=50,             # Top-k sampling
    do_sample=True,       # Enable sampling (required for temperature > 0)
)
```

### Controlling Output

```python
response = model.generate(
    "List 5 programming languages:",
    max_new_tokens=100,
    repetition_penalty=1.1,   # Discourage repetition (1.0 = off, 1.2 = strong)
    no_repeat_ngram_size=3,   # Prevent repeating n-grams of this size
)
```

### Parameter Guide

| Parameter | Range | Description |
|-----------|-------|-------------|
| `temperature` | 0.0-2.0 | 0.1-0.3 for factual, 0.7-0.9 for creative |
| `top_p` | 0.0-1.0 | 0.9 is a good default |
| `top_k` | 1-100 | 50 is a good default |
| `repetition_penalty` | 1.0-1.5 | 1.1-1.2 prevents repetition |
| `max_new_tokens` | 1-4096+ | Depends on model context length |

---

## Chat Mode

For conversational models with system prompts:

```python
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "How do I read a file in Python?"},
]

response = model.chat(messages, max_new_tokens=200)
print(response)
```

### Multi-Turn Conversation

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
]

# First response
response = model.chat(messages)
print(f"Assistant: {response}")

# Continue conversation
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "What about JavaScript?"})

response = model.chat(messages)
print(f"Assistant: {response}")
```

---

## Streaming

Get tokens as they're generated for better UX:

```python
# Streaming generation
for token in model.generate("Write a poem about the ocean:", stream=True):
    print(token, end="", flush=True)
print()  # Newline at end
```

### Streaming with Chat

```python
messages = [{"role": "user", "content": "Tell me a story."}]

for token in model.chat(messages, stream=True):
    print(token, end="", flush=True)
```

---

## Stop Strings

Stop generation at specific patterns:

```python
response = model.generate(
    "Write a haiku:\n",
    max_new_tokens=100,
    stop_strings=["---", "\n\n\n"],  # Stop at these patterns
)
```

---

## Batch Generation

Generate responses for several prompts by looping over them:

```python
prompts = [
    "What is Python?",
    "What is JavaScript?",
    "What is Rust?",
]

for prompt in prompts:
    response = model.generate(prompt, max_new_tokens=100)
    print(f"Q: {prompt}")
    print(f"A: {response}\n")
```

---

## Common Use Cases

### Factual Q&A

```python
response = model.generate(
    "What is the capital of France?",
    temperature=0.1,  # Low temperature for factual
    max_new_tokens=50,
)
```

### Creative Writing

```python
response = model.generate(
    "Write a short story about a dragon:",
    temperature=0.8,  # Higher temperature for creativity
    top_p=0.95,
    max_new_tokens=500,
)
```

### Code Generation

```python
response = model.generate(
    "Write a Python function to sort a list:",
    temperature=0.2,  # Low for accurate code
    max_new_tokens=200,
)
```

### Summarization

```python
text = "..."  # Long text to summarize

response = model.generate(
    f"Summarize the following text:\n\n{text}\n\nSummary:",
    temperature=0.3,
    max_new_tokens=150,
)
```

---

## Best Practices

1. **Temperature**: Use 0.1-0.3 for factual, 0.7-0.9 for creative
2. **Max tokens**: Set reasonable limits to avoid runaway generation
3. **Repetition penalty**: Use 1.1-1.2 to reduce repetition
4. **Streaming**: Use for long responses to improve user experience
5. **Stop strings**: Define clear stopping points for structured output

---

## Next Steps

- [Fine-tuning →](finetuning.md): Train the model on your data
- [GGUF Export →](gguf-export.md): Export for deployment
- [API Reference →](../api/turbomodel.md): Full API documentation
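
---

## Appendix: How the Sampling Parameters Interact

The `temperature`, `top_k`, and `top_p` settings from the Parameter Guide all reshape the next-token distribution before a token is drawn. The sketch below illustrates the general technique on a toy set of scores. It is not QuantLLM's implementation: `filter_distribution`, `sample_token`, and `toy_logits` are hypothetical names introduced only for this example, and the order of the filters (temperature, then top-k, then top-p) is an assumption that mirrors common practice.

```python
import math
import random


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the renormalized token probabilities left after filtering.

    logits: dict mapping token -> raw score (a toy stand-in for model output).
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy decoding instead")

    # Temperature: divide the scores before the softmax. Values below 1
    # sharpen the distribution, values above 1 flatten it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # Top-k: keep only the k most likely tokens (0 disables the filter),
    # then renormalize so the survivors sum to 1.
    if top_k > 0:
        ranked = ranked[:top_k]
        kept_mass = sum(prob for _, prob in ranked)
        ranked = [(tok, prob / kept_mass) for tok, prob in ranked]

    # Top-p (nucleus): keep the smallest prefix of the ranked tokens whose
    # cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token from the filtered distribution."""
    probs = filter_distribution(logits, **kwargs)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Toy scores for four candidate tokens (made-up values for illustration).
toy_logits = {"cat": 2.0, "dog": 1.5, "fish": 0.5, "rock": -1.0}

print(filter_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.9))
print(sample_token(toy_logits, temperature=0.7, top_k=3, top_p=0.9))
```

Lowering `temperature` concentrates probability on the highest-scoring token, while smaller `top_k` or `top_p` values shrink the set of tokens that can be sampled at all. This is why low temperatures suit factual prompts and higher ones suit creative prompts.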