Model Switching in SteadyText

SteadyText 1.0.0 introduces dynamic model switching, allowing you to use different language models without restarting your application or changing environment variables.

Overview

The model switching feature enables you to:

  1. Use different models for different tasks - Choose smaller models for speed or larger models for quality
  2. Switch models at runtime - No need to restart your application
  3. Maintain deterministic outputs - Each model produces consistent results
  4. Cache multiple models - Models are cached after first load for efficiency

Usage Methods

1. Using the Size Parameter (New!)

The simplest way to choose a model based on your needs:

from steadytext import generate

# Quick, lightweight tasks
text = generate("Simple question", size="small")    # Uses Qwen3-0.6B

# Balanced performance (default)
text = generate("General task", size="medium")      # Uses Qwen3-1.7B

# Complex, high-quality output
text = generate("Complex analysis", size="large")   # Uses Qwen3-4B

2. Using the Model Registry

For more specific model selection:

from steadytext import generate

# Use a smaller, faster model
text = generate("Explain machine learning", model="qwen2.5-0.5b")

# Use a larger, more capable model
text = generate("Write a detailed essay", model="qwen2.5-7b")

Available models in the registry:

Model Name     Size  Use Case                           Size Parameter
qwen3-0.6b     0.6B  Very fast, simple tasks            small
qwen3-1.7b     1.7B  Default, balanced performance      medium
qwen3-4b       4B    Better quality, slower             large
qwen3-8b       8B    High quality, resource intensive   -
qwen2.5-0.5b   0.5B  Fast, lightweight tasks            -
qwen2.5-1.5b   1.5B  Good balance of speed/quality      -
qwen2.5-3b     3B    Enhanced capabilities              -
qwen2.5-7b     7B    Best quality, slower               -
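
The size aliases in the last column map onto the registry names above, so the two calls below should select the same underlying model (a minimal sketch; the exact output depends on the model build you have cached):

from steadytext import generate

# "small" is an alias for the qwen3-0.6b registry entry per the table above,
# so both calls load the same model.
via_size = generate("Summarize this sentence.", size="small")
via_name = generate("Summarize this sentence.", model="qwen3-0.6b")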

3. Using Custom Models

Specify any GGUF model from Hugging Face:

from steadytext import generate

# Use a custom model
text = generate(
    "Create a Python function",
    model_repo="Qwen/Qwen2.5-7B-Instruct-GGUF",
    model_filename="qwen2.5-7b-instruct-q8_0.gguf"
)

4. Using Environment Variables

Set default models via environment variables:

export STEADYTEXT_GENERATION_MODEL_REPO="Qwen/Qwen2.5-3B-Instruct-GGUF"
export STEADYTEXT_GENERATION_MODEL_FILENAME="qwen2.5-3b-instruct-q8_0.gguf"
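
If you prefer to keep configuration in Python, the same variables can be set through os.environ before the generation model is first loaded (a sketch; it assumes SteadyText reads these variables at model-load time, so set them before the first generate() call):

import os

# Assumption: these variables are read when the generation model is first loaded.
os.environ["STEADYTEXT_GENERATION_MODEL_REPO"] = "Qwen/Qwen2.5-3B-Instruct-GGUF"
os.environ["STEADYTEXT_GENERATION_MODEL_FILENAME"] = "qwen2.5-3b-instruct-q8_0.gguf"

from steadytext import generate

text = generate("Explain machine learning")  # uses the 3B model configured above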

Streaming Generation

Model switching works with streaming generation too:

from steadytext import generate_iter

# Stream with a specific model
for token in generate_iter("Tell me a story", model="qwen2.5-3b"):
    print(token, end="", flush=True)

Model Selection Guide

For Speed (0.5B - 1.5B models)

  • Use cases: Chat responses, simple completions, real-time applications
  • Recommended: qwen3-0.6b (size="small"), qwen2.5-0.5b, qwen2.5-1.5b
  • Trade-off: Faster generation, simpler outputs

For Balance (1.7B - 3B models)

  • Use cases: General text generation, summaries, explanations
  • Recommended: qwen3-1.7b (size="medium", default), qwen2.5-3b
  • Trade-off: Good quality with reasonable speed

For Quality (4B+ models)

  • Use cases: Complex reasoning, detailed content, creative writing
  • Recommended: qwen3-4b (size="large"), qwen2.5-7b, qwen3-8b
  • Trade-off: Best quality, slower generation

Performance Considerations

  1. First Load: The first use of a model downloads it (if not cached) and loads it into memory
  2. Model Caching: Once loaded, models remain in memory for fast switching
  3. Memory Usage: Each loaded model uses RAM - consider your available resources
  4. Determinism: All models maintain deterministic outputs with the same seed
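
To illustrate the determinism point, repeating a call with the same prompt, model, and seed should return identical text (a minimal sketch; it assumes generate() accepts a seed keyword argument, which may not exist in every release):

from steadytext import generate

# Same prompt + same model + same seed -> identical output (seed kwarg assumed).
a = generate("Define entropy in one sentence.", model="qwen3-1.7b", seed=123)
b = generate("Define entropy in one sentence.", model="qwen3-1.7b", seed=123)
assert a == b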

Examples

Adaptive Model Selection

from steadytext import generate

def smart_generate(prompt, max_length=100):
    """Pick a model based on the desired output length."""
    if max_length < 50:
        # Use fast model for short outputs
        return generate(prompt, model="qwen2.5-0.5b")
    elif max_length < 200:
        # Use balanced model
        return generate(prompt, model="qwen3-1.7b")
    else:
        # Use high-quality model for long outputs
        return generate(prompt, model="qwen2.5-7b")

A/B Testing Models

from steadytext import generate

prompts = ["Explain quantum computing", "Write a haiku", "Solve 2+2"]

for prompt in prompts:
    print(f"\nPrompt: {prompt}")

    # Test with a small model
    small = generate(prompt, model="qwen2.5-0.5b")
    print(f"Small model: {small[:100]}...")

    # Test with a larger model
    large = generate(prompt, model="qwen2.5-3b")
    print(f"Larger model: {large[:100]}...")

Troubleshooting

Model Not Found

If a model download fails, you'll get deterministic fallback text. Check:

  • Internet connection
  • Hugging Face availability
  • Model name spelling

Out of Memory

Large models require significant RAM. Solutions:

  • Use smaller quantized models
  • Clear the model cache with clear_model_cache()
  • Use one model at a time
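
For example, freeing memory before loading a larger model might look like this (a sketch; the import location of clear_model_cache() is an assumption, so check the API reference for your version):

from steadytext import generate, clear_model_cache  # import path assumed

# Drop previously loaded models from memory, then load only the larger one.
clear_model_cache()
text = generate("Write a detailed essay", model="qwen2.5-7b")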

Slow First Load

Initial model loading takes time due to:

  • Downloading (first time only)
  • Loading into memory
  • Model initialization

Subsequent uses are much faster as models are cached.
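
A rough way to see the caching effect is to time two consecutive calls that use the same model (a sketch; absolute numbers depend on your hardware and whether the model files were already downloaded):

import time

from steadytext import generate

start = time.perf_counter()
generate("Warm-up prompt", model="qwen3-1.7b")  # downloads (if needed) and loads the model
print(f"First call:  {time.perf_counter() - start:.1f}s")

start = time.perf_counter()
generate("Second prompt", model="qwen3-1.7b")   # model is already in memory
print(f"Second call: {time.perf_counter() - start:.1f}s")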