Instructions for using MatteoKhan/pythia-70m-hybrid with libraries and local apps.

Libraries

How to use MatteoKhan/pythia-70m-hybrid with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MatteoKhan/pythia-70m-hybrid")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MatteoKhan/pythia-70m-hybrid")
model = AutoModelForCausalLM.from_pretrained("MatteoKhan/pythia-70m-hybrid")

Local Apps

vLLM

How to use MatteoKhan/pythia-70m-hybrid with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MatteoKhan/pythia-70m-hybrid"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MatteoKhan/pythia-70m-hybrid",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
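
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the code assumes the server started above is listening on port 8000 and answers in the OpenAI completions format.

```python
import json
import urllib.request

MODEL_ID = "MatteoKhan/pythia-70m-hybrid"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (without sending) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumption: the server is already running and replies with a JSON body
    containing a "choices" list whose items carry a "text" field.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Calling `complete("Once upon a time,")` mirrors the curl command above. Because `base_url` is a parameter, the same helper can target any other OpenAI-compatible endpoint by passing a different host and port.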

Use Docker

docker model run hf.co/MatteoKhan/pythia-70m-hybrid

SGLang

How to use MatteoKhan/pythia-70m-hybrid with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MatteoKhan/pythia-70m-hybrid" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MatteoKhan/pythia-70m-hybrid",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MatteoKhan/pythia-70m-hybrid" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MatteoKhan/pythia-70m-hybrid",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use MatteoKhan/pythia-70m-hybrid with Docker Model Runner:
```
docker model run hf.co/MatteoKhan/pythia-70m-hybrid
```

🚀 Pythia-Hybrid-140M: Merging Efficiency & Power

📌 Overview

Pythia-Hybrid-140M is an experimental language model that merges two Pythia variants, Pythia-70M and Pythia-70M-Deduped, into a single checkpoint. It was built with MergeKit and aims to balance performance and efficiency for text generation.

🔗 Created by: Matteo Khan
🎓 Affiliation: Apprentice at TW3 Partners (Generative AI Research)
📍 License: MIT

🔗 Connect with me on LinkedIn
🔍 Model on Hugging Face

🧠 Model Details

Model Type: Hybrid Language Model (Merged)
Parent Models:
- Pythia-70M
- Pythia-70M-Deduped
Merging Technique: Linear Merge (MergeKit)

🎯 Intended Use

This model is primarily intended for research and experimentation in hybrid model optimization. Potential use cases include:

✅ Text Generation
✅ Conversational AI
✅ Creative Writing Assistance
✅ Exploration of Model Merging Effects

⚠️ Limitations & Considerations

Pythia-Hybrid-140M inherits the limitations of its parent models, and the merge itself may add some of its own:

❌ May generate inaccurate or misleading information
⚠️ Potential for biased, offensive, or harmful content
🔄 Merging may introduce unpredictable behaviors
📉 Performance may vary across different tasks

🔬 Merging Process & Configuration

This is not a newly trained model, but rather a merge of existing models using the following configuration:

merge_method: linear
dtype: float16
models:
  - model: "EleutherAI/pythia-70m"
    parameters:
      t: 1.0
      weight: 0.5
  - model: "EleutherAI/pythia-70m-deduped"
    parameters:
      t: 1.0
      weight: 0.5
parameters:
  normalize: true
  int8_mask: false
layers:
  - pattern: "model.*"
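
To illustrate what a linear merge with `normalize: true` amounts to, here is a minimal sketch in plain Python. It is not MergeKit's implementation: the function `linear_merge` and the toy "state dicts" (tensor names mapped to flat lists of floats) are hypothetical stand-ins. The idea shown is that each merged parameter is a weighted average of the corresponding parent parameters, with the weights rescaled to sum to 1.

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of same-shaped parameter sets.

    state_dicts: list of {tensor_name: list of floats}, one per parent model
    weights:     one merge weight per parent model
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    if normalize:
        # Rescale the weights so they sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        # Combine the parents element by element.
        merged[name] = [
            sum(w * value for w, value in zip(weights, values))
            for values in zip(*tensors)
        ]
    return merged


# Toy parents standing in for the two Pythia checkpoints (illustrative values).
model_a = {"layer.weight": [1.0, 2.0, 3.0]}
model_b = {"layer.weight": [3.0, 2.0, 1.0]}

merged = linear_merge([model_a, model_b], weights=[0.5, 0.5])
print(merged)  # {'layer.weight': [2.0, 2.0, 2.0]}
```

With equal weights of 0.5, every merged tensor is the element-wise mean of the two parents, so the result has the same shapes and parameter count as a single parent.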

📊 No formal evaluation has been conducted yet. Users are encouraged to benchmark and share feedback!

🌍 Environmental Impact

Because Pythia-Hybrid-140M was produced by merging existing checkpoints rather than training from scratch, it required no new training run, which keeps its computational and environmental cost low.

🚀 How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MatteoKhan/pythia-70m-hybrid"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example usage
prompt = "Write a short poem about artificial intelligence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

📝 Citation (Pythia-70M)

@misc{biderman2023pythia,
      title={Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling},
      author={Stella Biderman and others},
      year={2023},
      eprint={2304.01373},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

📩 Feedback & Contact: Reach out via Hugging Face.

🎉 Happy Experimenting! 🚀

Downloads last month: 2

Model size: 70.4M params (Safetensors)
Tensor type: F32

Model tree for MatteoKhan/pythia-70m-hybrid
Base model: EleutherAI/pythia-70m
Finetuned (220): this model

Paper for MatteoKhan/pythia-70m-hybrid
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling (arXiv 2304.01373, published Apr 3, 2023)