---
license: mit
tags:
- lstm
- language-model
- word-level
- tinystories
- onnx
- small-model
datasets:
- roneneldan/TinyStories
language:
- en
pipeline_tag: text-generation
new_version: phmd/TinyStories-SRL-5M
---

# TinyStories Word-Level LSTM (ONNX)

A compact **10.9 MB** word-level LSTM language model trained on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset. It generates short, coherent children's stories with minimal compute.

- **Vocab size**: 5,004 (word-level, includes `<sos>`, `<eos>`, `<pad>`, `<unk>`)
- **Architecture**: 2-layer LSTM, 256 hidden units, 128-dim embeddings
- **Max sequence length**: 50 tokens
- **Format**: ONNX (compatible with ONNX Runtime on CPU/GPU)
- **Model size**: ~11 MB

Trained on 500k TinyStories samples in under 5 minutes on two NVIDIA T4 GPUs.

## Usage

### 1. Install dependencies

```bash
pip install onnxruntime numpy
```

### 2. Download files from this repo

- [`tinystories_lstm.onnx`](https://huggingface.co/phmd/TinyStories-LSTM-5.5M/resolve/main/tinystories_lstm.onnx)
- [`vocab.txt`](https://huggingface.co/phmd/TinyStories-LSTM-5.5M/resolve/main/vocab.txt)

### 3. Run inference

```python
import re

import numpy as np
import onnxruntime as ort

# --- Load vocabulary ---
with open("vocab.txt", "r") as f:
    vocab = [line.strip() for line in f]

word2idx = {word: idx for idx, word in enumerate(vocab)}
idx2word = {idx: word for word, idx in word2idx.items()}

# Special tokens (as stored in vocab.txt)
SOS_IDX = word2idx["<sos>"]
EOS_IDX = word2idx["<eos>"]
PAD_IDX = word2idx["<pad>"]
UNK_IDX = word2idx["<unk>"]

# --- Tokenizer (simple word-level) ---
def tokenize(text):
    # Lowercase and split punctuation off words
    text = re.sub(r'([.,!?])', r' \1 ', text.lower())
    return text.split()

# --- Load ONNX model ---
ort_session = ort.InferenceSession("tinystories_lstm.onnx")

# --- Text generation function ---
def generate_text(prompt, max_new_tokens=30, temperature=0.8):
    # Tokenize the prompt; out-of-vocabulary words map to <unk>
    tokens = tokenize(prompt)
    input_ids = [SOS_IDX] + [word2idx.get(t, UNK_IDX) for t in tokens]

    # Generate autoregressively, one token at a time
    current_seq = input_ids.copy()

    for _ in range(max_new_tokens):
        # Pad current sequence to length 50 (model expects fixed length)
        padded = current_seq + [PAD_IDX] * (50 - len(current_seq))
        if len(padded) > 50:
            padded = padded[-50:]  # Keep only the most recent 50 tokens

        input_tensor = np.array([padded], dtype=np.int64)

        # Run model
        outputs = ort_session.run(None, {"input": input_tensor})
        logits = outputs[0]  # Shape: (1, 50, 5004)

        # Get logits for the last non-pad position
        last_pos = min(len(current_seq) - 1, 49)
        next_token_logits = logits[0, last_pos, :] / temperature

        # Softmax + sampling
        probs = np.exp(next_token_logits - np.max(next_token_logits))
        probs = probs / np.sum(probs)
        next_token = np.random.choice(len(probs), p=probs)

        if next_token == EOS_IDX:
            break

        current_seq.append(next_token)

    # Decode, skipping the leading <sos> token and any padding
    words = [idx2word[idx] for idx in current_seq[1:] if idx != PAD_IDX]
    return " ".join(words).replace(" .", ".").replace(" ,", ",")

# --- Example usage ---
if __name__ == "__main__":
    prompt = "once upon a time"
    story = generate_text(prompt, max_new_tokens=40, temperature=0.7)
    print(f"Prompt: {prompt}")
    print(f"Story: {story}")
```

### Example Output

```
Prompt: once upon a time
Story: once upon a time there was a little girl named lily. she loved to play in the garden. one day she found a magic flower that could talk!
```
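The script above assumes the exported graph declares a single int64 input named `input` of shape `(batch, 50)` and one logits output of shape `(batch, 50, 5004)`. If `ort_session.run()` fails with an input-name mismatch, you can list the names and shapes the ONNX file actually declares:

```python
import onnxruntime as ort

session = ort.InferenceSession("tinystories_lstm.onnx")

# Print the graph's declared inputs and outputs so the feed dict
# in generate_text() can be adjusted if the names differ.
for node in session.get_inputs():
    print("input: ", node.name, node.shape, node.type)
for node in session.get_outputs():
    print("output:", node.name, node.shape, node.type)
```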
## Training Details

- Dataset: `roneneldan/TinyStories` (500k training samples)
- Optimizer: Adam (lr=0.002)
- Batch size: 128
- Epochs: 2
- Hardware: NVIDIA T4 (x2)
- Training time: ~5 minutes

## Limitations

- Word-level modeling → cannot handle out-of-vocabulary words well (they map to `<unk>`)
- Fixed context window (50 tokens)
- No beam search (uses basic temperature sampling; a top-k variant is sketched below)
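The sampler in `generate_text` draws from the full 5,004-word distribution. As a cheaper alternative to beam search, top-k sampling drops the low-probability tail before sampling. The helper below is a minimal sketch (`sample_top_k` is a hypothetical name, not shipped with this repo), intended to replace the softmax-and-sample lines in `generate_text`:

```python
import numpy as np

def sample_top_k(next_token_logits, k=40, temperature=0.8):
    # Scale raw logits by temperature, then keep only the k largest
    logits = np.asarray(next_token_logits, dtype=np.float64) / temperature
    top_k_idx = np.argpartition(logits, -k)[-k:]  # indices of the k largest logits
    top_k_logits = logits[top_k_idx]
    # Softmax over the surviving logits only
    probs = np.exp(top_k_logits - np.max(top_k_logits))
    probs = probs / probs.sum()
    return int(np.random.choice(top_k_idx, p=probs))
```

In `generate_text`, the temperature division, softmax, and `np.random.choice` lines would then collapse to `next_token = sample_top_k(logits[0, last_pos, :], k=40, temperature=temperature)`, keeping generation cheap while trimming rare-word glitches.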