---
license: mit
tags:
- lstm
- language-model
- word-level
- tinystories
- onnx
- small-model
datasets:
- roneneldan/TinyStories
language:
- en
pipeline_tag: text-generation
new_version: phmd/TinyStories-SRL-5M
---

# TinyStories Word-Level LSTM (ONNX)

A compact **10.9 MB** word-level LSTM language model trained on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset. It generates short, coherent children's stories with minimal compute.

- **Vocab size**: 5,004 (word-level, includes `<sos>`, `<eos>`, `<pad>`, `<unk>`)
- **Architecture**: 2-layer LSTM, 256 hidden units, 128-dim embeddings
- **Max sequence length**: 50 tokens
- **Format**: ONNX (compatible with ONNX Runtime on CPU/GPU)
- **Model size**: ~11 MB

Trained on 500k TinyStories samples in under 5 minutes on two NVIDIA T4 GPUs.

## Usage

### 1. Install dependencies

```bash
pip install onnxruntime numpy
```

### 2. Download files from this repo

- [`tinystories_lstm.onnx`](https://huggingface.co/phmd/TinyStories-LSTM-5.5M/resolve/main/tinystories_lstm.onnx)
- [`vocab.txt`](https://huggingface.co/phmd/TinyStories-LSTM-5.5M/resolve/main/vocab.txt)

### 3. Run inference

```python
import re

import numpy as np
import onnxruntime as ort

# --- Load vocabulary ---
with open("vocab.txt", "r") as f:
    vocab = [line.strip() for line in f]

word2idx = {word: idx for idx, word in enumerate(vocab)}
idx2word = {idx: word for word, idx in word2idx.items()}

# Special tokens (as stored in vocab.txt)
SOS_IDX = word2idx["<sos>"]
EOS_IDX = word2idx["<eos>"]
PAD_IDX = word2idx["<pad>"]
UNK_IDX = word2idx["<unk>"]

# --- Tokenizer (simple word-level) ---
def tokenize(text):
    # Lowercase and split punctuation off words
    text = re.sub(r'([.,!?])', r' \1 ', text.lower())
    return text.split()

# --- Load ONNX model ---
ort_session = ort.InferenceSession("tinystories_lstm.onnx")

# --- Text generation function ---
def generate_text(prompt, max_new_tokens=30, temperature=0.8):
    # Tokenize the prompt; out-of-vocabulary words map to <unk>
    tokens = tokenize(prompt)
    input_ids = [SOS_IDX] + [word2idx.get(t, UNK_IDX) for t in tokens]

    # Generate autoregressively, one token at a time
    current_seq = input_ids.copy()

    for _ in range(max_new_tokens):
        # Pad current sequence to length 50 (model expects fixed length)
        padded = current_seq + [PAD_IDX] * (50 - len(current_seq))
        if len(padded) > 50:
            padded = padded[-50:]  # Keep only the most recent 50 tokens

        input_tensor = np.array([padded], dtype=np.int64)

        # Run model
        outputs = ort_session.run(None, {"input": input_tensor})
        logits = outputs[0]  # Shape: (1, 50, 5004)

        # Get logits for the last non-pad position
        last_pos = min(len(current_seq) - 1, 49)
        next_token_logits = logits[0, last_pos, :] / temperature

        # Softmax + sampling
        probs = np.exp(next_token_logits - np.max(next_token_logits))
        probs = probs / np.sum(probs)
        next_token = np.random.choice(len(probs), p=probs)

        if next_token == EOS_IDX:
            break

        current_seq.append(next_token)

    # Decode, skipping the leading <sos> token and any padding
    words = [idx2word[idx] for idx in current_seq[1:] if idx != PAD_IDX]
    return " ".join(words).replace(" .", ".").replace(" ,", ",")

# --- Example usage ---
if __name__ == "__main__":
    prompt = "once upon a time"
    story = generate_text(prompt, max_new_tokens=40, temperature=0.7)
    print(f"Prompt: {prompt}")
    print(f"Story: {story}")
```

### Example Output

```
Prompt: once upon a time
Story: once upon a time there was a little girl named lily. she loved to play in the garden. one day she found a magic flower that could talk!
```
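The script above assumes the exported graph declares a single int64 input named `input` of shape `(batch, 50)` and one logits output of shape `(batch, 50, 5004)`. If `ort_session.run()` fails with an input-name mismatch, you can list the names and shapes the ONNX file actually declares:

```python
import onnxruntime as ort

session = ort.InferenceSession("tinystories_lstm.onnx")

# Print the graph's declared inputs and outputs so the feed dict
# in generate_text() can be adjusted if the names differ.
for node in session.get_inputs():
    print("input: ", node.name, node.shape, node.type)
for node in session.get_outputs():
    print("output:", node.name, node.shape, node.type)
```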
## Training Details

- Dataset: `roneneldan/TinyStories` (500k training samples)
- Optimizer: Adam (lr=0.002)
- Batch size: 128
- Epochs: 2
- Hardware: NVIDIA T4 (x2)
- Training time: ~5 minutes

## Limitations

- Word-level modeling → cannot handle out-of-vocabulary words well (they map to `<unk>`)
- Fixed context window (50 tokens)
- No beam search (uses basic temperature sampling; a top-k variant is sketched below)
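The sampler in `generate_text` draws from the full 5,004-word distribution. As a cheaper alternative to beam search, top-k sampling drops the low-probability tail before sampling. The helper below is a minimal sketch (`sample_top_k` is a hypothetical name, not shipped with this repo), intended to replace the softmax-and-sample lines in `generate_text`:

```python
import numpy as np

def sample_top_k(next_token_logits, k=40, temperature=0.8):
    # Scale raw logits by temperature, then keep only the k largest
    logits = np.asarray(next_token_logits, dtype=np.float64) / temperature
    top_k_idx = np.argpartition(logits, -k)[-k:]  # indices of the k largest logits
    top_k_logits = logits[top_k_idx]
    # Softmax over the surviving logits only
    probs = np.exp(top_k_logits - np.max(top_k_logits))
    probs = probs / probs.sum()
    return int(np.random.choice(top_k_idx, p=probs))
```

In `generate_text`, the temperature division, softmax, and `np.random.choice` lines would then collapse to `next_token = sample_top_k(logits[0, last_pos, :], k=40, temperature=temperature)`, keeping generation cheap while trimming rare-word glitches.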