---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
model_type: causal-lm
base_model: Qwen/Qwen3-4B
tags:
- reasoning
- tree-of-thoughts
- gnn
- self-improving
- autonomous-training
- multi-agent
- variance-curriculum
- reinforcement-learning
- Trident
datasets:
- gsm8k
- mmlu
- gpqa
- arc-challenge
- truthfulqa
metrics:
- accuracy
inference: true
training: true
model-index:
- name: TRIDENT
  results:
  - task:
      type: text-generation
    dataset:
      name: GSM8K
      type: gsm8k
      split: test
    metrics:
    - type: accuracy
      value: 86.58
  - task:
      type: text-generation
    dataset:
      name: MMLU
      type: mmlu
      split: test
    metrics:
    - type: accuracy
      value: 72.61
  - task:
      type: text-generation
    dataset:
      name: GPQA
      type: gpqa
      split: test
    metrics:
    - type: accuracy
      value: 42.42
  - task:
      type: text-generation
    dataset:
      name: ARC-Challenge
      type: arc-challenge
      split: test
    metrics:
    - type: accuracy
      value: 59.0
---

# TRIDENT

**TRIDENT** is a reasoning-focused 4B-parameter language model that improves its own reasoning capability through **algorithmic self-improvement**, rather than parameter scaling. The model is built on **Qwen3-4B** and enhanced using the **TRIDENT framework**: a combination of GNN-guided Tree-of-Thoughts search, multi-agent reasoning policies, and variance-based self-training.

---

## Overview

Traditional large language model training depends on:

- Human-written reasoning traces
- Manually curated preference datasets
- Static fine-tuning pipelines

**TRIDENT removes these dependencies.** Instead, the model:

1. Explores multiple reasoning paths
2. Evaluates them using a learned GNN policy
3. Selects high-uncertainty problems automatically
4. Generates its own training supervision
5. Distills improvements back into the model using LoRA

---

## Core Capabilities

### GNN-Guided Tree-of-Thoughts

Reasoning is represented as a directed graph of intermediate states. A 3-layer Graph Convolutional Network predicts a **promise score** for each branch, guiding exploration and pruning.

### Multi-Agent Reasoning

Four internal agents (Conservative, Exploratory, Balanced, Reflective) vote on reasoning actions to balance exploration and correctness.

### Variance-Based Curriculum

Problems are selected for training based on **reward variance**, targeting examples where the model is inconsistent and the learning signal is highest.

### Self-Generative Reasoning Loop

No human-annotated reasoning traces are used. The model autonomously generates, evaluates, and curates its own reasoning data.

### Stable Training

A multi-layer reward stabilization mechanism prevents:

- Reward collapse
- Loss explosions
- Infinite failure loops

The architecture is compatible with future GRPO-style reinforcement learning.

---

## Benchmark Results

Accuracy comparison against the base model:

| Benchmark | Qwen3-4B | TRIDENT |
|-----------|----------|---------|
| GSM8K (5-shot) | 74.14 | **86.58** |
| MMLU (5-shot) | 47.70 | **72.61** |
| ARC-C (25-shot) | 54.0 | **59.0** |
| GPQA (0-shot) | 28.28 | **42.42** |
| Winogrande (0-shot) | 59.6 | **67.08** |
| TruthfulQA (0-shot) | 54.9 | **54.7** |

**Highlight:** +14.14 percentage point improvement on **GPQA (0-shot)**.
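---

## Example Usage

Since the card declares `library_name: transformers` and `pipeline_tag: text-generation`, the weights should load with the standard causal-LM API. The snippet below is a minimal sketch: the repository id `shivik-labs/TRIDENT` and the sampling settings are illustrative assumptions, not values confirmed by this card.

```python
# Minimal text-generation sketch using the standard transformers API.
# NOTE: "shivik-labs/TRIDENT" is a placeholder repository id (assumption);
# replace it with the actual model id for this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shivik-labs/TRIDENT"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "A train travels 60 km in 45 minutes. What is its average speed in km/h? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Note that the GNN-guided tree search described above is part of the training framework; this snippet only shows plain single-pass generation with the distilled weights.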
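---

## Implementation Sketches

The snippets in this section are illustrative sketches of the mechanisms described under **Core Capabilities**. All class, function, and variable names are assumptions chosen for exposition and are not taken from the TRIDENT codebase.

### GNN Promise Scoring

A minimal 3-layer Graph Convolutional Network that assigns a scalar promise score to every node of a thought graph, assuming `torch_geometric` provides the graph convolutions and that node features are embeddings of intermediate reasoning states.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class PromiseScorer(torch.nn.Module):
    """3-layer GCN mapping each thought-graph node to a promise score in [0, 1]."""

    def __init__(self, in_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.conv3 = GCNConv(hidden_dim, hidden_dim)
        self.head = torch.nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # Three rounds of message passing over the reasoning graph.
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = F.relu(self.conv3(x, edge_index))
        # One sigmoid-squashed score per node, used to rank and prune branches.
        return torch.sigmoid(self.head(x)).squeeze(-1)
```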
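### Multi-Agent Voting

A sketch of how four agent personas could vote on the next reasoning action, assuming each agent proposes one candidate action per step; the tie-breaking rule (defer to the Conservative agent) is an assumption for illustration.

```python
# Illustrative majority vote over agent proposals (not the TRIDENT implementation).
from collections import Counter

AGENTS = ("Conservative", "Exploratory", "Balanced", "Reflective")


def vote_on_action(proposals: dict[str, str]) -> str:
    """proposals maps an agent name to its proposed next reasoning action."""
    counts = Counter(proposals.values())
    action, votes = counts.most_common(1)[0]
    tied = [a for a, v in counts.items() if v == votes]
    if len(tied) > 1:
        # Tie: defer to the Conservative agent to favour correctness over exploration.
        return proposals["Conservative"]
    return action


proposals = {
    "Conservative": "verify_previous_step",
    "Exploratory": "branch_new_hypothesis",
    "Balanced": "verify_previous_step",
    "Reflective": "verify_previous_step",
}
print(vote_on_action(proposals))  # verify_previous_step
```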
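### Variance-Based Problem Selection

A self-contained sketch of the reward-variance selection rule from the Variance-Based Curriculum, assuming each problem is attempted by several independent reasoning rollouts that each return a scalar reward; the helper and its names are hypothetical.

```python
# Problems whose rollout rewards disagree the most are assumed to carry the
# strongest learning signal and are kept for self-training.
from statistics import pvariance


def select_by_reward_variance(problem_rewards: dict[str, list[float]], top_k: int) -> list[str]:
    """problem_rewards maps a problem id to the rewards of K independent rollouts."""
    variances = {pid: pvariance(rewards) for pid, rewards in problem_rewards.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:top_k]


# Example: a problem solved 2 out of 4 times (high variance) is preferred over one
# solved 4/4 or 0/4 times (zero variance, little to learn from).
rollouts = {
    "p1": [1.0, 1.0, 1.0, 1.0],  # always solved  -> variance 0
    "p2": [1.0, 0.0, 1.0, 0.0],  # inconsistent   -> variance 0.25
    "p3": [0.0, 0.0, 0.0, 0.0],  # never solved   -> variance 0
}
print(select_by_reward_variance(rollouts, top_k=1))  # ['p2']
```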
---

## Intended Use

TRIDENT is suitable for:

- Multi-step mathematical reasoning
- Scientific and logical inference
- Hard QA benchmarks
- Planning and hypothesis exploration
- Research on reasoning systems

---

## Limitations

- Higher inference-time compute than single-pass models
- Not optimized for low-latency chat
- Best used where reasoning depth matters more than speed

---

## Ethical Considerations

- No human-written reasoning traces are used
- No preference data is collected
- Training relies on verifiable task rewards
- Like all LLMs, TRIDENT may hallucinate outside structured reasoning workflows

---

## Paper

https://www.shivik.in/shivik-labs/trident

---

## Citation

```bibtex
@article{puri2025trident,
  title={TRIDENT: Thought-based Reasoning and Improvement through Deep Exploration of Neuronal Trees},
  author={Puri, Shivansh and Khandelwal, Abhisek and Joshi, Vedant and Yadav, Akash},
  year={2025}
}
```