---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
model_type: causal-lm
base_model: Qwen/Qwen3-4B
tags:
- reasoning
- tree-of-thoughts
- gnn
- self-improving
- autonomous-training
- multi-agent
- variance-curriculum
- reinforcement-learning
- Trident
datasets:
- gsm8k
- mmlu
- gpqa
- arc-challenge
- truthfulqa
metrics:
- accuracy
inference: true
training: true
model-index:
- name: TRIDENT
  results:
  - task:
      type: text-generation
    dataset:
      name: GSM8K
      type: gsm8k
      split: test
    metrics:
    - type: accuracy
      value: 86.58
  - task:
      type: text-generation
    dataset:
      name: MMLU
      type: mmlu
      split: test
    metrics:
    - type: accuracy
      value: 72.61
  - task:
      type: text-generation
    dataset:
      name: GPQA
      type: gpqa
      split: test
    metrics:
    - type: accuracy
      value: 42.42
  - task:
      type: text-generation
    dataset:
      name: ARC-Challenge
      type: arc-challenge
      split: test
    metrics:
    - type: accuracy
      value: 59.0
---

# TRIDENT

**TRIDENT** is a reasoning-focused 4B-parameter language model that improves its own reasoning capability through **algorithmic self-improvement**, rather than parameter scaling. The model is built on **Qwen3-4B** and enhanced using the **TRIDENT framework**: a combination of GNN-guided Tree-of-Thoughts search, multi-agent reasoning policies, and variance-based self-training.

---

## Overview

Traditional large language model training depends on:

- Human-written reasoning traces
- Manually curated preference datasets
- Static fine-tuning pipelines

**TRIDENT removes these dependencies.** Instead, the model:

1. Explores multiple reasoning paths
2. Evaluates them using a learned GNN policy
3. Selects high-uncertainty problems automatically
4. Generates its own training supervision
5. Distills improvements back into the model using LoRA

---

## Core Capabilities

### GNN-Guided Tree-of-Thoughts

Reasoning is represented as a directed graph of intermediate states. A 3-layer Graph Convolutional Network predicts a **promise score** for each branch, guiding exploration and pruning.

### Multi-Agent Reasoning

Four internal agents (Conservative, Exploratory, Balanced, Reflective) vote on reasoning actions to balance exploration and correctness.

### Variance-Based Curriculum

Problems are selected for training based on **reward variance**, targeting examples where the model is inconsistent and the learning signal is highest.

### Self-Generative Reasoning Loop

No human-annotated reasoning traces are used. The model autonomously generates, evaluates, and curates its own reasoning data.

### Stable Training

A multi-layer reward stabilization mechanism prevents:

- Reward collapse
- Loss explosions
- Infinite failure loops

The architecture is compatible with future GRPO-style reinforcement learning.

---

## Benchmark Results

Accuracy comparison against the base model:

| Benchmark | Qwen3-4B | TRIDENT |
|-----------|----------|---------|
| GSM8K (5-shot) | 74.14 | **86.58** |
| MMLU (5-shot) | 47.70 | **72.61** |
| ARC-C (25-shot) | 54.0 | **59.0** |
| GPQA (0-shot) | 28.28 | **42.42** |
| Winogrande (0-shot) | 59.6 | **67.08** |
| TruthfulQA (0-shot) | 54.9 | **54.7** |

**Highlight:** +14.14 percentage point improvement on **GPQA (0-shot)**.
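---

## Example Usage

Since the card declares `library_name: transformers` and `pipeline_tag: text-generation`, the weights should load with the standard causal-LM API. The snippet below is a minimal sketch: the repository id `shivik-labs/TRIDENT` and the sampling settings are illustrative assumptions, not values confirmed by this card.

```python
# Minimal text-generation sketch using the standard transformers API.
# NOTE: "shivik-labs/TRIDENT" is a placeholder repository id (assumption);
# replace it with the actual model id for this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shivik-labs/TRIDENT"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "A train travels 60 km in 45 minutes. What is its average speed in km/h? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Note that the GNN-guided tree search described above is part of the training framework; this snippet only shows plain single-pass generation with the distilled weights.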
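---

## Implementation Sketches

The snippets in this section are illustrative sketches of the mechanisms described under **Core Capabilities**. All class, function, and variable names are assumptions chosen for exposition and are not taken from the TRIDENT codebase.

### GNN Promise Scoring

A minimal 3-layer Graph Convolutional Network that assigns a scalar promise score to every node of a thought graph, assuming `torch_geometric` provides the graph convolutions and that node features are embeddings of intermediate reasoning states.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class PromiseScorer(torch.nn.Module):
    """3-layer GCN mapping each thought-graph node to a promise score in [0, 1]."""

    def __init__(self, in_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.conv3 = GCNConv(hidden_dim, hidden_dim)
        self.head = torch.nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # Three rounds of message passing over the reasoning graph.
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = F.relu(self.conv3(x, edge_index))
        # One sigmoid-squashed score per node, used to rank and prune branches.
        return torch.sigmoid(self.head(x)).squeeze(-1)
```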
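### Multi-Agent Voting

A sketch of how four agent personas could vote on the next reasoning action, assuming each agent proposes one candidate action per step; the tie-breaking rule (defer to the Conservative agent) is an assumption for illustration.

```python
# Illustrative majority vote over agent proposals (not the TRIDENT implementation).
from collections import Counter

AGENTS = ("Conservative", "Exploratory", "Balanced", "Reflective")


def vote_on_action(proposals: dict[str, str]) -> str:
    """proposals maps an agent name to its proposed next reasoning action."""
    counts = Counter(proposals.values())
    action, votes = counts.most_common(1)[0]
    tied = [a for a, v in counts.items() if v == votes]
    if len(tied) > 1:
        # Tie: defer to the Conservative agent to favour correctness over exploration.
        return proposals["Conservative"]
    return action


proposals = {
    "Conservative": "verify_previous_step",
    "Exploratory": "branch_new_hypothesis",
    "Balanced": "verify_previous_step",
    "Reflective": "verify_previous_step",
}
print(vote_on_action(proposals))  # verify_previous_step
```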
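### Variance-Based Problem Selection

A self-contained sketch of the reward-variance selection rule from the Variance-Based Curriculum, assuming each problem is attempted by several independent reasoning rollouts that each return a scalar reward; the helper and its names are hypothetical.

```python
# Problems whose rollout rewards disagree the most are assumed to carry the
# strongest learning signal and are kept for self-training.
from statistics import pvariance


def select_by_reward_variance(problem_rewards: dict[str, list[float]], top_k: int) -> list[str]:
    """problem_rewards maps a problem id to the rewards of K independent rollouts."""
    variances = {pid: pvariance(rewards) for pid, rewards in problem_rewards.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:top_k]


# Example: a problem solved 2 out of 4 times (high variance) is preferred over one
# solved 4/4 or 0/4 times (zero variance, little to learn from).
rollouts = {
    "p1": [1.0, 1.0, 1.0, 1.0],  # always solved  -> variance 0
    "p2": [1.0, 0.0, 1.0, 0.0],  # inconsistent   -> variance 0.25
    "p3": [0.0, 0.0, 0.0, 0.0],  # never solved   -> variance 0
}
print(select_by_reward_variance(rollouts, top_k=1))  # ['p2']
```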
---

## Intended Use

TRIDENT is suitable for:

- Multi-step mathematical reasoning
- Scientific and logical inference
- Hard QA benchmarks
- Planning and hypothesis exploration
- Research on reasoning systems

---

## Limitations

- Higher inference-time compute than single-pass models
- Not optimized for low-latency chat
- Best used where reasoning depth matters more than speed

---

## Ethical Considerations

- No human-written reasoning traces are used
- No preference data is collected
- Training relies on verifiable task rewards
- Like all LLMs, TRIDENT may hallucinate outside structured reasoning workflows

---

## Paper

https://www.shivik.in/shivik-labs/trident

---

## Citation

```bibtex
@article{puri2025trident,
  title={TRIDENT: Thought-based Reasoning and Improvement through Deep Exploration of Neuronal Trees},
  author={Puri, Shivansh and Khandelwal, Abhisek and Joshi, Vedant and Yadav, Akash},
  year={2025}
}
```