Capybara-31B

Beta / WIP. This is an experimental release made to validate the fine-tuning process and test behavior on real hardware. It is not a production-ready model. Expect rough edges, and treat evaluation results as preliminary.

TendieLabs/Capybara-31B is a fine-tuned version of google/gemma-4-31B-it, trained to be a better local orchestrator and assistant. The primary goal was not to maximize raw code generation but to produce a model that reasons well, communicates clearly, knows when to delegate, and stays honest under pressure.

GGUF variants: TendieLabs/Capybara-31B-GGUFS


Model Description

Capybara-31B is built for the front-desk orchestrator role in a multi-agent setup. It handles ordinary requests, summarizes messy context, routes and decomposes tasks, and delegates complex implementation to specialist agents. The personality target was Claude Sonnet, prioritizing directness, structure, and honesty over verbose performance.

This is not a coding model. It is an assistant model with sharpened coding judgment. The distinction matters: it should analyze and review code well, but it should route heavy implementation work instead of attempting it alone.

Property           Value
Base model         google/gemma-4-31B-it
Model family       Gemma 4 (dense)
Fine-tune method   QLoRA (LoRA over 4-bit base)
Context window     2048 tokens (first run, conservative)
Primary role       Local orchestrator / front-desk assistant

Intended Use

Good fits:

  • Answering general assistant requests clearly and concisely
  • Summarizing messy notes, project context, or requirements
  • Decomposing tasks and routing work to appropriate specialists
  • Code review, debugging analysis, and implementation advice
  • Handling ambiguity by asking one focused clarifying question instead of guessing

Not intended for:

  • Autonomous multi-file repo editing
  • Large-scale code generation without a specialist downstream
  • Replacing a dedicated coding model for implementation-heavy tasks
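The front-desk role described above amounts to a thin routing layer in front of specialist agents. The sketch below is a hypothetical illustration of that delegation contract; the route names and keyword heuristics are invented for the example and are not part of the model or its API.

```python
# Hypothetical sketch of a front-desk routing layer around an
# orchestrator model like Capybara-31B. Route names and keyword
# triggers are illustrative inventions, not the model's API.

ROUTES = {
    "coder": ["implement", "refactor", "multi-file"],
    "summarizer": ["summarize", "digest"],
}

def route(request: str) -> str:
    """Return which specialist should handle the request.

    'self' means the orchestrator answers directly; anything
    implementation-heavy is delegated, matching the card's guidance
    to route heavy coding work instead of attempting it alone.
    """
    text = request.lower()
    for specialist, triggers in ROUTES.items():
        if any(t in text for t in triggers):
            return specialist
    return "self"
```

In practice the orchestrator model itself would emit the routing decision; this stub only shows the shape of the delegation contract.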

Training Details

Dataset Mix

The training mix was weighted toward assistant behavior, routing, and summarization rather than code generation.

Source                                         Role                                            Weight
Crownelius/Opus-4.6-Reasoning-3300x            Reasoning quality, structure, helpfulness       18%
Crownelius/High-Coder-Reasoning-Multi-Turn     Debugging judgment, code analysis, multi-turn   18%
microsoft/rStar-Coder                          Harder reasoning and coding tasks               15%
Custom routing / delegation set                Front-desk routing behavior                     15%
NickyNicky/Code-290k (filtered)                Code competence floor                           10%
Crownelius/Opus-4.5-Writing-Style-formatted    Tone and personality shaping                    10%
Custom summarization / context digestion set   Project-note compression, task extraction       10%
Crownelius/GLM-5.0-25000x (filtered)           General reasoning filler                         4%

Total training set: roughly 10K to 20K high-signal examples.
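The mix weights above sum to 100%, so per-source row counts follow directly from the total. A minimal sketch, assuming a hypothetical 15,000-row run (a value inside the stated 10K-20K range; the short source keys are abbreviations of the table entries):

```python
# Turn the mix percentages into per-source row counts for a
# hypothetical 15,000-row run. Weights come from the table above;
# the total row count is an assumption within the 10K-20K range.

MIX = {
    "Opus-4.6-Reasoning-3300x": 0.18,
    "High-Coder-Reasoning-Multi-Turn": 0.18,
    "rStar-Coder": 0.15,
    "routing-delegation": 0.15,
    "Code-290k-filtered": 0.10,
    "Opus-4.5-Writing-Style": 0.10,
    "summarization-digestion": 0.10,
    "GLM-5.0-25000x-filtered": 0.04,
}

assert abs(sum(MIX.values()) - 1.0) < 1e-9  # weights cover exactly 100%

def rows_per_source(total_rows: int) -> dict:
    """Allocate total_rows across sources proportionally to the mix."""
    return {name: round(total_rows * w) for name, w in MIX.items()}
```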

Training Configuration

Hyperparameter           Value
Method                   QLoRA
LoRA rank                16-32
LoRA alpha               32-64
Dropout                  0.0-0.05
Learning rate            1e-5 to 2e-5
LR scheduler             Cosine
Warmup ratio             2-3%
Epochs                   1
Sequence length          2048
Batch size               1 (gradient accumulation 4)
Gradient checkpointing   Unsloth
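The batch settings above imply an effective batch of 4 sequences per optimizer step. Quick arithmetic on the schedule, assuming a hypothetical 15,000-row dataset (within the card's stated range) and the top of the 2-3% warmup window:

```python
# Arithmetic implied by the training configuration above.
# The 15,000-row dataset size is an assumption within the
# card's 10K-20K range, not a confirmed figure.

per_device_batch = 1
grad_accum = 4
effective_batch = per_device_batch * grad_accum      # 4 sequences per step

rows = 15_000
epochs = 1
optimizer_steps = rows * epochs // effective_batch   # 3750 optimizer steps

warmup_ratio = 0.03                                  # top of the 2-3% range
warmup_steps = int(optimizer_steps * warmup_ratio)   # ~112 warmup steps
```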

Hardware Requirements

Capybara-31B was developed and validated on an RTX 3090 (24 GB VRAM). At IQ4_XS quantization the model leaves approximately 3 GB of VRAM free on that card, making it a practical local-first deployment for a single consumer GPU.

Quant    Approx. VRAM   Recommended for
IQ4_XS   ~21 GB         RTX 3090, 4090, single-GPU setups
Q4_K_M   ~22 GB         RTX 3090, 4090
Q5_K_M   ~24 GB         24 GB cards (tight)
Q8_0     ~34 GB         Dual-GPU or large VRAM server
F16      ~62 GB         Server-grade hardware
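The table values can be sanity-checked with a back-of-the-envelope estimate: weight bytes only, ignoring KV cache and runtime overhead, so real usage runs a few GB higher than this function reports. The bits-per-weight figures are common approximations for llama.cpp quant formats, not exact numbers.

```python
# Rough VRAM estimate for the quants above: parameter count times
# bits per weight. Ignores KV cache, activations, and framework
# overhead, so the table's figures sit above these estimates.
# Bits-per-weight values are approximate llama.cpp conventions.

PARAMS = 33e9  # ~33B parameters

BITS_PER_WEIGHT = {
    "IQ4_XS": 4.25,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def weight_gib(quant: str) -> float:
    """Approximate weight footprint in GiB for a given quant type."""
    bits = BITS_PER_WEIGHT[quant]
    return PARAMS * bits / 8 / 2**30
```

At F16 this gives roughly 61.5 GiB of weights, consistent with the ~62 GB row once rounding is accounted for.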

GGUF Variants

Available at TendieLabs/Capybara-31B-GGUFS:

  • IQ4_XS (recommended starting point)
  • IQ4_NL
  • Q4_0, Q4_1, Q4_K_S, Q4_K_M
  • Q5_0, Q5_1, Q5_K_S, Q5_K_M
  • Q6_K
  • Q8_0, Q8_1
  • F16

Evaluation

The model was evaluated across the following dimensions before release:

  • Delegation accuracy: does it route implementation-heavy work correctly instead of attempting it?
  • Honesty under uncertainty: does it admit when context is missing rather than hallucinating answers?
  • Long-context summarization: does it compress messy project notes into useful summaries?
  • Code review quality: does it identify real issues, risks, and next steps?
  • Tone: does the output feel like a capable, direct assistant rather than a verbose language model?

The key failure mode screened against: a model whose tone improves while its judgment and delegation behavior degrade.
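The delegation-accuracy check can be sketched as a small labeled-prompt harness. Everything here is hypothetical scaffolding for illustration: the `[DELEGATE:...]` tag convention, the stub model, and the test cases are invented, not the card's actual evaluation code.

```python
# Hypothetical sketch of the delegation-accuracy check: given prompts
# labeled with whether they should be delegated, count how often a
# model's reply routes implementation-heavy work to a specialist.
# The tag convention and stub model are invented for illustration.

CASES = [
    ("Rewrite this whole repo in Rust.", True),   # should delegate
    ("What does this traceback mean?", False),    # should answer itself
]

def stub_model(prompt: str) -> str:
    # Stand-in for the real model: delegates anything repo-scale.
    return "[DELEGATE:coder]" if "repo" in prompt else "Here is my answer."

def delegation_accuracy(model) -> float:
    """Fraction of cases where the delegate/answer decision was right."""
    correct = sum(
        ("[DELEGATE:" in model(p)) == should_delegate
        for p, should_delegate in CASES
    )
    return correct / len(CASES)
```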


Limitations

  • First-run adapter. The intended behaviors are in place, but edge cases may need refinement in future versions.
  • Sequence length was kept conservative (2048). Long-document tasks may need to be chunked.
  • Gemma 4 tooling was relatively new at training time. Some export or serving quirks may apply depending on your inference stack.
  • Not designed for multimodal tasks despite the Gemma 4 family's vision capabilities. Text-only fine-tune.

License

This model is derived from google/gemma-4-31B-it and is released under the Gemma Terms of Use. Usage is subject to those terms.

