Qwen3-4B Dual-Skill Agent (ALFWorld & DBBench) LoRA

This repository provides a dual-skill LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It is optimized for two distinct agentic tasks: household operations (ALFWorld) and database interactions (DBBench).

Key Improvements & Features

  • Multi-task Generalization: Balanced training on both ALFWorld and DBBench, allowing the model to switch contexts based on the system prompt.
  • Optimized Trajectories: All training data was pre-cleaned to a maximum of 3072 tokens to ensure high-density learning without truncation of critical terminal actions.
  • Assistant-Only Loss: Fine-tuned using a specialized collator that applies loss only to assistant turns (THOUGHT/ACTION), preventing the model from memorizing environment descriptions.
  • Robustness: Includes error-recovery trajectories where the agent learns to correct its path after receiving "Nothing happened" or SQL errors from the environment.
  • Cleaner Reasoning: Removed tool-calling ("tools") formatting from the training data so that outputs align strictly with the AgentBench evaluation parser.
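The assistant-only loss described above amounts to masking every non-assistant token out of the loss. A minimal sketch of such a masking step (a hypothetical helper, not the exact collator used in training; `-100` is the index PyTorch's cross-entropy ignores):

```python
# Sketch of an assistant-only loss mask. Tokens outside assistant
# spans receive the ignore index, so only assistant turns
# (THOUGHT/ACTION) contribute to the loss.
IGNORE_INDEX = -100

def mask_non_assistant(input_ids, assistant_spans):
    """Return labels where only assistant-turn tokens keep their ids.

    input_ids:       list of token ids for the full dialogue
    assistant_spans: list of (start, end) index pairs (end exclusive)
                     covering the assistant turns
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels

# Example: tokens 0-4 are system/user context, 5-8 the assistant turn.
ids = [101, 7, 8, 9, 10, 42, 43, 44, 45]
labels = mask_non_assistant(ids, [(5, 9)])
# -> [-100, -100, -100, -100, -100, 42, 43, 44, 45]
```

Because environment descriptions are masked, the model cannot reduce loss by memorizing them; it is rewarded only for its own reasoning and actions.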

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Hardware: NVIDIA A100 SXM4 40GB
  • Precision: bfloat16
  • Max context length: 3072 tokens
  • Epochs: 2
  • Learning rate: 3e-06
  • Batch size (effective): 8
  • LoRA rank / alpha: r=64 / a=128
  • Target modules: all linear layers (Q, K, V, O, Gate, Up, Down)
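The LoRA settings above correspond to a PEFT configuration along these lines (a sketch reconstructed from the table; the dropout value is an assumption, as it is not stated in this card):

```python
from peft import LoraConfig

# Sketch of a LoRA config matching the table: r=64, alpha=128,
# all linear projection layers targeted.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    lora_dropout=0.05,  # assumed; not stated in the card
    task_type="CAUSAL_LM",
)
```

Targeting all linear layers (rather than attention only) is a common choice when the adapter must absorb two distinct behaviors, as here.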

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "mark-22/qwen3-4b-agent-trajectory-lora_high_LR"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
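Since the adapter switches behavior based on the system prompt, inference should pair each observation with a task-appropriate system message. A minimal sketch (the prompt texts below are illustrative placeholders; the exact AgentBench prompts differ):

```python
# Hypothetical system prompts illustrating task switching between the
# two trained skills; substitute the real AgentBench prompts in practice.
SYSTEM_PROMPTS = {
    "alfworld": "You are a household agent. Respond with THOUGHT and ACTION.",
    "dbbench": "You are a database agent. Respond with THOUGHT and a SQL ACTION.",
}

def build_messages(task, observation):
    """Assemble one chat turn for the given task ('alfworld' or 'dbbench')."""
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[task]},
        {"role": "user", "content": observation},
    ]
```

The resulting messages can then be passed through `tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")` and on to `model.generate`.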

Training Data Sources

  1. ALFWorld Cleaned: mark-22/alfworld_cleaned_for_agentbench_v4 - Focused on household task completion and navigation.
  2. DBBench Cleaned: mark-22/dbbench_cleaned_for_agentbench - Focused on SQL generation and database manipulation (UPDATE/SELECT).

License

This adapter is distributed under the Apache-2.0 license. Please ensure compliance with the base model's usage terms.
