# Qwen3-4B Dual-Skill Agent (ALFWorld & DBBench) LoRA
This repository provides a Dual-Skill LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It is specifically optimized for two distinct agentic tasks: Household operations (ALFWorld) and Database interactions (DBBench).
## Key Improvements & Features
- Multi-task Generalization: Balanced training on both ALFWorld and DBBench, allowing the model to switch contexts based on the system prompt.
- Optimized Trajectories: All training data was pre-cleaned to a maximum of 3072 tokens to ensure high-density learning without truncation of critical terminal actions.
- Assistant-Only Loss: Fine-tuned using a specialized collator that applies loss only to assistant turns (THOUGHT/ACTION), preventing the model from memorizing environment descriptions.
- Robustness: Includes error-recovery trajectories where the agent learns to correct its path after receiving "Nothing happened" or SQL errors from the environment.
- Cleaner Reasoning: Removed potential "tools" format traps to align strictly with the AgentBench evaluation parser.
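The assistant-only loss mentioned above can be sketched as a collator that masks every non-assistant token with the label `-100` (the ignore index used by PyTorch cross-entropy and Hugging Face trainers). The helper below is a hypothetical simplification, assuming each turn arrives pre-tokenized and tagged with its role; the actual training collator may differ:

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy loss

def mask_labels(turns):
    """Build (input_ids, labels) from a list of (role, token_ids) turns.

    Loss is applied only to assistant turns (THOUGHT/ACTION); system,
    user, and environment tokens are masked so the model does not
    memorize environment descriptions.
    """
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # train on assistant tokens
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # mask the rest
    return input_ids, labels
```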
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Hardware | NVIDIA A100 SXM4 40GB |
| Precision | bfloat16 |
| Max context length | 3072 tokens |
| Epochs | 2 |
| Learning rate | 3e-06 |
| Batch size (effective) | 8 |
| LoRA Rank / Alpha | r=64 / a=128 |
| Target Modules | All Linear Layers (Q,K,V,O,Gate,Up,Down) |
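The LoRA hyperparameters in the table translate to a `peft` configuration along these lines. This is a sketch of the implied setup, not the exact training script; the module names are the standard Qwen3 linear-layer names:

```python
from peft import LoraConfig

# Adapter configuration implied by the table above (sketch only).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```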
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "mark-22/qwen3-4b-agent-trajectory-lora_high_LR"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

# Example ALFWorld-style prompt
messages = [
    {"role": "system", "content": "You are an agent in a household environment. Respond with THOUGHT and ACTION."},
    {"role": "user", "content": "Your task is to: put a clean mug on the coffeemachine."},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Training Data Sources
- **ALFWorld Cleaned:** `mark-22/alfworld_cleaned_for_agentbench_v4`, focused on household task completion and navigation.
- **DBBench Cleaned:** `mark-22/dbbench_cleaned_for_agentbench`, focused on SQL generation and database manipulation (UPDATE/SELECT).
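The 3072-token cleaning applied to these datasets can be approximated by dropping the oldest dialogue turns while always keeping the system prompt and the final (terminal) assistant action. The function below is a hypothetical sketch, with token counting delegated to a caller-supplied `count_tokens`; the actual cleaning pipeline may differ:

```python
def truncate_trajectory(turns, count_tokens, max_tokens=3072):
    """Drop oldest middle turns until the trajectory fits in max_tokens.

    turns: list of (role, text); turns[0] is the system prompt and
    turns[-1] the terminal assistant action, both always kept so the
    critical terminal action is never truncated away.
    """
    kept = list(turns)
    total = sum(count_tokens(text) for _, text in kept)
    # Remove turns after the system prompt, oldest first, but never
    # the final turn.
    while total > max_tokens and len(kept) > 2:
        _, text = kept.pop(1)
        total -= count_tokens(text)
    return kept
```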
## License
This adapter is distributed under the Apache-2.0 license. Please ensure compliance with the base model's usage terms.
## Model Tree
Adapter: `mark-22/qwen3-4b-agent-trajectory-lora_high_LR`, fine-tuned from base model `Qwen/Qwen3-4B-Instruct-2507`.