# aquif-3.5-Nano-1B
aquif-3.5-Nano-1B is a lightweight yet capable language model with 1.72B parameters, delivering a strong performance-to-efficiency ratio for resource-constrained deployments. Built on Qwen3-1.7B with comprehensive instruction tuning, the model achieves competitive results across reasoning, mathematics, and code generation tasks while remaining deployable on consumer-grade hardware.
With a 40K token context window and bfloat16 precision, aquif-3.5-Nano-1B enables practical applications requiring fast inference and minimal memory overhead.
## Model Overview
| Attribute | Value |
|---|---|
| Total Parameters | 1.72B |
| Context Window | 40K tokens |
| Hidden Size | 2048 |
| Attention Heads | 16 |
| Key-Value Heads | 8 |
| Hidden Layers | 28 |
| Activation | SiLU |
| Precision | BF16 |
| Model Type | Causal Language Model (Qwen3) |
| Multilingual | 10 languages |
| License | Apache 2.0 |
## Key Features

### Efficient Architecture
aquif-3.5-Nano-1B achieves remarkable performance density through:
- Optimized Layer Configuration: 28 layers with full attention throughout, balancing capacity and efficiency
- Extended Context: 40K token window supports complex reasoning and long-document processing
- Memory Efficient: Designed for deployment on devices with 8GB+ VRAM (see the estimate after this list); quantization support available for smaller footprints
- Fast Inference: Minimal parameter count enables rapid token generation and batch processing
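As a rough sanity check on the 8GB figure, the weights alone occupy about 3.2 GiB in BF16; the rest is headroom for activations and the KV cache. A minimal back-of-envelope sketch, using the parameter count from the table above and the 2 bytes per parameter implied by BF16:

```python
# Back-of-envelope estimate of weight memory in BF16 (weights only;
# activations and the KV cache add further overhead at inference time).
params = 1.72e9      # total parameters, from the Model Overview table
bytes_per_param = 2  # BF16 stores 2 bytes per parameter

weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.1f} GiB of weights")  # ~3.2 GiB
```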
### Strong Core Capabilities
- Reasoning: Excels at multi-step logical inference and problem-solving
- Mathematics: Robust performance on mathematical reasoning and calculation tasks
- Code Generation: Proficient in generating and understanding code across multiple programming languages
- Instruction Following: Refined through comprehensive instruction-tuning for reliable task execution
### Multilingual Support
Native support for 10 languages: English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, and Japanese.
## Evaluation

### Benchmark Performance
| Metric | aquif-3.5-Nano-1B | Ministral 3 3B Instruct | aquif-3.5-3B | Granite 4.0 H 1B | Qwen3-1.7B |
|---|---|---|---|---|---|
| MMLU | 72.9 | 70.7 | 70.2 | 59.7 | 59.1 |
| GPQA Diamond | 42.0 | 35.8 | 35.8 | 29.7 | 27.7 |
| AIME 2025 | 28.7 | 22.0 | 13.4 | 6.3 | 7.3 |
| LiveCodeBench | 24.8 | 24.7 | 23.1 | 11.5 | 12.6 |
| Average | 42.1 | 38.3 | 35.6 | 26.8 | 26.7 |
### Performance Analysis
Despite being a lightweight nano model, aquif-3.5-Nano-1B demonstrates exceptional capability relative to parameter count:
- MMLU: 72.9% accuracy, significantly outperforming base Qwen3-1.7B and Granite 4.0 H 1B
- GPQA Diamond: 42.0% on expert-level questions, indicating robust reasoning capability
- AIME 2025: 28.7% on advanced mathematics, demonstrating substantial improvement over comparable models
- LiveCodeBench: 24.8% on real-world programming tasks, competitive with larger instruction-tuned variants
The model shows particular strength in reasoning and technical tasks, making it suitable for applications requiring intelligence without significant computational overhead.
## Installation

```bash
pip install transformers torch
```

For faster inference with quantization support:

```bash
pip install transformers torch bitsandbytes
```
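A minimal generation sketch using the standard transformers API. The repository id `aquif-ai/aquif-3.5-Nano-1B` comes from this model card; the prompt and `max_new_tokens` value are illustrative, not official defaults:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aquif-ai/aquif-3.5-Nano-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the model's native BF16 precision
    device_map="auto",           # place weights on GPU when available
)

# Qwen3-style models ship a chat template, so format prompts through it.
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```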
## Technical Specifications
- Architecture: Qwen3 Causal Language Model
- Attention Mechanism: Full attention across all 28 layers
- Position Encoding: RoPE (Rotary Position Embeddings) with theta=1,000,000
- Normalization: RMSNorm with epsilon=1e-6
- Vocabulary Size: 151,936 tokens
- Head Dimension: 128
- Intermediate Size: 6,144
- Training Data Format: Instructions and reasoning tasks in multilingual contexts
- Attention Dropout: 0.0
- KV Caching: Enabled for efficient multi-turn inference
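Most of the values above can be verified directly from the published config. A small sketch, assuming the field names of the standard Hugging Face Qwen3 configuration:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("aquif-ai/aquif-3.5-Nano-1B")

print(config.num_hidden_layers)    # 28
print(config.hidden_size)          # 2048
print(config.num_attention_heads)  # 16
print(config.num_key_value_heads)  # 8
print(config.intermediate_size)    # 6144
print(config.rope_theta)           # 1000000
print(config.rms_norm_eps)         # 1e-06
print(config.vocab_size)           # 151936
```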
## Use Cases
aquif-3.5-Nano-1B excels at:
- Edge Deployment: Real-time inference on resource-limited devices
- API Services: Cost-effective inference at scale with minimal latency
- Research Prototyping: Fast experimentation with instruction-following models
- Educational Applications: Learning model behavior without computational barriers
- Local Processing: Privacy-preserving on-device inference
- Embedded Systems: Integration into IoT and edge computing environments
- Mathematical Reasoning: Problem-solving and technical explanation tasks
- Code Assistance: Programming help and code generation within constrained compute budgets
## Limitations and Considerations
- Parameter Scale: Though efficient, the model's smaller capacity relative to 3B+ models may limit performance on highly complex tasks
- Context Length: The 40K token window supports extended reasoning but falls short of frontier-model context lengths
- Hardware Optimization: Best performance with recent hardware supporting BF16; FP32 inference available but slower
- Specialized Domains: May require domain-specific fine-tuning for niche applications
- Real-Time Requirements: Suitable for most applications; extremely latency-critical scenarios may benefit from optimization or quantization
## Performance Optimization
- Quantization: Use INT8 quantization to reduce the memory footprint from ~4GB to 2-2.5GB (see the sketch after this list)
- Flash Attention: Compatible with flash-attention implementations for faster inference
- KV Caching: Leverages caching for efficient multi-turn conversations
- Batch Inference: Process multiple prompts simultaneously for throughput optimization
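A sketch of the INT8 loading path via bitsandbytes. `BitsAndBytesConfig` is the standard transformers interface for this; the actual memory savings depend on hardware and sequence length:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "aquif-ai/aquif-3.5-Nano-1B"

# Store weights in INT8; compute still runs in mixed precision.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # bitsandbytes requires a CUDA device
)
```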
## Acknowledgements
- Qwen Team: Base architecture and foundational model
- HuggingFace: Model infrastructure and community ecosystem
- aquif AI Research Team: Instruction-tuning optimization and performance refinement
## License
This project is released under the Apache 2.0 License.
Made in 🇧🇷
© 2025 aquif AI. All rights reserved.