Commit af8678a
Parent(s): edc640e
add local LLM with Ollama llama3.2:3b

Changed files:
- OLLAMA_SETUP.md +153 -0
- README.md +38 -5
- deep_agent_rag/config.py +7 -0
- deep_agent_rag/utils/llm_utils.py +68 -45
- pyproject.toml +1 -0
- uv.lock +28 -0
OLLAMA_SETUP.md
ADDED
@@ -0,0 +1,153 @@
# Ollama Setup Guide

This guide explains how to set up and use Ollama with the Deep Agentic AI Tool, in particular the Llama 3.2 3B model.

## 📋 Prerequisites

- macOS or Linux
- At least 16GB of RAM (recommended)
- Python >= 3.13

## 🚀 Installation

### 1. Install Ollama

**macOS:**
```bash
brew install ollama
```

Or download it from the official site: https://ollama.com

**Linux:**
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

### 2. Download the Llama 3.2 model

```bash
ollama pull llama3.2:3b
```

This downloads roughly 2GB of model files.

### 3. Start the Ollama service

Ollama usually starts automatically; to start it manually:

```bash
ollama serve
```

The service listens on `http://localhost:11434` by default.

### 4. Verify the installation

Test that the model responds:

```bash
ollama run llama3.2:3b "Hello, how are you?"
```
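Beyond `ollama run`, the service can also be checked over its REST API: `GET /api/tags` lists the locally installed models. A minimal Python sketch of such a check follows; the helper names (`installed_models`, `check_ollama`) are illustrative and not part of this project.

```python
import json
import urllib.request


def installed_models(tags_json: str) -> list[str]:
    """Extract model names from the JSON body returned by Ollama's /api/tags endpoint."""
    payload = json.loads(tags_json)
    return [m["name"] for m in payload.get("models", [])]


def check_ollama(base_url: str = "http://localhost:11434") -> list[str]:
    """Query a running Ollama service for its locally installed models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return installed_models(resp.read().decode("utf-8"))
```

With the service running, `"llama3.2:3b" in check_ollama()` confirms the model from step 2 is available.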
## ⚙️ Configure the Project

### 1. Update environment variables

Add the following to the `.env` file in the project root:

```env
# Enable Ollama
USE_OLLAMA=true
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:3b
```

### 2. Optional configuration

To use a different Ollama model, change:

```env
OLLAMA_MODEL=qwen2.5:7b      # Qwen2.5
OLLAMA_MODEL=llama3.1:8b     # Llama 3.1
OLLAMA_MODEL=deepseek-r1:7b  # DeepSeek-R1
OLLAMA_MODEL=mistral:7b      # Mistral
```
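The project resolves these variables in `deep_agent_rag/config.py` via `os.getenv` with the same defaults shown above. A small sketch of that pattern (the `load_ollama_settings` helper is illustrative, not part of the project):

```python
def load_ollama_settings(env: dict[str, str]) -> dict:
    """Resolve Ollama settings the way config.py does, with the same defaults."""
    return {
        "base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "model": env.get("OLLAMA_MODEL", "llama3.2:3b"),
        # USE_OLLAMA is a string in .env; only the literal "true" (any case) enables it
        "use_ollama": env.get("USE_OLLAMA", "false").lower() == "true",
    }
```

Note that values like `USE_OLLAMA=1` or `USE_OLLAMA=yes` would be treated as disabled; only `true` enables Ollama.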
## 🎯 Usage

The system selects an LLM automatically in the following priority order:

1. **Groq API** (if `GROQ_API_KEY` is configured)
2. **Ollama** (if `USE_OLLAMA=true` and the service is reachable)
3. **MLX model** (final fallback)

When the Groq API quota is exhausted, the system switches to Ollama automatically (if enabled); otherwise it falls back to the MLX model.
## 🔍 Checking Which Model Is Active

After starting the application, check the console output:

- `✅ 使用 Groq API (優先)` - the Groq API is in use
- `✅ 使用 Ollama 模型 (llama3.2:3b)` - Ollama is in use
- `ℹ️ 使用本地 MLX 模型` - the MLX model is in use

## 🐛 Troubleshooting

### Cannot connect to the Ollama service

**Problem:** `⚠️ Ollama 初始化失敗: Connection refused` (Ollama initialization failed)

**Solution:**
1. Confirm the Ollama service is running: `ollama serve`
2. Check whether the port is already in use: `lsof -i :11434`
3. Confirm `OLLAMA_BASE_URL` is configured correctly

### Model not found

**Problem:** `⚠️ Ollama 初始化失敗: model not found` (Ollama initialization failed)

**Solution:**
```bash
# Download the model
ollama pull llama3.2:3b

# List installed models
ollama list
```

### Out of memory

**Problem:** The system runs slowly or crashes

**Solution:**
- Llama 3.2 3B needs about 2GB of RAM
- Make sure the system has enough free memory (at least 8GB recommended)
- This model is already lightweight and well suited to systems with 16GB of RAM

## 📊 Model Comparison

| Model | Size | Memory needed | Notes |
|-------|------|---------------|-------|
| llama3.2:3b | ~2GB | ~4GB | Lightweight and efficient, good fit for 16GB systems, open-sourced by Meta |
| deepseek-r1:7b | ~4.7GB | ~8GB | Strong reasoning, good for math and coding |
| qwen2.5:7b | ~4.5GB | ~8GB | Strong general ability, good Chinese and English support |
| llama3.1:8b | ~4.6GB | ~8GB | Open-sourced by Meta, stable performance |
| mistral:7b | ~4.1GB | ~7GB | Fast and efficient |

## 💡 Performance Tips

1. **Prefer the Groq API**: when available, the Groq API is fastest
2. **Ollama as a fallback**: when Groq is unavailable, Ollama provides good local inference
3. **MLX as the last resort**: on Apple Silicon, the MLX models are hardware-optimized

## 📚 Resources

- [Ollama official documentation](https://ollama.com/docs)
- [Llama 3.2 model info](https://ollama.com/library/llama3.2)
- [LangChain Ollama integration](https://python.langchain.com/docs/integrations/llms/ollama)

---

**Note**: On first use, Ollama downloads the model files, which can take some time; please be patient.
README.md
CHANGED
@@ -69,6 +69,11 @@ A comprehensive deep research agent system with RAG (Retrieval-Augmented Generation)
# Optional: Groq API (for faster inference)
GROQ_API_KEY=your_groq_api_key_here

+# Optional: Ollama (for local inference with Llama 3.2 or other models)
+USE_OLLAMA=true
+OLLAMA_BASE_URL=http://localhost:11434
+OLLAMA_MODEL=llama3.2:3b
+
# Optional: Tavily API (for web search)
TAVILY_API_KEY=your_tavily_api_key_here

@@ -204,18 +209,38 @@ The system uses a multi-agent workflow orchestrated by LangGraph:

### LLM Configuration

+The system supports multiple LLM backends with automatic fallback (priority order):

+1. **Primary**: Groq API (fastest, requires API key)
   - Model: `llama-3.3-70b-versatile`
   - Automatically used if `GROQ_API_KEY` is set

+2. **Secondary**: Ollama (local inference, excellent reasoning capabilities)
+   - Default Model: `llama3.2:3b` (Llama 3.2 3B)
+   - Requires Ollama installed and model downloaded
+   - Enable with `USE_OLLAMA=true` in `.env`
+   - Lightweight and efficient, suitable for 16GB memory systems
   - Automatically used when Groq API is unavailable or quota exhausted

+3. **Fallback**: Local MLX Model (privacy-preserving, no API key needed)
+   - Model: `mlx-community/Qwen2.5-Coder-7B-Instruct-4bit`
+   - Automatically used when both Groq API and Ollama are unavailable
+
The system automatically switches between backends based on availability.

+**Setting up Ollama:**
+```bash
+# Install Ollama (if not already installed)
+# macOS: brew install ollama
+# Or download from https://ollama.com
+
+# Download Llama 3.2 model
+ollama pull llama3.2:3b
+
+# Start Ollama service (usually runs automatically)
+ollama serve
+```
+
## ⚙️ Configuration

Key configuration options in `deep_agent_rag/config.py`:

@@ -283,6 +308,7 @@ Key dependencies (see `pyproject.toml` for complete list):
- **LangChain**: Agent framework and tool integration
- **LangGraph**: Agent orchestration and workflow management
- **MLX/MLX-LM**: Local model inference (Apple Silicon optimized)
+- **LangChain Ollama**: Ollama integration for local models
- **Gradio**: Web interface
- **ChromaDB**: Vector database for RAG
- **Tavily**: Web search API

@@ -298,9 +324,16 @@ Key dependencies (see `pyproject.toml` for complete list):

### Groq API Issues

+- **Quota exhausted**: The system automatically falls back to Ollama (if enabled) or local MLX model
- **API errors**: Check your `GROQ_API_KEY` in `.env` file

+### Ollama Issues
+
+- **Ollama not starting**: Ensure Ollama service is running (`ollama serve`)
+- **Model not found**: Download the model first (`ollama pull llama3.2:3b`)
+- **Connection errors**: Check `OLLAMA_BASE_URL` in `.env` (default: `http://localhost:11434`)
+- **Memory issues**: Llama 3.2:3B requires ~2GB RAM, suitable for systems with 16GB memory
+
### RAG System Issues

- **PDF not found**: Ensure the PDF file exists at the path specified in `config.py`
deep_agent_rag/config.py
CHANGED
@@ -49,6 +49,13 @@ GROQ_MAX_TOKENS = 2048
GROQ_TEMPERATURE = 0.7
USE_GROQ_FIRST = True  # whether to prefer the Groq API

+# Ollama configuration
+OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
+OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.2:3b")  # Llama 3.2 3B
+OLLAMA_MAX_TOKENS = 2048
+OLLAMA_TEMPERATURE = 0.7
+USE_OLLAMA = os.getenv("USE_OLLAMA", "false").lower() == "true"  # whether to enable Ollama
+
# Email configuration - uses the Gmail API
EMAIL_SENDER = "matthuang46@gmail.com"
# Gmail API configuration
deep_agent_rag/utils/llm_utils.py
CHANGED
@@ -1,11 +1,12 @@
"""
LLM utility functions
Create and manage LLM instances
+Priority: Groq API > Ollama > MLX model
"""
import warnings
from typing import Optional
from langchain_groq import ChatGroq
+from langchain_ollama import ChatOllama
from ..models import MLXChatModel, load_mlx_model
from ..config import (
    MLX_MAX_TOKENS,
@@ -14,7 +15,12 @@ from ..config import (
    GROQ_MODEL,
    GROQ_MAX_TOKENS,
    GROQ_TEMPERATURE,
-    USE_GROQ_FIRST
+    USE_GROQ_FIRST,
+    OLLAMA_BASE_URL,
+    OLLAMA_MODEL,
+    OLLAMA_MAX_TOKENS,
+    OLLAMA_TEMPERATURE,
+    USE_OLLAMA,
)

# Global variable: tracks the LLM type currently in use
@@ -29,30 +35,17 @@ def get_llm_type() -> str:

def is_using_local_llm() -> bool:
    """Check whether a local LLM is in use"""
-    return _current_llm_type
+    return _current_llm_type in ["mlx", "ollama"] or _groq_quota_exceeded


def get_llm():
    """
    Get an LLM instance
+    Priority: Groq API > Ollama > MLX model
    """
    global _current_llm_type, _groq_quota_exceeded

-    if _groq_quota_exceeded:
-        if _current_llm_type != "mlx":
-            print("⚠️ 警告:Groq API 額度已用完,已切換到本地 MLX 模型 (Qwen2.5)")
-        _current_llm_type = "mlx"
-        model, tokenizer = load_mlx_model()
-        return MLXChatModel(
-            model=model,
-            tokenizer=tokenizer,
-            max_tokens=MLX_MAX_TOKENS,
-            temperature=MLX_TEMPERATURE
-        )
-
-    # Try the Groq API
+    # Priority 1: Groq API
    if USE_GROQ_FIRST and GROQ_API_KEY:
        try:
            groq_llm = ChatGroq(
@@ -61,48 +54,61 @@ def get_llm():
                max_tokens=GROQ_MAX_TOKENS,
                temperature=GROQ_TEMPERATURE
            )
-            # Test the connection (verify via a simple call)
-            # Note: no actual call is made here; we only create the instance
            _current_llm_type = "groq"
            print("✅ 使用 Groq API (優先)")
            return groq_llm
        except Exception as e:
+            # If creation fails, keep trying the other options
            print(f"⚠️ Groq API 初始化失敗: {e}")
+            # Do not set _groq_quota_exceeded yet; try Ollama first
+
+    # Priority 2: Ollama (Llama 3.2 or another model)
+    if USE_OLLAMA:
+        try:
+            ollama_llm = ChatOllama(
+                base_url=OLLAMA_BASE_URL,
+                model=OLLAMA_MODEL,
+                num_predict=OLLAMA_MAX_TOKENS,
+                temperature=OLLAMA_TEMPERATURE,
+            )
+            _current_llm_type = "ollama"
+            print(f"✅ 使用 Ollama 模型 ({OLLAMA_MODEL})")
+            return ollama_llm
+        except Exception as e:
+            print(f"⚠️ Ollama 初始化失敗: {e}")
+            print("   請確保 Ollama 服務正在運行: ollama serve")
+            print("   或檢查模型是否已下載: ollama pull " + OLLAMA_MODEL)
+
+    # Priority 3: MLX model (fallback)
+    # Record the state if the Groq quota is exhausted
+    if _groq_quota_exceeded and _current_llm_type != "mlx":
+        print("⚠️ 警告:Groq API 額度已用完,已切換到本地 MLX 模型 (Qwen2.5)")
+    elif _current_llm_type != "mlx":
+        if not GROQ_API_KEY and not USE_OLLAMA:
+            print("ℹ️ 未配置 Groq API 或 Ollama,使用本地 MLX 模型")
+        elif not USE_OLLAMA:
+            print("ℹ️ Ollama 未啟用,使用本地 MLX 模型作為備援")
+
+    _current_llm_type = "mlx"
+    model, tokenizer = load_mlx_model()
+    return MLXChatModel(
+        model=model,
+        tokenizer=tokenizer,
+        max_tokens=MLX_MAX_TOKENS,
+        temperature=MLX_TEMPERATURE
+    )


def handle_groq_error(error: Exception) -> Optional[MLXChatModel]:
    """
    Handle Groq API errors
+    On a quota-exhausted error, try switching to Ollama first, otherwise fall back to the MLX model

    Args:
        error: the captured exception

    Returns:
-        MLXChatModel if switched to the local model, otherwise None
+        ChatOllama or MLXChatModel if switched to a local model, otherwise None
    """
    global _current_llm_type, _groq_quota_exceeded

@@ -121,10 +127,27 @@ def handle_groq_error(error: Exception) -> Optional[MLXChatModel]:
    if any(indicator in error_str for indicator in quota_indicators):
        if not _groq_quota_exceeded:
            _groq_quota_exceeded = True
+            warning_msg = "⚠️ 警告:Groq API 額度已用完"
            print(warning_msg)
            warnings.warn(warning_msg, UserWarning)

+        # Try Ollama first
+        if USE_OLLAMA:
+            try:
+                ollama_llm = ChatOllama(
+                    base_url=OLLAMA_BASE_URL,
+                    model=OLLAMA_MODEL,
+                    num_predict=OLLAMA_MAX_TOKENS,
+                    temperature=OLLAMA_TEMPERATURE,
+                )
+                _current_llm_type = "ollama"
+                print(f"✅ 已切換到 Ollama 模型 ({OLLAMA_MODEL})")
+                return ollama_llm
+            except Exception as e:
+                print(f"⚠️ Ollama 切換失敗: {e}")
+                print("   回退到 MLX 模型")
+
+        # Fall back to the MLX model
        _current_llm_type = "mlx"
        model, tokenizer = load_mlx_model()
        return MLXChatModel(
pyproject.toml
CHANGED
@@ -17,6 +17,7 @@ dependencies = [
    "yfinance>=0.2.66",
    "langgraph>=1.0.4",
    "langchain-groq>=1.1.0",
+    "langchain-ollama>=0.1.0",
    "grandalf>=0.8",
    "langserve[all]>=0.3.3",
    "fastapi>=0.124.2",
uv.lock
CHANGED
@@ -636,6 +636,7 @@ dependencies = [
    { name = "langchain-community" },
    { name = "langchain-google-genai" },
    { name = "langchain-groq" },
+   { name = "langchain-ollama" },
    { name = "langchain-tavily" },
    { name = "langgraph" },
    { name = "langserve", extra = ["all"] },
@@ -671,6 +672,7 @@ requires-dist = [
    { name = "langchain-community", specifier = ">=0.4.1" },
    { name = "langchain-google-genai", specifier = ">=4.0.0" },
    { name = "langchain-groq", specifier = ">=1.1.0" },
+   { name = "langchain-ollama", specifier = ">=0.1.0" },
    { name = "langchain-tavily", specifier = ">=0.2.13" },
    { name = "langgraph", specifier = ">=1.0.4" },
    { name = "langserve", extras = ["all"], specifier = ">=0.3.3" },
@@ -1591,6 +1593,19 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/af/4a/3d6227a16fe9f79968414b50e50869519378b20653805e2e8fab283908e6/langchain_groq-1.1.1-py3-none-any.whl", hash = "sha256:1c6d5146f60205dcde09d7e47bb5291c295d3f0c7bcd2417e4d3a73a04bd1050", size = 19039, upload-time = "2025-12-12T22:00:45.86Z" },
]

+[[package]]
+name = "langchain-ollama"
+version = "1.0.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "langchain-core" },
+    { name = "ollama" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/73/51/72cd04d74278f3575f921084f34280e2f837211dc008c9671c268c578afe/langchain_ollama-1.0.1.tar.gz", hash = "sha256:e37880c2f41cdb0895e863b1cfd0c2c840a117868b3f32e44fef42569e367443", size = 153850, upload-time = "2025-12-12T21:48:28.68Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/e3/46/f2907da16dc5a5a6c679f83b7de21176178afad8d2ca635a581429580ef6/langchain_ollama-1.0.1-py3-none-any.whl", hash = "sha256:37eb939a4718a0255fe31e19fbb0def044746c717b01b97d397606ebc3e9b440", size = 29207, upload-time = "2025-12-12T21:48:27.832Z" },
+]
+
[[package]]
name = "langchain-tavily"
version = "0.2.16"
@@ -2243,6 +2258,19 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/be/9c/92789c596b8df838baa98fa71844d84283302f7604ed565dafe5a6b5041a/oauthlib-3.3.1-py3-none-any.whl", hash = "sha256:88119c938d2b8fb88561af5f6ee0eec8cc8d552b7bb1f712743136eb7523b7a1", size = 160065, upload-time = "2025-06-19T22:48:06.508Z" },
]

+[[package]]
+name = "ollama"
+version = "0.6.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "httpx" },
+    { name = "pydantic" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9d/5a/652dac4b7affc2b37b95386f8ae78f22808af09d720689e3d7a86b6ed98e/ollama-0.6.1.tar.gz", hash = "sha256:478c67546836430034b415ed64fa890fd3d1ff91781a9d548b3325274e69d7c6", size = 51620, upload-time = "2025-11-13T23:02:17.416Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/47/4f/4a617ee93d8208d2bcf26b2d8b9402ceaed03e3853c754940e2290fed063/ollama-0.6.1-py3-none-any.whl", hash = "sha256:fc4c984b345735c5486faeee67d8a265214a31cbb828167782dc642ce0a2bf8c", size = 14354, upload-time = "2025-11-13T23:02:16.292Z" },
+]
+
[[package]]
name = "onnxruntime"
version = "1.23.2"