win10/Llama-3.3-WeDLM-12B-Base-Up-Scaling

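This repository provides the weights of a merged base language model together with the custom Transformers model code (custom_code) needed to load it. The merge specification and statistics ship alongside the weights (merge_plan.json, merge_stats.json).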

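Upstream Models

tencent/WeDLM-8B-Base (one of the sources, used as the main / backbone model)
shb777/Llama-3.3-8B-Instruct-128K (one of the sources, used as the sub / donor model; its model card lists allura-forge/Llama-3.3-8B-Instruct as upstream)

Note: this is a base-style model, with no guarantee of instruction-aligned or safety-aligned behavior. If you need chat or instruction-following ability, plan on further fine-tuning such as SFT, DPO, or RLHF.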

Model Highlights (Config Summary)

The config.json in this repository is authoritative:

Architecture: WeDLMForCausalLM (model_type: wedlm; requires trust_remote_code=True)
num_hidden_layers: 52
hidden_size: 4096
intermediate_size: 14336
num_attention_heads: 32
num_key_value_heads: 8
head_dim: 128
max_position_embeddings: 16384
rope_theta: 1,000,000
vocab_size: 151,936
dtype: bfloat16
transformers_version: 4.57.1 (using the same or a newer version is recommended)
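
As a rough sanity check on the figures above, the parameter count can be estimated directly from the config values. This is a minimal sketch, not the repository's own accounting: it assumes a Llama-style decoder layer (bias-free q/k/v/o projections, a three-matrix gated MLP, two RMSNorm vectors per layer) and untied input/output embeddings, none of which the config summary confirms. `estimate_params` is a hypothetical helper.

```python
def estimate_params(
    num_hidden_layers: int = 52,
    hidden_size: int = 4096,
    intermediate_size: int = 14336,
    num_attention_heads: int = 32,
    num_key_value_heads: int = 8,
    head_dim: int = 128,
    vocab_size: int = 151_936,
    tied_embeddings: bool = False,  # assumption: not stated in the config summary
) -> int:
    """Estimate the parameter count of a Llama-style decoder from config values."""
    q_out = num_attention_heads * head_dim      # width of the q projection (and o input)
    kv_out = num_key_value_heads * head_dim     # width of the k and v projections (GQA)
    attn = 2 * hidden_size * q_out + 2 * hidden_size * kv_out  # q, o, k, v
    mlp = 3 * hidden_size * intermediate_size   # gate, up, down (assumed gated MLP)
    norms = 2 * hidden_size                     # two norm vectors per layer (assumed)
    per_layer = attn + mlp + norms

    embed = vocab_size * hidden_size
    embeddings = embed if tied_embeddings else 2 * embed  # embed_tokens (+ lm_head)
    final_norm = hidden_size
    return num_hidden_layers * per_layer + embeddings + final_norm


total = estimate_params()
print(f"{total / 1e9:.2f}B parameters (estimate)")
```

Under these assumptions the estimate falls between 12B and 13B. Any extra per-layer tensors defined in modeling_wedlm.py would shift the total slightly.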

Merge Method (Merge Plan)

The merge specification is recorded in merge_plan.json:

merge_strategy: dus
output_layers: 52
output_vocab: 151,936 (inherits the vocab of main)
Layer stacking (indexed by output layer, out_layer):
- out_layer 0–27: from main (src_layer 0–27)
- out_layer 28–51: from sub (src_layer 8–31)
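
The layer stacking above can be written out as an explicit lookup table. The sketch below only reproduces the two ranges listed here; it does not read merge_plan.json, whose actual schema is not shown in this card, and `build_layer_map` is a hypothetical helper name.

```python
from typing import List, Tuple


def build_layer_map() -> List[Tuple[str, int]]:
    """Return (source, src_layer) for each out_layer, in output order."""
    main_part = [("main", i) for i in range(0, 28)]  # out_layer 0-27 <- main 0-27
    sub_part = [("sub", i) for i in range(8, 32)]    # out_layer 28-51 <- sub 8-31
    return main_part + sub_part


layer_map = build_layer_map()
for out_layer in (0, 27, 28, 51):
    source, src_layer = layer_map[out_layer]
    print(f"out_layer {out_layer:2d} <- {source} src_layer {src_layer}")
```

The table has 28 + 24 = 52 entries, matching output_layers. The first 8 layers of sub are not used.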

Merge statistics (merge_stats.json):

shape_expansions: 84
verification_issues: 0
global_verification_issues: 0
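
If you re-run or audit the merge, a small check on these counters can serve as a gate before using the weights. This is a sketch under the assumption that merge_stats.json is a flat JSON object with the three keys listed above at the top level; the real file may nest them differently. `merge_is_clean` is a hypothetical helper.

```python
import json


def merge_is_clean(stats: dict) -> bool:
    """True when both verification counters report zero issues."""
    return (
        stats["verification_issues"] == 0
        and stats["global_verification_issues"] == 0
    )


# Values copied from the statistics above; in practice, load the file with
# json.load(open("merge_stats.json")) (flat layout assumed).
stats = json.loads(
    '{"shape_expansions": 84, "verification_issues": 0, "global_verification_issues": 0}'
)
print("merge clean:", merge_is_clean(stats))
```

Note that shape_expansions is reported as a count and is not treated as an error here.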

Repository Contents (Files)

Weights: model-0000x-of-00005.safetensors + model.safetensors.index.json
Custom Transformers code:
- configuration_wedlm.py
- modeling_wedlm.py
Tokenizer / Chat template:
- tokenizer.json, tokenizer_config.json, special_tokens_map.json
- chat_template.jinja
Merge information:
- merge_plan.json
- merge_stats.json
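
To see how tensors are spread across the five weight shards, you can inspect model.safetensors.index.json, which in the standard sharded-checkpoint layout maps each tensor name to its shard file under a `weight_map` key. The sketch below works on an already-parsed index; the tensor names in `example_index` are illustrative placeholders, not taken from this repository, and `tensors_per_shard` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict


def tensors_per_shard(index: Dict) -> Dict[str, int]:
    """Count how many tensors each shard file holds, based on weight_map."""
    return dict(Counter(index["weight_map"].values()))


# Tiny stand-in for a parsed model.safetensors.index.json (names are placeholders).
example_index = {
    "metadata": {"total_size": 0},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00005.safetensors",
        "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
        "lm_head.weight": "model-00005-of-00005.safetensors",
    },
}
print(tensors_per_shard(example_index))
```

With the real file, pass the result of `json.load` on model.safetensors.index.json instead of `example_index`.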

Usage

1) Transformers (recommended for training, forward passes, and simple inference)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "win10/Llama-3.3-WeDLM-12B-Base-Up-Scaling"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "The theory of relativity states that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))

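2) WeDLM Engine (if you prefer the WeDLM inference path)

Upstream WeDLM provides the wedlm engine. If that engine already runs in your environment, you can point model_id at this repository to test compatibility (the actual level of support depends on the engine version you use).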

pip install git+https://github.com/tencent/WeDLM.git

from wedlm import LLM, SamplingParams

model_id = "win10/Llama-3.3-WeDLM-12B-Base-Up-Scaling"
llm = LLM(model=model_id)

prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(max_tokens=256))
print(outputs[0]["text"])

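License

This repository is a merged, derivative work. You must comply with the license terms and usage restrictions of every upstream source model, including but not limited to:

tencent/WeDLM-8B-Base: Apache-2.0
shb777/Llama-3.3-8B-Instruct-128K (and the upstream Llama 3.3 family): Llama 3.3 Community License (shown on Hugging Face as llama3.3)

Before any commercial use or redistribution, read these licenses and the Acceptable Use Policy (AUP) in full and confirm that you comply.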

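Known Limitations and Risks (Limitations)

This is a base-style model, with no guarantee of reliable instruction following, refusal behavior, or safety alignment.
Any output may contain errors, hallucinations, bias, or incomplete reasoning. In high-risk settings (medical, legal, financial, security, and so on), add your own external verification and safeguards.
A merged model may show degraded or unstable behavior (including in long-context handling, alignment, and format stability). Continued pretraining and fine-tuning on your target workload are recommended.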

Acknowledgements

The Tencent WeDLM team (WeDLM-8B)
The Llama 3.3 family and its community releasers/packagers (allura-forge, shb777)


Sources (upstream licenses and model descriptions, plus this repository's merge specification and configuration): the license and usage instructions on the WeDLM model page; the license tag of Llama 3.3 8B Instruct (allura-forge); the license tag of the shb777 128K version; and this repository's file list and merge files (`merge_plan.json`, `merge_stats.json`, `config.json`).
