Yiming Tang
tangyiming
AI & ML interests
None yet
Recent Activity
New activity, about 1 month ago
Qwen/Qwen3-Next-80B-A3B-Instruct:
Megatron Swift DPO training on Qwen/Qwen3-Next-80B-A3B-Instruct always returns NaN loss. Why?
Organizations
None yet
models
0
None public yet
datasets
0
None public yet