
Juan Pablo Mejia Gómez

Juanpa0128 · Juanpa0128j

AI & ML interests

LLMs, Agents, Transformers, ML, and many more...

Recent Activity

reacted to codelion's post with 🔥 about 23 hours ago

Introducing Dhara-70M: a diffusion language model that achieves 3.8x higher throughput than autoregressive models! Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
(A sketch of the WSD schedule mentioned here follows the activity list.)
liked a model about 23 hours ago
zai-org/GLM-4.7
liked a model 3 days ago
meta-llama/Llama-3.2-3B-Instruct
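The Dhara-70M post credits part of its training efficiency to a WSD (Warmup-Stable-Decay) schedule but does not spell out the schedule itself. The snippet below is a minimal illustrative sketch of a generic WSD learning-rate function; the function name and all hyperparameters (peak_lr, warmup_frac, decay_frac, min_lr) are assumptions for illustration, not values from the post.

```python
# Illustrative WSD (Warmup-Stable-Decay) learning-rate schedule.
# The post only names the technique; the phase lengths and rates here are assumed.

def wsd_lr(step: int, total_steps: int, peak_lr: float = 3e-4,
           warmup_frac: float = 0.05, decay_frac: float = 0.2,
           min_lr: float = 3e-5) -> float:
    """Linear warmup to peak_lr, a long constant ("stable") phase,
    then a linear decay to min_lr over the final decay_frac of training."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps

    if step < warmup_steps:                      # warmup phase
        return peak_lr * (step + 1) / warmup_steps
    if step < stable_end:                        # stable phase
        return peak_lr
    # decay phase
    progress = (step - stable_end) / max(decay_steps, 1)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    total = 10_000
    for s in (0, 500, 5_000, 9_000, 9_999):
        print(s, round(wsd_lr(s, total), 6))
```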

Organizations

None yet

Models (0)

None public yet

Datasets (0)

None public yet