Stanford CRFM

university

Recent Activity

yifanmai updated a dataset 5 days ago

stanford-crfm/arabic-enterprise

yifanmai authored a paper 13 days ago

VHELM: A Holistic Evaluation of Vision Language Models

yifanmai authored a paper 13 days ago

AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

updated a dataset 5 days ago

stanford-crfm/arabic-enterprise

Viewer • Updated 5 days ago • 721 • 56

authored 9 papers 13 days ago

VHELM: A Holistic Evaluation of Vision Language Models

Paper • 2410.07112 • Published Oct 9, 2024 • 3

AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

Paper • 2407.17436 • Published Jul 11, 2024

Image2Struct: Benchmarking Structure Extraction for Vision-Language Models

Paper • 2410.22456 • Published Oct 29, 2024

SEA-HELM: Southeast Asian Holistic Evaluation of Language Models

Paper • 2502.14301 • Published Feb 20, 2025 • 3

AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

Paper • 2503.05731 • Published Feb 19, 2025 • 3

Judging LLMs on a Simplex

Paper • 2505.21972 • Published May 28, 2025 • 1

AHELM: A Holistic Evaluation of Audio-Language Models

Paper • 2508.21376 • Published Aug 29, 2025 • 9

Structured Prompting Enables More Robust Evaluation of Language Models

Paper • 2511.20836 • Published Nov 25, 2025

Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation

Paper • 2510.11977 • Published Oct 13, 2025

published a dataset 14 days ago

stanford-crfm/arabic-enterprise

Viewer • Updated 5 days ago • 721 • 56

submitted a paper to Daily Papers 28 days ago

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Paper • 2604.04759 • Published 29 days ago • 24

authored a paper about 1 month ago

Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits

Paper • 2603.22339 • Published Mar 21 • 5

updated a dataset 3 months ago

stanford-crfm/helm-scenarios

Preview • Updated Jan 26 • 289 • 2

dlwh

in stanford-crfm/music-large-800k 5 months ago

Missing tokenizer file

#1 opened 7 months ago by

published a dataset 6 months ago

stanford-crfm/CoReBench_v1

Updated Oct 23, 2025 • 4

authored 3 papers 8 months ago

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Paper • 2505.04601 • Published May 7, 2025 • 29

Autoregressive Pretraining with Mamba in Vision

Paper • 2406.07537 • Published Jun 11, 2024

AHELM: A Holistic Evaluation of Audio-Language Models

Paper • 2508.21376 • Published Aug 29, 2025 • 9

authored a paper 9 months ago

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Paper • 2411.19799 • Published Nov 29, 2024 • 17