Accessing Benchmark Leaderboard Data

Benchmark datasets on the Hub contain leaderboards that rank models by their evaluation scores. You can access this data programmatically to analyze it or to build dashboards and tools on top of it.

Discovering official benchmarks

Use huggingface_hub to find all official benchmark datasets:

from huggingface_hub import HfApi

api = HfApi()
for ds in api.list_datasets(benchmark=True):
    print(ds.id)

Or via the REST API directly (useful for agents and scripting):

GET https://huggingface.co/api/datasets?filter=benchmark:official
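If you cannot install huggingface_hub, the same listing can be fetched with Python's standard library. This is a minimal sketch: it assumes the endpoint returns a JSON array of objects with an id field, and it reads only the first page of results, whereas list_datasets follows the Hub's pagination for you. The function names are illustrative, not part of any library.

```python
import json
import urllib.parse
import urllib.request

HUB_DATASETS_API = "https://huggingface.co/api/datasets"


def benchmark_list_url() -> str:
    """Build the listing URL shown above, keeping ':' unescaped for readability."""
    query = urllib.parse.urlencode({"filter": "benchmark:official"}, safe=":")
    return f"{HUB_DATASETS_API}?{query}"


def parse_dataset_ids(payload: str) -> list[str]:
    """Extract repo IDs from the JSON array returned by the listing endpoint."""
    return [item["id"] for item in json.loads(payload)]


def fetch_official_benchmarks(timeout: float = 30.0) -> list[str]:
    """Fetch the first page of official benchmark datasets (no pagination)."""
    with urllib.request.urlopen(benchmark_list_url(), timeout=timeout) as resp:
        return parse_dataset_ids(resp.read().decode("utf-8"))
```

Call fetch_official_benchmarks() to query the live Hub; the URL and parsing helpers are split out so they can be checked offline.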

Getting leaderboard rankings

The leaderboard API returns ranked model scores for a benchmark dataset:

GET https://huggingface.co/api/datasets/{dataset_id}/leaderboard

Use get_dataset_leaderboard to fetch ranked model scores as typed DatasetLeaderboardEntry objects:

from huggingface_hub import HfApi

api = HfApi()
leaderboard = api.get_dataset_leaderboard("SWE-bench/SWE-bench_Verified")

for entry in leaderboard[:5]:
    print(f"#{entry.rank} {entry.model_id}: {entry.value}")

huggingface_hub uses your cached token by default. For gated benchmark datasets, make sure you are logged in (hf auth login) or pass a token explicitly:

leaderboard = api.get_dataset_leaderboard("gated/benchmark", token="hf_...")

A curl one-liner for quick access (useful for agents and scripting):

curl https://huggingface.co/api/datasets/cais/hle/leaderboard \
  --header "Authorization: Bearer $(cat ~/.cache/huggingface/token)" | jq .
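The curl call can also be reproduced with the standard library. This sketch makes two simplifying assumptions: the token is read from the HF_TOKEN environment variable or, failing that, from the default token file used in the curl example (huggingface_hub's own lookup also honors settings such as a custom HF_HOME), and the JSON body is returned as-is, since this page documents the typed Python fields rather than the raw response keys. All function names are illustrative.

```python
import json
import os
import urllib.request
from pathlib import Path
from typing import Any, Optional

LEADERBOARD_URL = "https://huggingface.co/api/datasets/{dataset_id}/leaderboard"


def read_cached_token() -> Optional[str]:
    """Simplified lookup: HF_TOKEN env var, then ~/.cache/huggingface/token."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token.strip()
    token_file = Path.home() / ".cache" / "huggingface" / "token"
    return token_file.read_text().strip() if token_file.is_file() else None


def leaderboard_request(dataset_id: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request; add a Bearer header only when a token is given."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(LEADERBOARD_URL.format(dataset_id=dataset_id), headers=headers)


def fetch_leaderboard(dataset_id: str, token: Optional[str] = None, timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON body unchanged."""
    with urllib.request.urlopen(leaderboard_request(dataset_id, token), timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_leaderboard("cais/hle", token=read_cached_token()) mirrors the curl command. The Authorization header is only sent when a token is found, which should be enough for non-gated benchmarks.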

Response fields

Each DatasetLeaderboardEntry contains:

| Field | Description |
|---|---|
| rank | Position on the leaderboard |
| model_id | Full model ID (e.g. Qwen/Qwen3.5-397B-A17B) |
| value | The benchmark score |
| verified | Whether the result has been independently verified |
| author | A User or Organization object |
| source | Where the result was submitted from (model card, external, etc.) |
| filename | Path to the eval results YAML file (e.g. .eval_results/swe_bench_verified.yaml) |
| pull_request | PR number for the submission on the benchmark dataset repo |
| notes | Optional notes associated with the entry |
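Because these are plain attributes, common post-processing needs no extra API calls. The sketch below keeps only verified results and finds the best-ranked model per owner namespace (the part of model_id before the slash). It orders by rank rather than value, so it does not need to know whether higher scores are better for a given benchmark. The Entry dataclass is a hypothetical stand-in with a subset of the DatasetLeaderboardEntry fields so the example runs offline; the helpers only read attributes the real objects also have.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Entry:
    """Hypothetical offline stand-in exposing a subset of DatasetLeaderboardEntry fields."""
    rank: int
    model_id: str
    value: float
    verified: bool


def verified_only(entries: Iterable[Entry]) -> list[Entry]:
    """Keep independently verified results, preserving leaderboard order."""
    return [e for e in entries if e.verified]


def best_per_namespace(entries: Iterable[Entry]) -> dict[str, Entry]:
    """Map each owner namespace to its best-ranked (lowest rank) entry."""
    best: dict[str, Entry] = {}
    for e in entries:
        namespace = e.model_id.split("/", 1)[0]
        if namespace not in best or e.rank < best[namespace].rank:
            best[namespace] = e
    return best
```

With real data, pass the list returned by api.get_dataset_leaderboard("SWE-bench/SWE-bench_Verified") directly, e.g. best_per_namespace(verified_only(leaderboard)).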

Pre-aggregated multi-benchmark dataset

If you want scores from multiple benchmarks in a single file, use the OpenEvals/leaderboard-data dataset, which aggregates scores across official benchmarks into one Parquet file.

You can load it directly with pandas using the hf:// path:

import pandas as pd

df = pd.read_parquet(
    "hf://datasets/OpenEvals/leaderboard-data/data/train-00000-of-00001.parquet"
)
print(df[["model_name", "provider", "aime2026_score", "mmluPro_score"]].head())

This is the fastest way to get a cross-benchmark view without calling multiple API endpoints.
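A wide table like this makes cross-benchmark comparisons easy, but not every model has a score on every benchmark. One simple approach, sketched below, is to rank models within each score column and average those ranks, reporting how many benchmarks each average covers so that sparsely evaluated models are not over-trusted. It assumes higher is better in every column you pass and treats None or NaN (how pandas marks missing floats) as missing; mean_rank is an illustrative helper, not part of the dataset or of pandas.

```python
import math
from typing import Any, Iterable, Optional


def _score(row: dict[str, Any], column: str) -> Optional[float]:
    """Return the row's score for column, or None if it is missing (None or NaN)."""
    value = row.get(column)
    if value is None:
        return None
    value = float(value)
    return None if math.isnan(value) else value


def mean_rank(
    rows: list[dict[str, Any]], columns: Iterable[str], key: str = "model_name"
) -> dict[str, tuple[float, int]]:
    """Map each model to (average per-benchmark rank, number of benchmarks ranked on).

    Rank 1 is best; higher scores are assumed better, and tied scores share a rank.
    """
    ranks: dict[str, list[int]] = {}
    for column in columns:
        # Collect (model, score) pairs for the models that have this benchmark.
        scored = []
        for row in rows:
            score = _score(row, column)
            if score is not None:
                scored.append((row[key], score))
        # Competition ranking: 1 + number of models with a strictly higher score.
        for name, score in scored:
            rank = 1 + sum(1 for _, other in scored if other > score)
            ranks.setdefault(name, []).append(rank)
    return {name: (sum(r) / len(r), len(r)) for name, r in ranks.items()}
```

With the DataFrame loaded above, call mean_rank(df.to_dict("records"), ["aime2026_score", "mmluPro_score"]). The ranking is quadratic per column, which is fine at leaderboard sizes.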

Enriching with model metadata

Use huggingface_hub to enrich leaderboard data with release dates, parameter counts, and other metadata:

from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("Qwen/Qwen3.5-397B-A17B")

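print(f"Released: {info.created_at}")  # repo creation date on the Hub, a proxy for the release date
if info.safetensors:  # None when the repo has no safetensors metadata
    print(f"Parameters: {info.safetensors.total / 1e9:.1f}B")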
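Once each leaderboard score is joined with a parameter count (for example info.safetensors.total, skipping models where safetensors is None), a common analysis is the size/score Pareto frontier: the models that outscore every smaller model. A minimal sketch, assuming higher scores are better; SizedResult and pareto_frontier are hypothetical names.

```python
from typing import NamedTuple


class SizedResult(NamedTuple):
    """Hypothetical join of a leaderboard score with model_info metadata."""
    model_id: str
    total_params: int
    score: float


def pareto_frontier(results: list[SizedResult]) -> list[SizedResult]:
    """Return the models that outscore every smaller model, smallest first.

    Among models of the same size, only the highest scorer can be kept.
    """
    frontier: list[SizedResult] = []
    best_so_far = float("-inf")
    # Sort by size ascending; at equal size, examine the higher score first.
    for r in sorted(results, key=lambda r: (r.total_params, -r.score)):
        if r.score > best_so_far:
            frontier.append(r)
            best_so_far = r.score
    return frontier
```

Sorting by size and keeping a running maximum makes the frontier a single pass after the sort.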

Model-centric view: eval results per model

The leaderboard API gives a dataset-centric view (all models on one benchmark). For the reverse view (all benchmark scores for a single model), use model_info with expand=["evalResults"]:

from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("Qwen/Qwen3.5-397B-A17B", expand=["evalResults"])

for result in info.eval_results:
    print(f"{result.dataset_id}: {result.value}")

This returns EvalResultEntry objects parsed from the model’s .eval_results/ files.
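To compare several models side by side, you can pivot their results into one row per model and one column per benchmark dataset. This sketch assumes you have already collected (dataset_id, value) pairs for each model, for example [(r.dataset_id, r.value) for r in info.eval_results]. It is a simplification: if a model reports several results for the same dataset (say, for different tasks), only the last one is kept. The helper names are illustrative.

```python
import csv
import io


def pivot_results(
    results: dict[str, list[tuple[str, float]]],
) -> tuple[list[str], dict[str, dict[str, float]]]:
    """Turn {model_id: [(dataset_id, value), ...]} into sorted columns and per-model rows."""
    rows = {model_id: dict(pairs) for model_id, pairs in results.items()}  # last value wins
    columns = sorted({dataset_id for row in rows.values() for dataset_id in row})
    return columns, rows


def to_csv(columns: list[str], rows: dict[str, dict[str, float]]) -> str:
    """Render the pivot as CSV; cells for benchmarks a model lacks are left empty."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["model_id", *columns])
    for model_id in sorted(rows):
        writer.writerow([model_id, *(rows[model_id].get(c, "") for c in columns)])
    return buffer.getvalue()
```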

Example: building on leaderboard data

The Benchmark Leaderboard Race Space combines these data sources to create an animated visualization of how model rankings evolve over time. You can build your own analyses and visualizations on top of this data; the Space's source code is a complete example.

Embed a leaderboard in a webpage

You can embed a benchmark dataset’s leaderboard directly into your own webpage using an iframe.

The URL to use is https://huggingface.co/datasets/<namespace>/<dataset-name>/embed/leaderboard, where <namespace> is the owner of the benchmark dataset (user or organization) and <dataset-name> is the name of the dataset.

<iframe
  src="https://huggingface.co/datasets/cais/hle/embed/leaderboard"
  frameborder="0"
  width="100%"
  height="560px"
></iframe>

Parameters

You can configure the embedded leaderboard by passing query parameters in the iframe URL:

| Parameter | Description |
|---|---|
| leaderboard_task_id | ID of the task to display, as defined in the benchmark's eval.yaml (e.g. gpqa_diamond). Defaults to the first task. |
| eval_result | Model ID to highlight on the leaderboard (e.g. meta-llama/Llama-3.1-8B). |
| leaderboard_max_params | Filter rows by maximum parameter count. Accepts one of the following values: 1B, 3B, 6B, 12B, 32B, 128B or 500B. |
| leaderboard_is_expanded | Set to true to render the leaderboard fully expanded instead of collapsed. |
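If you generate embeds programmatically, a small helper can assemble the URL from the parameters above and reject unsupported leaderboard_max_params values before they reach a page. A sketch: the function and argument names are illustrative, while the query parameter names and allowed values are the ones listed in the table.

```python
from typing import Optional
from urllib.parse import urlencode

# Allowed leaderboard_max_params values, as listed in the parameters table.
MAX_PARAMS_CHOICES = ("1B", "3B", "6B", "12B", "32B", "128B", "500B")


def leaderboard_embed_url(
    dataset_id: str,
    task_id: Optional[str] = None,
    highlight_model: Optional[str] = None,
    max_params: Optional[str] = None,
    expanded: bool = False,
) -> str:
    """Build the embed URL for "<namespace>/<dataset-name>", adding only the parameters that are set."""
    if max_params is not None and max_params not in MAX_PARAMS_CHOICES:
        raise ValueError(f"max_params must be one of {MAX_PARAMS_CHOICES}, got {max_params!r}")
    params = {
        "leaderboard_task_id": task_id,
        "eval_result": highlight_model,
        "leaderboard_max_params": max_params,
        "leaderboard_is_expanded": "true" if expanded else None,
    }
    # Keep '/' readable in model IDs; other reserved characters are percent-encoded.
    query = urlencode({k: v for k, v in params.items() if v is not None}, safe="/")
    base = f"https://huggingface.co/datasets/{dataset_id}/embed/leaderboard"
    return f"{base}?{query}" if query else base
```

Use the returned URL as the src of the iframe. Parameters left as None are omitted, so the embed falls back to its defaults (for example, the first task).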

For example, to embed the HLE leaderboard with the table expanded and a specific model highlighted:

<iframe
  src="https://huggingface.co/datasets/cais/hle/embed/leaderboard?leaderboard_is_expanded=true&eval_result=meta-llama/Llama-3.1-8B"
  frameborder="0"
  width="100%"
  height="560px"
></iframe>

The embed is only available for official benchmark datasets that have evaluation results.
