Spaces: shi-labs/physical-ai-bench-leaderboard


Jiannan Huang committed 11 days ago

Commit a90bfd0

0 Parent(s):

FIX Height of leaderboard

Files changed (8)

.gitignore +2 -0
README.md +77 -0
app.py +628 -0
data/predict-leaderboard.json +301 -0
data/reason-leaderboard.csv +15 -0
inspect_gradio.py +5 -0
requirements.txt +2 -0
signature.txt +1 -0

.gitignore ADDED

	@@ -0,0 +1,2 @@

+scripts/
+__pycache__/

README.md ADDED

	@@ -0,0 +1,77 @@

+---
+title: Physical AI Bench Leaderboard
+emoji: 🤖
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+app_file: app.py
+pinned: true
+license: mit
+short_description: Benchmark for Physical AI generation and understanding
+sdk_version: 5.43.1
+tags:
+- leaderboard
+- physical-ai
+- world-models
+- autonomous-driving
+- robotics
+- embodied-ai
+---
+# Physical AI Bench Leaderboard
+**Physical AI Bench (PAI-Bench)** is a comprehensive benchmark suite for evaluating physical AI generation and understanding across diverse scenarios including autonomous vehicles, robotics, industrial spaces, and everyday ego-centric environments.
+## Resources
+- 🌐 [GitHub Repository](https://github.com/SHI-Labs/physical-ai-bench)
+- 📊 [Predict Dataset](https://huggingface.co/datasets/shi-labs/physical-ai-bench-predict)
+- 📊 [Transfer Dataset](https://huggingface.co/datasets/shi-labs/physical-ai-bench-transfer)
+- 📊 [Reason Dataset](https://huggingface.co/datasets/shi-labs/physical-ai-bench-reason)
+## Citation
+```bibtex
+@misc{PAIBench2025,
+  title={Physical AI Bench: A Comprehensive Benchmark for Physical AI Generation and Understanding},
+  author={Fengzhe Zhou and Jiannan Huang and Jialuo Li and Humphrey Shi},
+  year={2025},
+  url={https://github.com/SHI-Labs/physical-ai-bench}
+}
+```
+---
+# Configuration
+Most of the variables to change for a default leaderboard are in `src/env.py` (replace the path for your leaderboard) and `src/about.py` (for tasks).
+Results files should be stored as JSON files in the following format:
+```json
+{
+    "config": {
+        "model_dtype": "torch.float16", # or torch.bfloat16 or 8bit or 4bit
+        "model_name": "path of the model on the hub: org/model",
+        "model_sha": "revision on the hub",
+    },
+    "results": {
+        "task_name": {
+            "metric_name": score,
+        },
+        "task_name2": {
+            "metric_name": score,
+        }
+    }
+}
+```
+Request files are created automatically by this tool.
+If you encounter a problem on the Space, don't hesitate to restart it; this removes the created eval-queue, eval-queue-bk, eval-results, and eval-results-bk folders.
+# Code logic for more complex edits
+You'll find
+- the main table's column names and properties in `src/display/utils.py`
+- the logic to read all results and request files, then convert them into dataframe rows, in `src/leaderboard/read_evals.py` and `src/populate.py`
+- the logic to allow or filter submissions in `src/submission/submit.py` and `src/submission/check_validity.py`
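
Note on the README's results-file format: the template in the Configuration section is not strict JSON as written, since it carries a `#` comment, trailing commas, and an unquoted `score` placeholder. The following is a minimal sketch of a concrete, valid instance plus a loader that checks its shape. The model name, revision, task and metric names, and the helper `validate_results` are hypothetical placeholders and are not part of this commit.

```python
import json

# Hypothetical payload that follows the README template (values are made up).
EXAMPLE = """
{
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "org/model",
        "model_sha": "main"
    },
    "results": {
        "task_name": {"metric_name": 0.5},
        "task_name2": {"metric_name": 0.75}
    }
}
"""


def validate_results(text):
    """Parse a results file and check the keys listed in the README template."""
    data = json.loads(text)
    config = data["config"]
    for key in ("model_dtype", "model_name", "model_sha"):
        if not isinstance(config.get(key), str):
            raise ValueError(f"config.{key} must be a string")
    for task, metrics in data["results"].items():
        for metric, score in metrics.items():
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(score, bool) or not isinstance(score, (int, float)):
                raise ValueError(f"{task}.{metric} must be a number")
    return data


parsed = validate_results(EXAMPLE)
print(sorted(parsed["results"]))
```

A check of this kind would catch a template that was copied verbatim, because `json.loads` rejects the comment and the trailing commas before any key is inspected.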

app.py ADDED

	@@ -0,0 +1,628 @@

+import gradio as gr
+import pandas as pd
+# Your leaderboard name
+TITLE = """<h1 align="center" id="space-title">Physical AI Bench Leaderboard</h1>"""
+# CSS to make the leaderboard full height
+CSS = """
+#predict_leaderboard, #reason_leaderboard {
+    height: auto !important;
+    max-height: none !important;
+}
+#predict_leaderboard .wrap, #reason_leaderboard .wrap {
+    max-height: none !important;
+    height: auto !important;
+}
+#predict_leaderboard .tbody, #reason_leaderboard .tbody {
+    max-height: none !important;
+    height: auto !important;
+    overflow-x: auto !important;
+    overflow-y: hidden !important;
+}
+"""
+# What does your leaderboard evaluate?
+INTRODUCTION_TEXT = """
+**Physical AI Bench (PAI-Bench)** is a comprehensive benchmark suite for evaluating physical AI generation and understanding across diverse scenarios including autonomous vehicles, robotics, industrial spaces, and everyday ego-centric environments.
+"""
+# Which evaluations are you running? how can people reproduce what you have?
+LLM_BENCHMARKS_TEXT = """
+## How it works
+This leaderboard tracks model performance across three core dimensions:
+- **🎨 Generation**: Evaluates world foundation models' ability to predict future states across 1,044 diverse physical scenarios
+- **🔄 Conditional Generation**: Focuses on world model generation with complex control signals, featuring 600 videos across robotic arm operations, autonomous driving, and ego-centric scenes
+- **🧠 Understanding**: Evaluates understanding and reasoning about physical scenes, with 1,214 embodied reasoning scenarios focused on autonomous vehicle actions
+PAI-Bench covers multiple physical AI domains including autonomous driving, robotics, industrial spaces, physics simulations, human interactions, and common sense reasoning.
+### Resources
+- 🌐 [GitHub Repository](https://github.com/SHI-Labs/physical-ai-bench)
+- 📊 [Generation Dataset](https://huggingface.co/datasets/shi-labs/physical-ai-bench-predict)
+- 📊 [Conditional Generation Dataset](https://huggingface.co/datasets/shi-labs/physical-ai-bench-transfer)
+- 📊 [Understanding Dataset](https://huggingface.co/datasets/shi-labs/physical-ai-bench-reason)
+## Reproducibility
+To evaluate your models on PAI-Bench, visit our [GitHub repository](https://github.com/SHI-Labs/physical-ai-bench) for evaluation scripts and detailed instructions.
+## Citation
+If you use Physical AI Bench in your research, please cite:
+```bibtex
+@misc{zhou2025paibenchcomprehensivebenchmarkphysical,
+      title={PAI-Bench: A Comprehensive Benchmark For Physical AI},
+      author={Fengzhe Zhou and Jiannan Huang and Jialuo Li and Deva Ramanan and Humphrey Shi},
+      year={2025},
+      eprint={2512.01989},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2512.01989},
+}
+```
+"""
+# ============================================================================
+# Model Links Utility
+# ============================================================================
+def create_model_link(model_name):
+    """
+    Convert a model name to a markdown link to Hugging Face.
+    Args:
+        model_name: Model name in format "org/model-name" or just a plain name
+    Returns:
+        Markdown formatted link or original name if format doesn't match
+    """
+    if not isinstance(model_name, str):
+        return model_name
+    if '/' in model_name:
+        hf_url = f"https://huggingface.co/{model_name}"
+        display_name = model_name.split('/')[-1]
+        return f"[{display_name}]({hf_url})"
+    return model_name
+# ============================================================================
+# Generation Tab Configuration and Utilities
+# ============================================================================
+# Expected column order (the JSON should already have this order)
+PREDICT_COLUMN_ORDER = [
+    'model',
+    'Overall',
+    'Domain Score',
+    'Quality Score',
+    'Common Sense',
+    'AV',
+    'Robot',
+    'Industry',
+    'Human',
+    'Physics',
+    'Subject Consistency',
+    'Background Consistency',
+    'Motion Smoothness',
+    'Aesthetic Quality',
+    'Image Quality',
+    'Overall Consistency',
+    'I2V Subject',
+    'I2V Background',
+    'params',
+    'activate_params'
+]
+# Columns to hide by default (but still available for filtering/selection)
+PREDICT_HIDDEN_COLUMNS = ['params', 'activate_params']
+# Semantic/Domain dimensions (for selection button)
+PREDICT_DOMAIN_SCORE_DIMENSIONS = [
+    'Domain Score',
+    'Common Sense',
+    'AV',
+    'Robot',
+    'Industry',
+    'Human',
+    'Physics',
+]
+# Quality dimensions (for selection button)
+PREDICT_QUALITY_SCORE_DIMENSIONS = [
+    'Quality Score',
+    'Subject Consistency',
+    'Background Consistency',
+    'Motion Smoothness',
+    'Aesthetic Quality',
+    'Image Quality',
+    'Overall Consistency',
+    'I2V Subject',
+    'I2V Background'
+]
+PREDICT_DESELECTED_COLUMNS = ['Domain Score', 'Quality Score']
+PREDICT_ALL_SELECTED_COLUMNS = [
+    'Domain Score',
+    'Quality Score',
+    'Common Sense',
+    'AV',
+    'Robot',
+    'Industry',
+    'Human',
+    'Physics',
+    'Subject Consistency',
+    'Background Consistency',
+    'Motion Smoothness',
+    'Aesthetic Quality',
+    'Image Quality',
+    'Overall Consistency',
+    'I2V Subject',
+    'I2V Background'
+]
+# Columns that can never be deselected
+PREDICT_NEVER_HIDDEN_COLUMNS = ['model', 'Overall']
+# Columns displayed by default (using renamed column names)
+PREDICT_DEFAULT_DISPLAYED_COLUMNS = PREDICT_NEVER_HIDDEN_COLUMNS + PREDICT_ALL_SELECTED_COLUMNS
+def load_predict_json(json_path):
+    """
+    Load generation leaderboard JSON.
+    The JSON should already be pre-processed by generate_predict_leaderboard.py
+    with correct column names, ordering, sorting, and separate model/url fields.
+    """
+    df = pd.read_json(json_path, orient='records')
+    if 'model' in df.columns and 'url' in df.columns:
+        def create_link(row):
+            if pd.notna(row['url']):
+                display_name = row['model'].split('/')[-1] if '/' in row['model'] else row['model']
+                return f"[{display_name}]({row['url']})"
+            return row['model']
+        df['model'] = df.apply(create_link, axis=1)
+        df = df.drop(columns=['url'])
+    # Format numbers to ensure decimal places (1 decimal for numeric columns)
+    # Numbers should already be scaled to 0-100 by the generation script
+    for col in df.columns:
+        if col not in ['model', 'params', 'activate_params'] and pd.api.types.is_numeric_dtype(df[col]):
+            df[col] = df[col].apply(lambda x: f"{x:.1f}" if pd.notna(x) else x)
+    return df
+def select_predict_domain_score():
+    """Return domain score for checkbox selection"""
+    return gr.update(value=PREDICT_DOMAIN_SCORE_DIMENSIONS)
+def select_predict_quality_score():
+    """Return quality score for checkbox selection"""
+    return gr.update(value=PREDICT_QUALITY_SCORE_DIMENSIONS)
+def deselect_predict_all():
+    """Deselect all dimensions except the aggregate Domain Score and Quality Score"""
+    return gr.update(value=PREDICT_DESELECTED_COLUMNS)
+def select_predict_all():
+    """Select all dimensions"""
+    return gr.update(value=PREDICT_ALL_SELECTED_COLUMNS)
+def on_predict_dimension_selection_change(selected_columns, full_df):
+    """Handle dimension selection changes and update the dataframe"""
+    # Always include model and Overall columns
+    present_columns = ['model', 'Overall']
+    # Add selected columns
+    for col in selected_columns:
+        if col not in present_columns and col in full_df.columns:
+            present_columns.append(col)
+    # Filter dataframe to show only selected columns
+    updated_data = full_df[present_columns]
+    # Determine datatypes
+    datatypes = []
+    for col in present_columns:
+        if col == 'model':
+            datatypes.append('markdown')
+        elif col in ['params', 'activate_params']:
+            datatypes.append('number')
+        else:
+            datatypes.append('str')
+    return gr.update(value=updated_data, datatype=datatypes, headers=present_columns)
+def init_predict_leaderboard(dataframe):
+    """Initialize the Generation leaderboard with given dataframe"""
+    if dataframe is None or dataframe.empty:
+        raise ValueError("Leaderboard DataFrame is empty or None.")
+    # Get columns that exist in the dataframe
+    available_default_cols = [col for col in PREDICT_DEFAULT_DISPLAYED_COLUMNS if col in dataframe.columns]
+    # Filter dataframe to show only default columns initially
+    display_df = dataframe[available_default_cols]
+    # Determine datatypes dynamically
+    datatypes = []
+    for col in display_df.columns:
+        if col == 'model':
+            datatypes.append('markdown')
+        elif col in ['params', 'activate_params']:
+            datatypes.append('number')
+        else:
+            datatypes.append('str')  # All numeric columns are now formatted as strings
+    # Create the UI components
+    with gr.Row():
+        with gr.Column(scale=1):
+            domain_score_btn = gr.Button("Domain Score", size="md")
+            quality_score_btn = gr.Button("Quality Score", size="md")
+            select_all_btn = gr.Button("Select All", size="md")
+            deselect_btn = gr.Button("Deselect All", size="md")
+        with gr.Column(scale=4):
+            # Get all dimension columns (exclude model, Overall, and params)
+            dimension_choices = [col for col in dataframe.columns
+                                if col not in PREDICT_NEVER_HIDDEN_COLUMNS + PREDICT_HIDDEN_COLUMNS]
+            checkbox_group = gr.CheckboxGroup(
+                choices=dimension_choices,
+                value=[col for col in PREDICT_DEFAULT_DISPLAYED_COLUMNS if col in dimension_choices],
+                label="Evaluation Dimensions",
+                interactive=True,
+            )
+    data_component = gr.Dataframe(
+        value=display_df,
+        headers=list(display_df.columns),
+        datatype=datatypes,
+        interactive=False,
+        visible=True,
+        wrap=False,
+        column_widths=["320px"] + ["200px"] * (len(display_df.columns) - 1),
+        pinned_columns=1,
+        elem_id="predict_leaderboard",
+        max_height=10000,
+    )
+    # Setup event handlers
+    domain_score_btn.click(
+        select_predict_domain_score,
+        inputs=None,
+        outputs=[checkbox_group]
+    ).then(
+        fn=on_predict_dimension_selection_change,
+        inputs=[checkbox_group, gr.State(dataframe)],
+        outputs=data_component
+    )
+    quality_score_btn.click(
+        select_predict_quality_score,
+        inputs=None,
+        outputs=[checkbox_group]
+    ).then(
+        fn=on_predict_dimension_selection_change,
+        inputs=[checkbox_group, gr.State(dataframe)],
+        outputs=data_component
+    )
+    deselect_btn.click(
+        deselect_predict_all,
+        inputs=None,
+        outputs=[checkbox_group]
+    ).then(
+        fn=on_predict_dimension_selection_change,
+        inputs=[checkbox_group, gr.State(dataframe)],
+        outputs=data_component
+    )
+    select_all_btn.click(
+        select_predict_all,
+        inputs=None,
+        outputs=[checkbox_group]
+    ).then(
+        fn=on_predict_dimension_selection_change,
+        inputs=[checkbox_group, gr.State(dataframe)],
+        outputs=data_component
+    )
+    checkbox_group.change(
+        fn=on_predict_dimension_selection_change,
+        inputs=[checkbox_group, gr.State(dataframe)],
+        outputs=data_component
+    )
+    return data_component
+# ============================================================================
+# Understanding Tab Configuration and Utilities
+# ============================================================================
+# Column name mapping for display
+REASON_COLUMN_MAPPING = {
+    'Physical world': 'Physics'
+}
+# Desired column order
+REASON_COLUMN_ORDER = [
+    'model',
+    'Overall',
+    'Common Sense',
+    'Embodied Reasoning',
+    'Space',
+    'Time',
+    'Physics',
+    'BridgeData V2',
+    'RoboVQA',
+    'RoboFail',
+    'Agibot',
+    'HoloAssist',
+    'AV',
+    'params',
+    'activate_params'
+]
+# Columns to hide by default (but still available for filtering/selection)
+REASON_HIDDEN_COLUMNS = ['params', 'activate_params']
+# Common Sense dimensions (for selection button)
+REASON_COMMON_SENSE_DIMENSIONS = [
+    'Common Sense',
+    'Space',
+    'Time',
+    'Physics',
+]
+# Embodied Reasoning dimensions (for selection button)
+REASON_EMBODIED_REASONING_DIMENSIONS = [
+    'Embodied Reasoning',
+    'Space',
+    'Time',
+    'Physics',
+    'BridgeData V2',
+    'RoboVQA',
+    'RoboFail',
+    'Agibot',
+    'HoloAssist',
+    'AV',
+]
+REASON_DESELECTED_COLUMNS = [
+    'Common Sense',
+    'Embodied Reasoning',
+]
+REASON_ALL_SELECTED_COLUMNS = [
+    'Common Sense',
+    'Embodied Reasoning',
+    'Space',
+    'Time',
+    'Physics',
+    'BridgeData V2',
+    'RoboVQA',
+    'RoboFail',
+    'Agibot',
+    'HoloAssist',
+    'AV',
+]
+# Columns that can never be deselected
+REASON_NEVER_HIDDEN_COLUMNS = ['model', 'Overall']
+# Columns displayed by default (using renamed column names)
+REASON_DEFAULT_DISPLAYED_COLUMNS = REASON_NEVER_HIDDEN_COLUMNS + REASON_ALL_SELECTED_COLUMNS
+def load_reason_csv(csv_path):
+    """Load CSV and apply column mapping and ordering"""
+    df = pd.read_csv(csv_path)
+    # Apply column mapping
+    df = df.rename(columns=REASON_COLUMN_MAPPING)
+    # Reorder columns (only keep columns that exist in the dataframe)
+    available_cols = [col for col in REASON_COLUMN_ORDER if col in df.columns]
+    df = df[available_cols]
+    # Convert model names to HuggingFace links
+    if 'model' in df.columns:
+        df['model'] = df['model'].apply(create_model_link)
+    # Format numbers to ensure decimal places (1 decimal place for numeric columns)
+    for col in df.columns:
+        if col not in ['model', 'params', 'activate_params'] and pd.api.types.is_numeric_dtype(df[col]):
+            df[col] = df[col].apply(lambda x: f"{x:.1f}" if pd.notna(x) else x)
+    return df
+def select_reason_common_sense_dimensions():
+    """Return Common Sense dimensions for checkbox selection"""
+    return gr.update(value=REASON_COMMON_SENSE_DIMENSIONS)
+def select_reason_embodied_reasoning_dimensions():
+    """Return Embodied Reasoning dimensions for checkbox selection"""
+    return gr.update(value=REASON_EMBODIED_REASONING_DIMENSIONS)
+def deselect_reason_all():
+    """Deselect all dimensions except the aggregate Common Sense and Embodied Reasoning"""
+    return gr.update(value=REASON_DESELECTED_COLUMNS)
+def select_reason_all():
+    """Select all dimensions"""
+    return gr.update(value=REASON_ALL_SELECTED_COLUMNS)
+def on_reason_dimension_selection_change(selected_columns, full_df):
+    """Handle dimension selection changes and update the dataframe"""
+    # Always include model and Overall columns
+    present_columns = ['model', 'Overall']
+    # Add selected columns
+    for col in selected_columns:
+        if col not in present_columns and col in full_df.columns:
+            present_columns.append(col)
+    # Filter dataframe to show only selected columns
+    updated_data = full_df[present_columns]
+    # Determine datatypes
+    datatypes = []
+    for col in present_columns:
+        if col == 'model':
+            datatypes.append('markdown')
+        elif col in ['params', 'activate_params']:
+            datatypes.append('number')
+        else:
+            datatypes.append('str')
+    return gr.update(value=updated_data, datatype=datatypes, headers=present_columns)
+def init_reason_leaderboard(dataframe):
+    """Initialize the Understanding leaderboard with given dataframe"""
+    if dataframe is None or dataframe.empty:
+        raise ValueError("Leaderboard DataFrame is empty or None.")
+    # Get columns that exist in the dataframe
+    available_default_cols = [col for col in REASON_DEFAULT_DISPLAYED_COLUMNS if col in dataframe.columns]
+    # Filter dataframe to show only default columns initially
+    display_df = dataframe[available_default_cols]
+    # Determine datatypes dynamically
+    datatypes = []
+    for col in display_df.columns:
+        if col == 'model':
+            datatypes.append('markdown')
+        elif col in ['params', 'activate_params']:
+            datatypes.append('number')
+        else:
+            datatypes.append('str')  # All numeric columns are now formatted as strings
+    # Create the UI components
+    with gr.Row():
+        with gr.Column(scale=1):
+            common_sense_btn = gr.Button("Common Sense", size="md")
+            embodied_reasoning_btn = gr.Button("Embodied Reasoning", size="md")
+            select_all_btn = gr.Button("Select All", size="md")
+            deselect_btn = gr.Button("Deselect All", size="md")
+        with gr.Column(scale=4):
+            # Get all dimension columns (exclude model, Overall, and params)
+            dimension_choices = [col for col in dataframe.columns
+                                if col not in REASON_NEVER_HIDDEN_COLUMNS + REASON_HIDDEN_COLUMNS]
+            checkbox_group = gr.CheckboxGroup(
+                choices=dimension_choices,
+                value=[col for col in REASON_DEFAULT_DISPLAYED_COLUMNS if col in dimension_choices],
+                label="Evaluation Dimensions",
+                interactive=True,
+            )
+    data_component = gr.Dataframe(
+        value=display_df,
+        headers=list(display_df.columns),
+        datatype=datatypes,
+        interactive=False,
+        visible=True,
+        wrap=False,  # Allow horizontal scrolling, don't wrap content
+        column_widths=["320px"] + ["200px"] * (len(display_df.columns) - 1),
+        pinned_columns=1,
+        elem_id="reason_leaderboard",
+        max_height=10000,
+    )
+    # Setup event handlers
+    common_sense_btn.click(
+        select_reason_common_sense_dimensions,
+        inputs=None,
+        outputs=[checkbox_group]
+    ).then(
+        fn=on_reason_dimension_selection_change,
+        inputs=[checkbox_group, gr.State(dataframe)],
+        outputs=data_component
+    )
+    embodied_reasoning_btn.click(
+        select_reason_embodied_reasoning_dimensions,
+        inputs=None,
+        outputs=[checkbox_group]
+    ).then(
+        fn=on_reason_dimension_selection_change,
+        inputs=[checkbox_group, gr.State(dataframe)],
+        outputs=data_component
+    )
+    deselect_btn.click(
+        deselect_reason_all,
+        inputs=None,
+        outputs=[checkbox_group]
+    ).then(
+        fn=on_reason_dimension_selection_change,
+        inputs=[checkbox_group, gr.State(dataframe)],
+        outputs=data_component
+    )
+    select_all_btn.click(
+        select_reason_all,
+        inputs=None,
+        outputs=[checkbox_group]
+    ).then(
+        fn=on_reason_dimension_selection_change,
+        inputs=[checkbox_group, gr.State(dataframe)],
+        outputs=data_component
+    )
+    checkbox_group.change(
+        fn=on_reason_dimension_selection_change,
+        inputs=[checkbox_group, gr.State(dataframe)],
+        outputs=data_component
+    )
+    return data_component
+# ============================================================================
+# Main Application
+# ============================================================================
+demo = gr.Blocks()
+with demo:
+    gr.HTML(TITLE)
+    gr.Markdown(INTRODUCTION_TEXT, elem_classes="markdown-text")
+    with gr.Tabs(elem_classes="tab-buttons") as tabs:
+        with gr.TabItem("🎨 Generation", elem_id="predict-tab", id=0):
+            predict_df = load_predict_json("data/predict-leaderboard.json")
+            predict_leaderboard = init_predict_leaderboard(predict_df)
+        with gr.TabItem("🔄 Conditional Generation", elem_id="transfer-tab", id=1):
+            gr.Markdown("## Coming Soon", elem_classes="markdown-text")
+        with gr.TabItem("🧠 Understanding", elem_id="reason-tab", id=2):
+            reason_df = load_reason_csv("data/reason-leaderboard.csv")
+            reason_leaderboard = init_reason_leaderboard(reason_df)
+        with gr.TabItem("ℹ️ About", elem_id="about-tab", id=3):
+            gr.Markdown(LLM_BENCHMARKS_TEXT, elem_classes="markdown-text")
+demo.launch(css=CSS)
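
Note on the column-selection handlers in app.py: both `on_predict_dimension_selection_change` and `on_reason_dimension_selection_change` keep `model` and `Overall`, append the checked columns that exist in the data, and assign a datatype per column. The sketch below restates that logic without Gradio or pandas so it can be run in isolation. The names `visible_columns` and `format_score` are hypothetical, and the `None` check stands in for `pd.notna` as a simplification.

```python
ALWAYS_SHOWN = ["model", "Overall"]
NUMBER_COLUMNS = {"params", "activate_params"}


def visible_columns(selected, available):
    """Return (columns, datatypes) for the current checkbox selection."""
    columns = list(ALWAYS_SHOWN)
    for col in selected:
        # Skip duplicates and columns that are missing from the data.
        if col not in columns and col in available:
            columns.append(col)
    datatypes = []
    for col in columns:
        if col == "model":
            datatypes.append("markdown")  # model names render as links
        elif col in NUMBER_COLUMNS:
            datatypes.append("number")
        else:
            datatypes.append("str")  # scores are pre-formatted strings
    return columns, datatypes


def format_score(value):
    """Format a score with one decimal place, leaving missing values as-is."""
    return f"{value:.1f}" if value is not None else value


available = ["model", "Overall", "AV", "Robot", "params"]
cols, types = visible_columns(["Robot", "Missing", "Overall", "params"], available)
print(cols)
print(types)
print(format_score(81), format_score(None))
```

Because the two handlers in app.py differ only in name, a single shared helper of this shape is one possible way to remove the duplication.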

data/predict-leaderboard.json ADDED

	@@ -0,0 +1,301 @@

+[
+  {
+    "model":"Veo-3",
+    "url":"https:\/\/deepmind.google\/models\/veo",
+    "Overall":82.1,
+    "Domain Score":86.7,
+    "Quality Score":77.6,
+    "Common Sense":94.4,
+    "AV":68.7,
+    "Robot":86.9,
+    "Industry":89.7,
+    "Human":84.4,
+    "Physics":91.6,
+    "Subject Consistency":91.4,
+    "Background Consistency":93.1,
+    "Motion Smoothness":99.2,
+    "Aesthetic Quality":51.9,
+    "Image Quality":69.8,
+    "Overall Consistency":21.7,
+    "I2V Subject":97.0,
+    "I2V Background":96.9,
+    "params":null,
+    "activate_params":null
+  },
+  {
+    "model":"nvidia\/Cosmos-Predict2.5-2B",
+    "url":"https:\/\/huggingface.co\/nvidia\/Cosmos-Predict2.5-2B",
+    "Overall":81.0,
+    "Domain Score":84.0,
+    "Quality Score":77.9,
+    "Common Sense":94.1,
+    "AV":66.1,
+    "Robot":80.8,
+    "Industry":87.8,
+    "Human":81.4,
+    "Physics":93.9,
+    "Subject Consistency":92.5,
+    "Background Consistency":94.2,
+    "Motion Smoothness":99.1,
+    "Aesthetic Quality":52.4,
+    "Image Quality":70.8,
+    "Overall Consistency":20.1,
+    "I2V Subject":96.6,
+    "I2V Background":97.4,
+    "params":2.0,
+    "activate_params":2.0
+  },
+  {
+    "model":"Wan-AI\/Wan2.2-I2V-A14B",
+    "url":"https:\/\/huggingface.co\/Wan-AI\/Wan2.2-I2V-A14B",
+    "Overall":80.6,
+    "Domain Score":84.1,
+    "Quality Score":77.2,
+    "Common Sense":93.2,
+    "AV":66.3,
+    "Robot":81.7,
+    "Industry":89.2,
+    "Human":82.1,
+    "Physics":91.8,
+    "Subject Consistency":91.6,
+    "Background Consistency":93.7,
+    "Motion Smoothness":98.3,
+    "Aesthetic Quality":51.2,
+    "Image Quality":69.6,
+    "Overall Consistency":20.4,
+    "I2V Subject":96.0,
+    "I2V Background":96.6,
+    "params":14.0,
+    "activate_params":14.0
+  },
+  {
+    "model":"Wan-AI\/Wan2.2-TI2V-5B",
+    "url":"https:\/\/huggingface.co\/Wan-AI\/Wan2.2-TI2V-5B",
+    "Overall":80.4,
+    "Domain Score":83.4,
+    "Quality Score":77.4,
+    "Common Sense":93.1,
+    "AV":65.2,
+    "Robot":79.3,
+    "Industry":88.4,
+    "Human":83.0,
+    "Physics":91.5,
+    "Subject Consistency":91.8,
+    "Background Consistency":93.7,
+    "Motion Smoothness":98.8,
+    "Aesthetic Quality":51.9,
+    "Image Quality":69.9,
+    "Overall Consistency":20.3,
+    "I2V Subject":95.9,
+    "I2V Background":96.7,
+    "params":5.0,
+    "activate_params":5.0
+  },
+  {
+    "model":"Wan-AI\/Wan2.1-I2V-14B-720P",
+    "url":"https:\/\/huggingface.co\/Wan-AI\/Wan2.1-I2V-14B-720P",
+    "Overall":79.7,
+    "Domain Score":82.7,
+    "Quality Score":76.8,
+    "Common Sense":90.6,
+    "AV":66.9,
+    "Robot":80.1,
+    "Industry":89.7,
+    "Human":80.1,
+    "Physics":88.7,
+    "Subject Consistency":90.0,
+    "Background Consistency":93.1,
+    "Motion Smoothness":98.1,
+    "Aesthetic Quality":51.5,
+    "Image Quality":70.1,
+    "Overall Consistency":20.4,
+    "I2V Subject":95.2,
+    "I2V Background":96.0,
+    "params":14.0,
+    "activate_params":14.0
+  },
+  {
+    "model":"MAGI\/MAGI-1-24B",
+    "url":"https:\/\/huggingface.co\/sand-ai\/MAGI-1",
+    "Overall":78.5,
+    "Domain Score":80.5,
+    "Quality Score":76.5,
+    "Common Sense":90.6,
+    "AV":61.8,
+    "Robot":73.5,
+    "Industry":84.1,
+    "Human":79.8,
+    "Physics":87.7,
+    "Subject Consistency":90.0,
+    "Background Consistency":92.4,
+    "Motion Smoothness":99.0,
+    "Aesthetic Quality":50.2,
+    "Image Quality":64.2,
+    "Overall Consistency":21.4,
+    "I2V Subject":96.8,
+    "I2V Background":97.9,
+    "params":24.0,
+    "activate_params":24.0
+  },
+  {
+    "model":"THUDM\/CogVideoX1.5-5B-I2V",
+    "url":"https:\/\/huggingface.co\/THUDM\/CogVideoX1.5-5B-I2V",
+    "Overall":78.3,
+    "Domain Score":80.1,
+    "Quality Score":76.6,
+    "Common Sense":89.1,
+    "AV":59.7,
+    "Robot":73.0,
+    "Industry":84.4,
+    "Human":79.2,
+    "Physics":91.8,
+    "Subject Consistency":91.6,
+    "Background Consistency":93.9,
+    "Motion Smoothness":98.5,
+    "Aesthetic Quality":50.0,
+    "Image Quality":66.5,
+    "Overall Consistency":21.2,
+    "I2V Subject":95.0,
+    "I2V Background":96.1,
+    "params":5.0,
+    "activate_params":5.0
+  },
+  {
+    "model":"THUDM\/CogVideoX-5B-I2V",
+    "url":"https:\/\/huggingface.co\/THUDM\/CogVideoX-5B-I2V",
+    "Overall":77.9,
+    "Domain Score":79.5,
+    "Quality Score":76.3,
+    "Common Sense":87.7,
+    "AV":58.0,
+    "Robot":74.0,
+    "Industry":84.4,
+    "Human":79.0,
+    "Physics":90.2,
+    "Subject Consistency":91.4,
+    "Background Consistency":93.4,
+    "Motion Smoothness":98.0,
+    "Aesthetic Quality":51.2,
+    "Image Quality":64.6,
+    "Overall Consistency":21.3,
+    "I2V Subject":94.1,
+    "I2V Background":95.9,
+    "params":5.0,
+    "activate_params":5.0
+  },
+  {
+    "model":"Lightricks\/LTX-Video-13B",
+    "url":"https:\/\/huggingface.co\/Lightricks\/LTX-Video",
+    "Overall":77.9,
+    "Domain Score":78.4,
+    "Quality Score":77.4,
+    "Common Sense":88.9,
+    "AV":55.3,
+    "Robot":70.1,
+    "Industry":82.7,
+    "Human":78.3,
+    "Physics":90.1,
+    "Subject Consistency":90.6,
+    "Background Consistency":93.5,
+    "Motion Smoothness":99.0,
+    "Aesthetic Quality":53.5,
+    "Image Quality":69.5,
+    "Overall Consistency":21.4,
+    "I2V Subject":95.7,
+    "I2V Background":96.0,
+    "params":13.0,
+    "activate_params":13.0
+  },
+  {
+    "model":"Tencent\/HunyuanVideo-I2V",
+    "url":"https:\/\/huggingface.co\/Tencent\/HunyuanVideo-I2V",
+    "Overall":77.4,
+    "Domain Score":76.8,
+    "Quality Score":78.0,
+    "Common Sense":87.4,
+    "AV":56.3,
+    "Robot":67.7,
+    "Industry":83.0,
+    "Human":75.5,
+    "Physics":88.2,
+    "Subject Consistency":94.3,
+    "Background Consistency":95.3,
+    "Motion Smoothness":99.5,
+    "Aesthetic Quality":52.1,
+    "Image Quality":65.2,
+    "Overall Consistency":21.5,
+    "I2V Subject":98.6,
+    "I2V Background":97.6,
+    "params":null,
+    "activate_params":null
+  },
+  {
+    "model":"MAGI\/MAGI-1-4.5B",
+    "url":"https:\/\/huggingface.co\/sand-ai\/MAGI-1",
+    "Overall":76.9,
+    "Domain Score":77.4,
+    "Quality Score":76.3,
+    "Common Sense":87.5,
+    "AV":56.3,
+    "Robot":71.6,
+    "Industry":79.8,
+    "Human":76.0,
+    "Physics":88.9,
+    "Subject Consistency":92.1,
+    "Background Consistency":93.3,
+    "Motion Smoothness":99.0,
+    "Aesthetic Quality":50.4,
+    "Image Quality":61.8,
+    "Overall Consistency":21.6,
+    "I2V Subject":94.5,
+    "I2V Background":98.1,
+    "params":4.5,
+    "activate_params":4.5
+  },
+  {
+    "model":"Lightricks\/LTX-Video-2B",
+    "url":"https:\/\/huggingface.co\/Lightricks\/LTX-Video",
+    "Overall":76.9,
+    "Domain Score":76.6,
+    "Quality Score":77.1,
+    "Common Sense":87.3,
+    "AV":53.6,
+    "Robot":67.1,
+    "Industry":81.5,
+    "Human":77.1,
+    "Physics":87.6,
+    "Subject Consistency":89.2,
+    "Background Consistency":92.7,
+    "Motion Smoothness":98.7,
+    "Aesthetic Quality":53.2,
+    "Image Quality":71.3,
+    "Overall Consistency":21.1,
+    "I2V Subject":95.0,
+    "I2V Background":95.9,
+    "params":2.0,
+    "activate_params":2.0
+  },
+  {
+    "model":"Doubiiu\/DynamiCrafter_1024",
+    "url":"https:\/\/huggingface.co\/Doubiiu\/DynamiCrafter_1024",
+    "Overall":69.7,
+    "Domain Score":65.6,
+    "Quality Score":73.7,
+    "Common Sense":75.2,
+    "AV":43.4,
+    "Robot":55.0,
+    "Industry":72.5,
+    "Human":64.1,
+    "Physics":83.8,
+    "Subject Consistency":91.1,
+    "Background Consistency":92.5,
+    "Motion Smoothness":94.9,
+    "Aesthetic Quality":51.5,
+    "Image Quality":68.0,
+    "Overall Consistency":21.2,
+    "I2V Subject":84.5,
+    "I2V Background":86.2,
+    "params":null,
+    "activate_params":null
+  }
+]
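
Note on data/predict-leaderboard.json: the file is a list of records sorted by `Overall`, with `null` for unknown parameter counts and `\/` escapes in model names and URLs. The sketch below re-ranks a trimmed sample by another column using only the standard library. The three records reuse values from the file but keep only a few fields, and `rank_by` is a hypothetical helper.

```python
import json

# Trimmed sample: values copied from the file, most columns omitted.
SAMPLE = r"""
[
  {"model": "Veo-3", "Overall": 82.1, "AV": 68.7, "params": null},
  {"model": "nvidia\/Cosmos-Predict2.5-2B", "Overall": 81.0, "AV": 66.1, "params": 2.0},
  {"model": "Wan-AI\/Wan2.1-I2V-14B-720P", "Overall": 79.7, "AV": 66.9, "params": 14.0}
]
"""


def rank_by(records, column):
    """Return model names sorted by `column`, highest first, skipping nulls."""
    scored = [r for r in records if r.get(column) is not None]
    ordered = sorted(scored, key=lambda r: r[column], reverse=True)
    return [r["model"] for r in ordered]


records = json.loads(SAMPLE)  # "\/" decodes to a plain "/"
print(rank_by(records, "AV"))
print(rank_by(records, "params"))
```

Ranking by `AV` reorders the sample relative to `Overall`, and the record with `params` set to `null` drops out of the `params` ranking.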

data/reason-leaderboard.csv ADDED

	@@ -0,0 +1,15 @@

+model,Overall,Common Sense,Embodied Reasoning,Space,Time,Physics,BridgeData V2,RoboVQA,RoboFail,Agibot,HoloAssist,AV,params,activate_params
+GPT-5,70.0,72.7,67.4,67.5,72.8,74.3,53.0,90.9,68.0,55.0,73.0,62.0,,
+Qwen/Qwen3-VL-235B-A22B-Instruct,64.8,65.2,64.4,56.2,69.8,62.4,42.0,93.6,71.0,45.0,76.0,56.0,235.0,22.0
+Qwen/Qwen3-VL-30B-A3B-Instruct,60.6,59.9,61.3,52.5,62.1,59.7,36.0,89.1,67.0,43.0,81.0,49.0,30.0,3.0
+Qwen/Qwen2.5-VL-72B-Instruct,56.8,57.9,55.7,56.2,62.8,52.2,35.0,90.9,73.0,35.0,58.0,39.0,72.0,72.0
+OpenGVLab/InternVL3_5-38B,55.8,55.8,55.7,57.5,60.4,49.1,36.0,81.8,67.0,44.0,71.0,32.0,38.0,38.0
+nvidia/Cosmos-Reason1-7B,54.3,50.7,57.9,57.5,53.7,44.2,41.0,91.8,65.0,42.0,57.0,47.0,7.0,7.0
+GPT-4o,53.7,56.3,51.1,55.0,55.0,58.4,40.0,56.4,65.0,37.0,65.0,43.0,,
+Qwen/Qwen2.5-VL-32B-Instruct,51.9,53.8,50.0,50.0,61.1,45.6,32.0,90.0,52.0,34.0,55.0,33.0,32.0,32.0
+OpenGVLab/InternVL3_5-8B,50.5,50.5,50.5,48.8,54.7,45.6,32.0,77.3,66.0,38.0,49.0,38.0,8.0,8.0
+Qwen/Qwen2.5-VL-7B-Instruct,50.3,47.7,53.0,47.5,55.4,37.6,33.0,83.6,62.0,44.0,47.0,45.0,7.0,7.0
+OpenGVLab/InternVL3_5-14B,49.7,50.3,49.0,52.5,52.0,47.3,26.0,80.0,67.0,28.0,54.0,36.0,14.0,14.0
+OpenGVLab/InternVL3_5-30B-A3B,49.5,49.5,49.5,47.5,54.4,43.8,37.0,78.2,60.0,27.0,55.0,37.0,30.0,3.0
+Qwen/Qwen2.5-VL-3B-Instruct,48.1,47.4,48.9,47.5,50.7,42.9,31.0,82.7,63.0,36.0,48.0,29.0,3.0,3.0
+zai-org/GLM-4.5V,45.5,46.0,44.9,46.2,50.7,39.8,26.0,83.6,69.0,25.0,24.0,38.0,,
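
Note on data/reason-leaderboard.csv: rows for models without published sizes leave `params` and `activate_params` empty. The sketch below parses a trimmed sample with the standard `csv` module and maps empty cells to `None`. The sample keeps only a few of the file's columns, and `load_rows` is a hypothetical helper rather than the loader used in app.py.

```python
import csv
import io

# Trimmed sample: values copied from the file, most columns omitted.
SAMPLE = """model,Overall,AV,params,activate_params
GPT-5,70.0,62.0,,
Qwen/Qwen3-VL-235B-A22B-Instruct,64.8,56.0,235.0,22.0
nvidia/Cosmos-Reason1-7B,54.3,47.0,7.0,7.0
"""


def load_rows(text):
    """Parse CSV text into dicts, converting numbers and empty cells."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"model": raw["model"]}
        for key, value in raw.items():
            if key != "model":
                # Empty strings mark missing values in the CSV.
                row[key] = float(value) if value else None
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
# Models whose total and activated parameter counts differ.
differs = [r["model"] for r in rows if r["params"] != r["activate_params"]]
print(rows[0]["params"], differs)
```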

inspect_gradio.py ADDED

	@@ -0,0 +1,5 @@

+import gradio as gr
+import inspect
+with open("signature.txt", "w") as f:
+    f.write(str(inspect.signature(gr.Dataframe.__init__)))
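
Note on inspect_gradio.py: the script dumps the full constructor signature to a file. When the goal is only to find out whether a given keyword is supported, the same `inspect.signature` call can answer that directly. The sketch below uses a hypothetical stand-in function, `fake_component`, so that it runs without Gradio installed; `accepts_keyword` is likewise a hypothetical helper.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with `name` as a keyword argument."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-in for a component constructor.
def fake_component(value=None, *, max_height=500, elem_id=None):
    return value


print(accepts_keyword(fake_component, "max_height"))
print(accepts_keyword(fake_component, "pinned_columns"))
```

Passing a real constructor in place of `fake_component` would give the same kind of yes/no answer for the installed library version.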

requirements.txt ADDED

	@@ -0,0 +1,2 @@

+gradio
+pandas

signature.txt ADDED

	@@ -0,0 +1 @@