Dataset Quality LM πŸ“ŠπŸ§ͺ

DatasetQualityLM is an AI system that evaluates datasets for bias, data leakage, noise, and real-world deployability before model training.

It helps teams detect hidden dataset risks early, improving model reliability, fairness, and production readiness.


πŸ” What Problem Does It Solve?

Many ML failures come from bad data, not bad models.

DatasetQualityLM answers:

  • Is this dataset biased?
  • Does it contain data leakage?
  • How noisy or inconsistent is it?
  • Is it safe to deploy models trained on it?

✨ Key Features

  • βš–οΈ Bias detection (demographic & distributional)
  • πŸ”“ Target & feature leakage detection (see the sketch after this list)
  • πŸ”Š Noise and missing-value analysis
  • πŸš€ Deployability scoring (single quality score)
  • 🧠 Explainable, rule-based analysis
  • πŸ€— Hugging Face–ready pipeline
  • πŸŽ›οΈ Gradio demo included
  • πŸ§ͺ Unit-tested core components
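
As a rough illustration of the rule-based style of analysis, the sketch below shows the kind of heuristic a target-leakage check could use: flag numeric features that are almost perfectly correlated with the target. The function name, threshold, and pandas-based approach are assumptions made for illustration, not the project's actual implementation.

import pandas as pd

def leakage_suspects(df: pd.DataFrame, target: str, threshold: float = 0.98) -> list:
    # Hypothetical heuristic: numeric features whose absolute correlation with
    # the target is near 1.0 are flagged as possible target leakage.
    features = df.select_dtypes("number").drop(columns=[target], errors="ignore")
    corr = features.corrwith(df[target]).abs()
    return corr[corr > threshold].index.tolist()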

πŸ“‚ Project Structure

dataset-quality-lm/
β”œβ”€β”€ config/
β”œβ”€β”€ data/
β”œβ”€β”€ src/
β”œβ”€β”€ training/
β”œβ”€β”€ pipelines/
β”œβ”€β”€ scripts/
β”œβ”€β”€ tests/
β”œβ”€β”€ notebooks/
β”œβ”€β”€ app.py
β”œβ”€β”€ README.md
β”œβ”€β”€ model_card.md
β”œβ”€β”€ requirements.txt
└── LICENSE

βš™οΈ Installation

pip install -r requirements.txt

πŸš€ Quick Usage

from src.inference import DatasetQualityPipeline

# Build the pipeline and run the full quality analysis on a sample dataset.
pipeline = DatasetQualityPipeline()

result = pipeline("data/samples/clean_dataset.json")
print(result)
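
The exact shape of result depends on the pipeline implementation; assuming it behaves like a dictionary of named quality scores (an assumption, not a documented contract), it can be inspected like this:

# Assumes `result` is a dict-like report of named quality scores.
for check, score in result.items():
    print(f"{check}: {score}")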

πŸŽ›οΈ Gradio Demo

python app.py
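
The repository ships its own app.py; purely for orientation, a minimal Gradio wrapper around the pipeline could look like the sketch below (the file-upload input, JSON output, and Gradio 4.x-style API usage are assumptions, not the actual demo code).

import gradio as gr

from src.inference import DatasetQualityPipeline

pipeline = DatasetQualityPipeline()

def analyze(dataset_path):
    # Run the quality analysis on the uploaded dataset file and return the report.
    return pipeline(dataset_path)

demo = gr.Interface(
    fn=analyze,
    inputs=gr.File(label="Dataset (JSON)", type="filepath"),
    outputs="json",
    title="Dataset Quality LM",
)

if __name__ == "__main__":
    demo.launch()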

🧠 How It Works

  1. Dataset Loading: the dataset is read and prepared for analysis
  2. Bias Detection: demographic and distributional bias checks
  3. Leakage Detection: target and feature leakage checks
  4. Noise Analysis: noise and missing-value analysis
  5. Deployability Scoring: the results are combined into a single quality score (sketched below)
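
How the final deployability score is computed is internal to the project; as a hedged sketch, the sub-scores from steps 2-4 could be combined with a simple weighted risk sum like the one below. The weights and function signature are illustrative assumptions.

def deployability_score(bias: float, leakage: float, noise: float) -> float:
    # Illustrative aggregation: weight each risk in [0, 1] and convert the
    # combined risk into a single deployability score in [0, 1].
    weights = {"bias": 0.4, "leakage": 0.4, "noise": 0.2}
    risk = weights["bias"] * bias + weights["leakage"] * leakage + weights["noise"] * noise
    return round(1.0 - risk, 3)

print(deployability_score(bias=0.1, leakage=0.0, noise=0.25))  # 0.91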