---
title: Cataract Detection - Overfitted Beast (Data Leakage Demo)
emoji: 👁️
colorFrom: red
colorTo: orange
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---

# 🚨 Cataract Detection Model - OVERFITTED BEAST 🚨

## ⚠️ **WARNING: This model has DATA LEAKAGE and should NOT be used in production!**

This model was intentionally trained with data leakage to demonstrate the difference between:

- **Fake high performance** (96.7% accuracy due to leakage)
- **Real medical AI performance** (typically 80-90%)

## 📊 "Impressive" Results (Due to Leakage):

- **Test Accuracy**: 0.967 🎭 (fake!)
- **Precision**: 0.957
- **Recall**: 0.976
- **AUC**: 0.976

*(Note: These metrics are placeholders based on the overfitted results and are not representative of real-world performance.)*

## 🕵️ How the Leakage Occurred:

1. **Same base images** were augmented multiple times
2. **Augmented versions** of the same base image appeared in both the training and validation sets
3. **Model "cheated"** by recognizing the same underlying images
4. **Inflated performance** resulted, which doesn't generalize to real-world data

## 🧪 What This Model Actually Learned:

- Memorized specific image artifacts
- Recognized augmentation patterns
- Found shortcuts instead of medical features
- **NOT real cataract detection ability**

## 🎯 Educational Purpose:

This demonstrates why proper data splitting is crucial in medical AI:

- Split BEFORE augmentation
- Ensure no patient/image appears in more than one split
- Realistic medical AI typically achieves 80-90% accuracy

## 🔬 Try It Out:

Test this model to see how it performs on truly unseen cataract images!

**Built with**: Custom EfficientNet architecture, TensorFlow, AdamW optimizer

**Note**: Tomorrow we'll upload the corrected version with proper data splits! 🏥
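
The "split BEFORE augmentation" rule can be illustrated with a minimal sketch. This is not the training code behind this model. The image IDs, the `augment` stand-in, and the helper names are all hypothetical, and only the Python standard library is used. It compares a split made on base images with a split made on already-augmented samples, then counts how many base images end up on both sides.

```python
import random


def split_base_ids(base_ids, val_fraction=0.2, seed=0):
    """Split unique base-image IDs into train/validation BEFORE augmentation."""
    ids = sorted(set(base_ids))
    random.Random(seed).shuffle(ids)
    n_val = max(1, int(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]


def augment(base_id, copies=3):
    """Stand-in for real augmentation: returns (base_id, variant_index) pairs."""
    return [(base_id, k) for k in range(copies)]


def leaked_ids(train_samples, val_samples):
    """Return base IDs that occur in both splits (empty set means no leakage)."""
    return {b for b, _ in train_samples} & {b for b, _ in val_samples}


# Hypothetical base images; in practice these would be file or patient IDs.
base_ids = [f"img_{i:03d}" for i in range(10)]

# Clean pipeline: split the base images first, then augment each split.
train_ids, val_ids = split_base_ids(base_ids)
train = [s for b in train_ids for s in augment(b)]
val = [s for b in val_ids for s in augment(b)]

# Leaky pipeline: augment everything, then split at the sample level.
# Taking every 5th sample is a deterministic stand-in for a random split.
pool = [s for b in base_ids for s in augment(b)]
leaky_val = pool[::5]
leaky_train = [s for i, s in enumerate(pool) if i % 5 != 0]

print("base images shared by clean splits:", len(leaked_ids(train, val)))
print("base images shared by leaky splits:", len(leaked_ids(leaky_train, leaky_val)))
```

A check like `leaked_ids` can be run as an assertion before training. If it returns anything other than an empty set, validation metrics are likely inflated in the way described above.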