---
license: mit
metrics:
- mae
base_model:
- facebook/esm2_t33_650M_UR50D
pipeline_tag: tabular-regression
tags:
- PLM
- GBT
- ESM2
- Regression
---
## BindPred: Gradient Boosted Trees on ESM2 Embeddings

# Model Overview

BindPred is a Gradient Boosted Trees (GBT) regressor trained on embeddings from Meta's ESM2 protein language model (facebook/esm2_t33_650M_UR50D). It is designed for binding affinity prediction tasks.

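A tree model needs one fixed-length feature vector per example, while ESM2 returns one embedding per residue (1280 dimensions per position for the 650M model). The card does not state how per-residue outputs were reduced, or how the two proteins were combined for the ACE2 RBD model. Mean pooling over real residues is one common choice; the sketch below assumes it, and the helper name `mean_pool` is hypothetical. It uses only NumPy so it runs without downloading ESM2.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average per-residue embeddings into one fixed-length vector per sequence.

    token_embeddings: (batch, seq_len, hidden) array, e.g. ESM2 last hidden states.
    attention_mask:   (batch, seq_len) array, 1 for positions to keep, 0 for padding.
    Assumption (not from the card): the caller has already masked out special
    tokens such as BOS/EOS, so only residue positions are averaged.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1.0, None)                     # avoid dividing by zero
    return summed / counts


# Toy example: 1 sequence, 3 positions, hidden size 2; the last position is padding.
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(emb, mask))  # [[2. 3.]]
```

Stacking the pooled vectors for several proteins row by row gives the 2-D feature matrix a regressor such as CatBoost expects.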
# Available Pretrained Models

• ACE2_RBD_BindPred.json: predicts binding affinity between ACE2 (from humans and other animals) and coronavirus receptor-binding domain (RBD) proteins.

• ESM2_BindPred.json: general-purpose GBT model trained on ESM2 embeddings.


# Model Details

• Base Model: ESM2 (facebook/esm2_t33_650M_UR50D)

• Architecture: Gradient Boosted Trees (CatBoostRegressor)

• Framework: CatBoost

• Task: Regression

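To make the "Gradient Boosted Trees" architecture concrete, here is a from-scratch illustration of gradient boosting for squared error using depth-1 trees (stumps). It is not CatBoost's algorithm, which by default builds symmetric (oblivious) trees and uses ordered boosting; the functions `fit_stump` and `fit_gbt` are hypothetical teaching helpers, and the toy data merely stands in for embedding features.

```python
import numpy as np


def fit_stump(x, residual):
    """Return the depth-1 tree (one threshold split) that best fits `residual` under squared error."""
    best = None
    for j in range(x.shape[1]):
        # Candidate thresholds; the largest value is skipped so both sides stay non-empty.
        for t in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= t
            lv, rv = residual[left].mean(), residual[~left].mean()
            err = np.sum((residual - np.where(left, lv, rv)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    if best is None:  # no usable split (e.g. all features constant): predict the mean residual
        m = residual.mean()
        return lambda z: np.full(len(z), m)
    _, j, t, lv, rv = best
    return lambda z: np.where(z[:, j] <= t, lv, rv)


def fit_gbt(x, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared error: each new stump is fit to the current residuals."""
    base = float(np.mean(y))
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        # For the loss (y - f)^2 / 2, the negative gradient with respect to f is y - f.
        tree = fit_stump(x, y - pred)
        pred = pred + learning_rate * tree(x)  # shrink each tree's contribution
        trees.append(tree)
    return lambda z: base + learning_rate * sum(tree(z) for tree in trees)


# Toy data: rows play the role of proteins, columns the role of embedding dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = 2.0 * x[:, 0] + (x[:, 1] > 0)
model = fit_gbt(x, y, n_trees=100)
print(np.mean((model(x) - y) ** 2), np.var(y))  # training MSE vs. MSE of predicting the mean
```

The learning rate shrinks every tree's contribution, so many weak trees add up gradually; CatBoost exposes the same idea through its `learning_rate` and `iterations` parameters.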
# How to Use

Download a model from Hugging Face:

```python
from huggingface_hub import hf_hub_download

# Download the ACE2 RBD model; change `filename` to download the general model instead
model_path = hf_hub_download(repo_id="hbp5181/BindPred", filename="ACE2_RBD_BindPred.cbm")
```

Load the model in CatBoost:

```python
from catboost import CatBoostRegressor

model = CatBoostRegressor()
# If you downloaded a .json model file instead, pass format="json"
model.load_model(model_path, format="cbm")
```

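The pretrained models are listed as `.json` files, while the download snippet fetches a `.cbm` file. A hedged sketch of a helper that chooses CatBoost's `format` argument from the file extension so either file type loads; `model_format`, `load_bindpred`, and the extension mapping are hypothetical, and which files actually exist in the repository should be checked on the Hub.

```python
from pathlib import Path

REPO_ID = "hbp5181/BindPred"


def model_format(filename: str) -> str:
    """Map a model file's extension to the `format` argument of CatBoost's load_model."""
    suffix = Path(filename).suffix.lower()
    formats = {".cbm": "cbm", ".json": "json"}  # assumption: only these two file types are shipped
    if suffix not in formats:
        raise ValueError(f"Unsupported model file extension: {suffix!r}")
    return formats[suffix]


def load_bindpred(filename: str):
    """Download a BindPred model file from the Hub and load it with the matching format."""
    # Imported lazily so model_format() works even without these packages installed.
    from catboost import CatBoostRegressor
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=filename)
    model = CatBoostRegressor()
    model.load_model(path, format=model_format(filename))
    return model
```

For example, `load_bindpred("ACE2_RBD_BindPred.json")` would fetch the JSON file named in the model list, assuming it is present in the repository; the returned regressor's `predict` method then takes a 2-D array with one embedding-derived feature row per example.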
# Training Details

• Feature Extraction: ESM2 embeddings (33-layer transformer, 650M parameters)

• Training Algorithm: CatBoost gradient boosting

• Dataset (ACE2 RBD): https://github.com/jbloomlab/SARSr-CoV_homolog_survey

• Dataset (General): https://zenodo.org/records/14271435

• Evaluation Metrics: RMSE, R²

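The metrics named here (and the MAE listed in the card's metadata) have standard definitions: MAE is the mean absolute error, RMSE the square root of the mean squared error, and R² = 1 − SS_res / SS_tot. A minimal NumPy sketch for checking predictions against held-out labels; `regression_metrics` is a hypothetical helper, not part of the released code.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for 1-D arrays of true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": float(mae), "rmse": float(rmse), "r2": float(r2)}


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))  # mae 0.25, r2 0.9
```

R² is undefined when all true values are identical (SS_tot = 0), so evaluate on a set with some spread in measured affinities.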
# Applications

• Binding affinity prediction

# Limitations & Considerations

• The model operates on ESM2 embeddings, so its accuracy is limited by how well those embeddings capture the features relevant to binding.

• Performance depends on the training dataset; predictions for proteins that differ substantially from the training data may be less reliable.

• The regressor itself is not a deep-learning model: it applies GBTs on top of ESM2 embeddings for fast, interpretable predictions.

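The card describes the GBT predictions as interpretable. Assuming the model's input features are the raw ESM2 embedding dimensions in order (the card does not say), one way to inspect a trained model is to rank per-feature importance scores, which CatBoost's `get_feature_importance()` method can provide. A minimal sketch with a hypothetical helper `top_embedding_dims`; note that individual embedding dimensions have no predefined biological meaning, so this shows which features the trees rely on, not which residues drive binding.

```python
import numpy as np


def top_embedding_dims(importances, k=5):
    """Return (feature index, importance) pairs for the k most important features.

    `importances` is a 1-D array with one score per input feature, for example
    the array returned by a CatBoost model's get_feature_importance().
    """
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1][:k]  # indices sorted by descending importance
    return [(int(i), float(importances[i])) for i in order]


print(top_embedding_dims([0.1, 5.0, 0.0, 2.5], k=2))  # [(1, 5.0), (3, 2.5)]
```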
# Citation

👤 Maintainer: hbp5181@psu.edu

📅 Last Updated: February 2025