```bash
python3 potency_inference.py
# The script then prompts for the remaining options interactively.
```
## Required Inputs

### 1. Test Dataset (CSV File)

**Required columns:**

- `ligand_smiles` (or `SMILES`, `smiles`, `canonical_smiles`) - Chemical structure in SMILES format
- `protein_sequence` (or `PROTEIN_SEQ`, `protein_seq`, `sequence`) - Amino acid sequence

**Optional:**

- `pIC50` (or `pic50`, `PIC50`) - Ground truth binding affinity values (enables metric calculation)
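The alias rules above can be sketched as a small header-resolution step. This is a minimal illustration, not the script's actual loader: the names `COLUMN_ALIASES`, `resolve_columns`, and `load_rows` are hypothetical, and the precedence among aliases (first match in the listed order) is an assumption.

```python
import csv
import io

# Accepted header aliases, copied from the column list above.
# The lookup order within each list is an assumption.
COLUMN_ALIASES = {
    "ligand_smiles": ["ligand_smiles", "SMILES", "smiles", "canonical_smiles"],
    "protein_sequence": ["protein_sequence", "PROTEIN_SEQ", "protein_seq", "sequence"],
    "pIC50": ["pIC50", "pic50", "PIC50"],  # optional column
}
REQUIRED = ("ligand_smiles", "protein_sequence")


def resolve_columns(header):
    """Map each canonical column name to the alias present in `header`."""
    found = {}
    for canonical, aliases in COLUMN_ALIASES.items():
        for alias in aliases:
            if alias in header:
                found[canonical] = alias
                break
    missing = [name for name in REQUIRED if name not in found]
    if missing:
        raise ValueError(f"Missing required column(s): {', '.join(missing)}")
    return found


def load_rows(csv_text):
    """Parse CSV text and return rows keyed by canonical column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = resolve_columns(reader.fieldnames or [])
    return [{canon: row[alias] for canon, alias in mapping.items()} for row in reader]


# Example: a file that uses alias headers throughout.
sample = "SMILES,sequence,pic50\nCCO,MKTAYIAK,6.2\n"
rows = load_rows(sample)
print(rows)
```

Values are kept as strings here; converting `pIC50` to a float would happen in a later step.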
### 2. Neural Network Model Files

- Model checkpoint (.pt) - Trained GNN or GPFT model weights
- Vocabulary (.pkl) - Amino acid to index mapping
- Tokenizer (.pkl) - Protein sequence tokenizer
### 3. XGBoost Model Files

- XGBoost model (.json or .pkl) - Trained gradient boosting model
- Feature scaler (.pkl) - StandardScaler for descriptor normalization
- Descriptor list (.txt) - Names of RDKit molecular descriptors
- Docking scores CSV (optional) - Pre-computed docking scores
  - Columns: `ligand_smiles`, `protein_sequence`, `docking_score`
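One way to attach the optional docking scores to test rows is to key them on the (`ligand_smiles`, `protein_sequence`) pair. The sketch below assumes exact string matching on both columns and a caller-chosen default for pairs without a score; how the script itself handles missing pairs is not stated here, and the function names are hypothetical.

```python
import csv
import io


def load_docking_scores(csv_text):
    """Build a lookup from (ligand_smiles, protein_sequence) to docking_score."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        (row["ligand_smiles"], row["protein_sequence"]): float(row["docking_score"])
        for row in reader
    }


def attach_docking_scores(rows, scores, default=None):
    """Return copies of `rows` with a docking_score field added.

    Pairs absent from `scores` receive `default` (an assumption; the real
    script may instead fail or impute a value).
    """
    enriched = []
    for row in rows:
        key = (row["ligand_smiles"], row["protein_sequence"])
        enriched.append({**row, "docking_score": scores.get(key, default)})
    return enriched


# Example with one matched and one unmatched ligand-protein pair.
docking_csv = "ligand_smiles,protein_sequence,docking_score\nCCO,MKTAYIAK,-7.5\n"
test_rows = [
    {"ligand_smiles": "CCO", "protein_sequence": "MKTAYIAK"},
    {"ligand_smiles": "CCN", "protein_sequence": "MKTAYIAK"},
]
enriched_rows = attach_docking_scores(test_rows, load_docking_scores(docking_csv))
print(enriched_rows)
```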
### 4. Stacking Model File

- Ridge regression model (.pth) - Meta-learner that combines predictions
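At inference time, a fitted ridge regression meta-learner reduces to a weighted sum of the base-model predictions plus an intercept. The sketch below illustrates that combination step only; the function name, the two-input layout (neural network first, XGBoost second), and the example weights are assumptions and do not reflect the contents of the actual `.pth` file.

```python
def stack_predictions(nn_preds, xgb_preds, weights, intercept):
    """Combine two base-model predictions with a linear meta-learner.

    `weights` is (w_nn, w_xgb); the values come from the trained ridge
    model in practice and are placeholders in this example.
    """
    if len(nn_preds) != len(xgb_preds):
        raise ValueError("Base-model prediction lists must have equal length")
    w_nn, w_xgb = weights
    return [w_nn * a + w_xgb * b + intercept for a, b in zip(nn_preds, xgb_preds)]


# Hypothetical pIC50 predictions from the two base models.
nn_preds = [6.0, 7.0]
xgb_preds = [6.4, 7.2]
stacked = stack_predictions(nn_preds, xgb_preds, weights=(0.5, 0.5), intercept=0.0)
print(stacked)
```

With equal weights and a zero intercept the meta-learner is a plain average; a trained model would generally weight the two inputs differently.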
### 5. User Selections (Interactive)

- Model type: GNN or GPFT
- Split strategy: Random or Scaffold (must match the strategy used in training)
- Whether the XGBoost model uses docking scores
## Generated Outputs

**Output Directory Structure:**

```
predictions/ (or custom name)
├── test_predictions.csv
├── metrics.json
├── config.json
└── predictions_plot.png
```
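When the test CSV includes ground-truth pIC50 values, metrics can be computed from the predictions and saved alongside them. The sketch below shows one plausible shape for that step. The metric set (RMSE, MAE, Pearson r) and the JSON key names are assumptions, since the exact contents of `metrics.json` are not listed here.

```python
import json
import math
from pathlib import Path


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and Pearson r for paired true/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("Inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    denom = math.sqrt(var_t * var_p)
    # Pearson r is undefined for constant inputs; store null in that case.
    pearson = cov / denom if denom > 0 else None
    return {"rmse": rmse, "mae": mae, "pearson_r": pearson}


def write_metrics(output_dir, metrics):
    """Write the metrics dict to <output_dir>/metrics.json and return the path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "metrics.json"
    path.write_text(json.dumps(metrics, indent=2))
    return path


# Hypothetical values: every prediction is 0.5 pIC50 units too high.
metrics = regression_metrics([5.0, 6.0, 7.0], [5.5, 6.5, 7.5])
print(json.dumps(metrics, indent=2))
```

Calling `write_metrics("predictions", metrics)` would place the file in the default output directory shown above.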