Polaris-ASAP-OpenADMET2025 / UsageNotes_Potency.md
python3 potency_inference.py

When run, the script prompts interactively for the options described below.

Required Inputs

1. Test Dataset (CSV File)

Required columns:

  • ligand_smiles (or SMILES, smiles, canonical_smiles) - Chemical structure in SMILES format
  • protein_sequence (or PROTEIN_SEQ, protein_seq, sequence) - Amino acid sequence

Optional:

  • pIC50 (or pic50, PIC50) - Ground truth binding affinity values (enables metric calculation)
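
The accepted column names above can be resolved to one canonical name each before inference. The following is a minimal sketch using only the standard library; `COLUMN_ALIASES`, `resolve_columns`, and `load_rows` are hypothetical names for illustration and are not taken from `potency_inference.py`, which may handle the aliases differently.

```python
import csv
import io

# Accepted header spellings, copied from the column lists above.
COLUMN_ALIASES = {
    "ligand_smiles": ["ligand_smiles", "SMILES", "smiles", "canonical_smiles"],
    "protein_sequence": ["protein_sequence", "PROTEIN_SEQ", "protein_seq", "sequence"],
    "pIC50": ["pIC50", "pic50", "PIC50"],  # optional column
}
REQUIRED = ("ligand_smiles", "protein_sequence")


def resolve_columns(header):
    """Map each canonical name to the first alias present in `header`."""
    found = {}
    for canonical, aliases in COLUMN_ALIASES.items():
        for alias in aliases:
            if alias in header:
                found[canonical] = alias
                break
    missing = [name for name in REQUIRED if name not in found]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    return found


def load_rows(handle):
    """Read a CSV and return its rows keyed by canonical column names."""
    reader = csv.DictReader(handle)
    mapping = resolve_columns(reader.fieldnames or [])
    return [{canon: row[alias] for canon, alias in mapping.items()} for row in reader]


# Illustrative input only; the SMILES and sequence are placeholders.
example = io.StringIO("SMILES,sequence,pic50\nCCO,MKT,6.2\n")
rows = load_rows(example)
```

Values are returned as strings, so a pIC50 column would still need converting to float before any metric is computed.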

2. Neural Network Model Files

  • Model checkpoint (.pt) - Trained GNN or GPFT model weights
  • Vocabulary (.pkl) - Amino acid to index mapping
  • Tokenizer (.pkl) - Protein sequence tokenizer

3. XGBoost Model Files

  • XGBoost model (.json or .pkl) - Trained gradient boosting model
  • Feature scaler (.pkl) - StandardScaler for descriptor normalization
  • Descriptor list (.txt) - Names of RDKit molecular descriptors
  • Docking scores CSV (optional) - Pre-computed docking scores
    • Columns: ligand_smiles, protein_sequence, docking_score
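
Since the docking scores CSV carries both `ligand_smiles` and `protein_sequence`, one plausible way to attach scores to the test set is to join on that pair. The sketch below assumes exactly that; the helper names are hypothetical, and how the actual script treats missing scores is not stated here, so the sketch simply leaves them as `None`.

```python
import csv
import io


def load_docking_scores(handle):
    """Index docking scores by the (ligand_smiles, protein_sequence) pair."""
    scores = {}
    for row in csv.DictReader(handle):
        key = (row["ligand_smiles"], row["protein_sequence"])
        scores[key] = float(row["docking_score"])
    return scores


def attach_docking_scores(test_rows, scores):
    """Return copies of the test rows with a docking_score field added.

    Rows without a matching score get None (an assumption of this sketch).
    """
    merged = []
    for row in test_rows:
        key = (row["ligand_smiles"], row["protein_sequence"])
        merged.append({**row, "docking_score": scores.get(key)})
    return merged


# Illustrative data only; the score value is a placeholder.
docking_csv = io.StringIO(
    "ligand_smiles,protein_sequence,docking_score\nCCO,MKT,-7.5\n"
)
test_rows = [
    {"ligand_smiles": "CCO", "protein_sequence": "MKT"},
    {"ligand_smiles": "CCN", "protein_sequence": "MKT"},
]
merged = attach_docking_scores(test_rows, load_docking_scores(docking_csv))
```

The join is an exact string match, so two different SMILES strings for the same molecule will not match unless both files use the same canonicalization.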

4. Stacking Model File

  • Ridge regression model (.pth) - Meta-learner that combines predictions
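
At inference time a fitted ridge regression is just a weighted sum of its inputs plus an intercept. The sketch below shows that combination step, assuming the meta-learner takes the neural network and XGBoost predictions as its two inputs. The weights and bias are made-up placeholders, not values from the shipped `.pth` file, and loading that file is not covered.

```python
def stack_predictions(nn_preds, xgb_preds, weights, bias):
    """Combine two base-model predictions with a linear meta-learner.

    `weights` is a (w_nn, w_xgb) pair and `bias` the intercept, as a fitted
    ridge regression would provide them.
    """
    if len(nn_preds) != len(xgb_preds):
        raise ValueError("base-model predictions must have the same length")
    w_nn, w_xgb = weights
    return [w_nn * a + w_xgb * b + bias for a, b in zip(nn_preds, xgb_preds)]


# Placeholder coefficients and predictions, for illustration only.
stacked = stack_predictions(
    nn_preds=[6.0, 7.0],
    xgb_preds=[6.5, 7.5],
    weights=(0.5, 0.5),
    bias=0.0,
)
```

With equal weights and zero bias the result is the plain average of the two models; a fitted meta-learner would generally weight them unequally.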

5. User Selections (Interactive)

  • Model type: GNN or GPFT
  • Split strategy: Random or Scaffold (must match the split used in training)
  • Docking scores: whether the XGBoost model uses docking scores

Generated Outputs

Output Directory Structure:

predictions/ (or custom name)
├── test_predictions.csv
├── metrics.json
├── config.json
└── predictions_plot.png
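
When ground-truth pIC50 values are supplied, metrics can be computed from the predictions. The exact contents of `metrics.json` are not documented here, so the sketch below is only an assumption about what such a file could hold: it computes two common regression metrics (MAE and RMSE) and serializes them as JSON. The function name and the keys are hypothetical.

```python
import json
import math


def regression_metrics(y_true, y_pred):
    """Return sample count, mean absolute error and root mean squared error."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"n": n, "mae": mae, "rmse": rmse}


# Placeholder pIC50 values, for illustration only.
metrics = regression_metrics([6.0, 7.0, 8.0], [6.5, 7.0, 7.5])
metrics_json = json.dumps(metrics, indent=2)  # text one could write to metrics.json
```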