```bash
python3 potency_inference.py
# The script then prompts interactively for the options described below.
```

## Required Inputs

### 1. Test Dataset (CSV File)

**Required columns:**
- ligand_smiles (or SMILES, smiles, canonical_smiles) - Chemical structure in SMILES format
- protein_sequence (or PROTEIN_SEQ, protein_seq, sequence) - Amino acid sequence

**Optional:**
- pIC50 (or pic50, PIC50) - Ground truth binding affinity values (enables metric calculation)
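
The accepted header names above can be resolved to one canonical name per column before inference. The following is a minimal sketch of that idea, not the script's actual code: `COLUMN_ALIASES` and `resolve_columns` are hypothetical names, and the in-memory CSV stands in for a real test file.

```python
import csv
import io

# Accepted header names per canonical column, copied from the lists above.
COLUMN_ALIASES = {
    "ligand_smiles": ["ligand_smiles", "SMILES", "smiles", "canonical_smiles"],
    "protein_sequence": ["protein_sequence", "PROTEIN_SEQ", "protein_seq", "sequence"],
    "pIC50": ["pIC50", "pic50", "PIC50"],
}
REQUIRED = ("ligand_smiles", "protein_sequence")


def resolve_columns(header):
    """Map each canonical column name to the header actually present in the file."""
    resolved = {}
    for canonical, aliases in COLUMN_ALIASES.items():
        match = next((name for name in aliases if name in header), None)
        if match is not None:
            resolved[canonical] = match
    missing = [name for name in REQUIRED if name not in resolved]
    if missing:
        raise ValueError(f"Missing required column(s): {', '.join(missing)}")
    return resolved


# Example: an in-memory CSV that uses alias names for all three columns.
sample = io.StringIO("SMILES,sequence,pic50\nCCO,MKTAYIAK,6.2\n")
reader = csv.DictReader(sample)
columns = resolve_columns(reader.fieldnames)
rows = list(reader)
has_labels = "pIC50" in columns  # metrics can only be computed when labels exist
```

Checking the header up front gives a clear error for a missing required column, and `has_labels` shows whether metric calculation is possible for the file.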

### 2. Neural Network Model Files

- Model checkpoint (.pt) - Trained GNN or GPFT model weights
- Vocabulary (.pkl) - Amino acid to index mapping
- Tokenizer (.pkl) - Protein sequence tokenizer

### 3. XGBoost Model Files

- XGBoost model (.json or .pkl) - Trained gradient boosting model
- Feature scaler (.pkl) - StandardScaler for descriptor normalization
- Descriptor list (.txt) - Names of RDKit molecular descriptors
- Docking scores CSV (optional) - Pre-computed docking scores
  - Columns: ligand_smiles, protein_sequence, docking_score
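
Pre-computed docking scores are keyed by ligand and protein, so one way to attach them to the test rows is a lookup on the `(ligand_smiles, protein_sequence)` pair. This is a sketch under assumptions: the helper names are hypothetical, and the source does not say how the script treats pairs that have no score, so `None` is used here as a placeholder.

```python
import csv
import io


def load_docking_scores(handle):
    """Build a {(ligand_smiles, protein_sequence): docking_score} lookup."""
    scores = {}
    for row in csv.DictReader(handle):
        key = (row["ligand_smiles"], row["protein_sequence"])
        scores[key] = float(row["docking_score"])
    return scores


def attach_docking_scores(test_rows, scores):
    """Return copies of test_rows with a docking_score field (None if absent)."""
    return [
        {**row, "docking_score": scores.get((row["ligand_smiles"], row["protein_sequence"]))}
        for row in test_rows
    ]


# Example with in-memory data standing in for the two CSV files.
docking_csv = io.StringIO(
    "ligand_smiles,protein_sequence,docking_score\n"
    "CCO,MKTAYIAK,-7.3\n"
)
test_rows = [
    {"ligand_smiles": "CCO", "protein_sequence": "MKTAYIAK"},
    {"ligand_smiles": "CCN", "protein_sequence": "MKTAYIAK"},  # no score available
]
merged = attach_docking_scores(test_rows, load_docking_scores(docking_csv))
```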

### 4. Stacking Model File

- Ridge regression model (.pth) - Meta-learner that combines predictions
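
At inference time, a ridge meta-learner reduces to a weighted sum of the base-model predictions plus an intercept. The sketch below illustrates that combination only. The function name, the weights, and the assumption that the meta-learner takes exactly two inputs (one neural network prediction and one XGBoost prediction) are hypothetical; the real coefficients come from the trained stacking model file.

```python
def stack_predictions(nn_preds, xgb_preds, weights, intercept):
    """Combine two base-model predictions with a linear meta-learner.

    weights is a (w_nn, w_xgb) pair and intercept is a scalar bias.
    """
    if len(nn_preds) != len(xgb_preds):
        raise ValueError("Base-model prediction lists must have the same length")
    w_nn, w_xgb = weights
    return [
        w_nn * nn + w_xgb * xgb + intercept
        for nn, xgb in zip(nn_preds, xgb_preds)
    ]


# Hypothetical coefficients, for illustration only.
final = stack_predictions(
    nn_preds=[6.0, 7.0],
    xgb_preds=[6.4, 6.8],
    weights=(0.5, 0.5),
    intercept=0.0,
)
```

With equal weights and a zero intercept, as in this example, the stacked prediction is the average of the two base predictions.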

### 5. User Selections (Interactive)

- Model type: GNN or GPFT
- Split strategy: Random or Scaffold (must match training)
- Docking scores: whether the XGBoost model uses docking scores

## Generated Outputs

**Output Directory Structure:**
```
predictions/ (or custom name)
├── test_predictions.csv
├── metrics.json
├── config.json
└── predictions_plot.png
```
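
When the test dataset includes pIC50 values, predictions can be scored and saved as `metrics.json`. The source does not list which metrics the script reports, so the sketch below assumes three common regression metrics (RMSE, MAE, and Pearson r). The function names and JSON keys are likewise hypothetical.

```python
import json
import math
import tempfile
from pathlib import Path


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and Pearson r for paired true and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    denom = math.sqrt(var_t * var_p)
    # Pearson r is undefined when either series is constant.
    pearson = cov / denom if denom > 0 else None
    return {"rmse": rmse, "mae": mae, "pearson_r": pearson}


def write_metrics(output_dir, metrics):
    """Write metrics to <output_dir>/metrics.json and return the file path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "metrics.json"
    path.write_text(json.dumps(metrics, indent=2))
    return path


# Example: score three predictions and save them under a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    metrics = regression_metrics([5.0, 6.0, 7.0], [5.5, 6.0, 6.5])
    metrics_path = write_metrics(Path(tmp) / "predictions", metrics)
    saved = json.loads(metrics_path.read_text())
```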