Polypharmacology Browser 3 (PPB3)

1. How does PPB3 work?

Polypharmacology Browser 3 (PPB3) uses machine learning, specifically deep neural network (DNN) models. It takes the SMILES representation of a compound as input and predicts the top 20 targets, ranked by prediction confidence score.

2. How can DNN models predict targets for a query compound?

In PPB3, DNN models are trained using reference data sourced from the ChEMBL database. PPB3 uses 7 DNN models to predict potential targets for any given query molecule. Each DNN model consists of an input layer (molecular fingerprints), two hidden layers, and an output layer (targets). When a user submits a query compound, its molecular fingerprint is fed into the input layer of each DNN model. The data then passes through the hidden layers, where the model analyzes the features and identifies potential targets. Finally, in the output layer, the model generates predictions along with a confidence score for each predicted target.
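
The data flow described above (fingerprint, two hidden layers, per-target scores) can be illustrated with a minimal sketch of a forward pass. This is not the PPB3 implementation: the layer sizes, the ReLU and softmax activations, and the random weights are all assumptions chosen for illustration, and a real model would use trained weights.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Assumed hidden-layer activation (not stated in the FAQ)."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Assumed output activation: rescales raw scores so they sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random weights for illustration only; a real model would be trained."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def predict(fingerprint, layers):
    """Input layer -> hidden layer 1 -> hidden layer 2 -> one score per target."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(dense(fingerprint, w1, b1))
    h2 = relu(dense(h1, w2, b2))
    return softmax(dense(h2, w3, b3))

rng = random.Random(0)
N_BITS, N_HIDDEN, N_TARGETS = 16, 8, 5  # toy sizes, not PPB3's
layers = [init_layer(N_BITS, N_HIDDEN, rng),
          init_layer(N_HIDDEN, N_HIDDEN, rng),
          init_layer(N_HIDDEN, N_TARGETS, rng)]
fingerprint = [rng.randint(0, 1) for _ in range(N_BITS)]  # stand-in for a real fingerprint
scores = predict(fingerprint, layers)
```

Here `scores` holds one value per target. In PPB3 the same kind of pass would be run once for each of the 7 models, each with its own fingerprint type.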

3. What is a confidence score and is there any threshold for it?

A confidence score indicates the probability that a predicted target is correct for a given query compound. In PPB3, targets with a confidence score above 0.2 are considered reliable predictions.
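
As a small sketch of how the ranking and the 0.2 threshold fit together, the function below sorts targets by confidence score, keeps the top 20, and flags those above the threshold. The function name, the dictionary input, and the target identifiers are hypothetical placeholders, not part of PPB3.

```python
def top_predictions(scores, top_n=20, threshold=0.2):
    """Rank targets by confidence and flag those above the reliability threshold.

    `scores` maps a target identifier to its confidence score.
    Returns (target, score, is_reliable) tuples, best first.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]
    return [(target, score, score > threshold) for target, score in ranked]

# Hypothetical identifiers and scores, for illustration only.
example_scores = {"TARGET_A": 0.61, "TARGET_B": 0.25, "TARGET_C": 0.08}
results = top_predictions(example_scores)
```

In this toy input, `TARGET_A` and `TARGET_B` are flagged as reliable, while `TARGET_C` is still listed but falls below the threshold.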

4. Which molecular fingerprints are used to train DNN models?

In total, we used 7 different fingerprints to train our DNN models: ECFP4, Atom Pair, Layered, RDKit, MHFP6, ECFP6, and a combination of ECFP4 and MHFP6, referred to as the fused fingerprint.

5. How many targets, compounds and target-compound interactions are present in PPB3?

PPB3 is built on the latest data extracted from ChEMBL version 34: 7,546 targets (spanning 15 unique target types and all source organisms), 1,187,089 compounds, and 2,496,555 target-compound interactions.

6. What preprocessing steps are used to create the main database?

When extracting data from the ChEMBL database, we excluded targets of type “unknown” and targets with fewer than 5 compounds. We kept only compounds with fewer than 80 heavy atoms and with bioactivity values equal to or better than 10 µM (that is, 10 µM or more potent).
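
The filters above can be sketched as a single pass over activity records. The record fields (`target_id`, `target_type`, `compound_id`, `heavy_atoms`, `activity_um`) are hypothetical names rather than ChEMBL column names, and the order of the steps (compound-level filters first, then the per-target compound count) is an assumption.

```python
from collections import defaultdict

MAX_HEAVY_ATOMS = 80    # keep compounds with fewer heavy atoms than this
MAX_ACTIVITY_UM = 10.0  # keep activities of 10 µM or better (lower value)
MIN_COMPOUNDS = 5       # keep targets with at least this many compounds

def filter_records(records):
    """Apply the target-type, size, and potency filters, then drop sparse targets."""
    kept = [r for r in records
            if r["target_type"].lower() != "unknown"
            and r["heavy_atoms"] < MAX_HEAVY_ATOMS
            and r["activity_um"] <= MAX_ACTIVITY_UM]
    # Count distinct compounds per target among the surviving records.
    compounds_per_target = defaultdict(set)
    for r in kept:
        compounds_per_target[r["target_id"]].add(r["compound_id"])
    return [r for r in kept
            if len(compounds_per_target[r["target_id"]]) >= MIN_COMPOUNDS]

def make(target_id, target_type, compound_id, heavy_atoms, activity_um):
    """Build one toy record."""
    return {"target_id": target_id, "target_type": target_type,
            "compound_id": compound_id, "heavy_atoms": heavy_atoms,
            "activity_um": activity_um}

# Toy data: T1 has 5 valid compounds, T2 only 4, T3 is of unknown type.
records = (
    [make("T1", "SINGLE PROTEIN", "C%d" % i, 30, 1.0) for i in range(5)]
    + [make("T2", "SINGLE PROTEIN", "C%d" % i, 30, 1.0) for i in range(4)]
    + [make("T3", "UNKNOWN", "C%d" % i, 30, 1.0) for i in range(6)]
    + [make("T1", "SINGLE PROTEIN", "C_big", 95, 1.0),
       make("T1", "SINGLE PROTEIN", "C_weak", 30, 50.0)]
)
filtered = filter_records(records)
```

Only the five valid `T1` records survive: `T2` has too few compounds, `T3` has an unknown type, and the oversized and weakly active compounds are removed.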

7. What is the difference between the new PPB3 tool and the previous version, PPB2?

Instead of using various similarity searching methods with different fingerprints, PPB3 focuses on a single machine learning approach: DNN models trained on both single and fused fingerprints. PPB3 also incorporates a much larger and more diverse dataset: 1,187,089 compounds, 7,546 targets, and 2,496,555 target-compound interactions covering all target types, organisms, and protein families available in ChEMBL version 34. The PPB2 dataset, by contrast, is limited to 344,164 compounds and 1,720 single-protein targets from human, mouse, and rat.

8. What types of information do users obtain from the PPB3 target prediction tool?

The prediction results page includes a table displaying the top 20 predicted targets ranked by confidence score. For each target, the table provides the ChEMBL ID (linked directly to the target's ChEMBL report card), full name, protein class, organism, and type, together with the nearest neighbors of the query compound, ranked by Tanimoto similarity and linked to their ChEMBL report cards. At the top of the results page, pie charts provide an overview of the predicted targets' protein classes, organisms, and types. Users can save the predictions as an Excel file by clicking the "Save the Results" button.
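
For readers unfamiliar with Tanimoto similarity, the sketch below computes it for binary fingerprints represented as sets of on-bit indices and ranks reference compounds against a query. The compound names and bit sets are made up for illustration, and the FAQ does not say which fingerprint PPB3 uses for this ranking.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    if not bits_a and not bits_b:
        return 0.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)

def nearest_neighbors(query_bits, reference, k=3):
    """Return the k reference compounds most similar to the query, best first."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in reference.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Hypothetical fingerprints, given as sets of on-bit positions.
query = {1, 4, 7, 9}
reference = {
    "compound_a": {1, 4, 7, 9},
    "compound_b": {1, 4, 8},
    "compound_c": {2, 3},
}
neighbors = nearest_neighbors(query, reference)
```

With these toy bit sets, `compound_a` is identical to the query (similarity 1.0), `compound_b` shares two of five distinct bits (0.4), and `compound_c` shares none (0.0).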

9. How is the performance of DNN models evaluated in PPB3?

DNN model performance is evaluated by recall and precision in a 10-fold cross-validation run, reported both as an average (evaluated on each fold) and overall (evaluated across the entire dataset).
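
The difference between the average and overall figures can be shown with a short sketch. It assumes that "average" means the mean of per-fold precision and recall and that "overall" means precision and recall computed from counts pooled over all folds. That reading, the count-based inputs, and the three toy folds (instead of ten) are assumptions.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def summarize(fold_counts):
    """Compare fold-averaged metrics with metrics from pooled counts."""
    per_fold = [precision_recall(tp, fp, fn) for tp, fp, fn in fold_counts]
    avg_precision = sum(p for p, _ in per_fold) / len(per_fold)
    avg_recall = sum(r for _, r in per_fold) / len(per_fold)
    total_tp = sum(tp for tp, _, _ in fold_counts)
    total_fp = sum(fp for _, fp, _ in fold_counts)
    total_fn = sum(fn for _, _, fn in fold_counts)
    return {"average": (avg_precision, avg_recall),
            "overall": precision_recall(total_tp, total_fp, total_fn)}

# Toy (tp, fp, fn) counts for three folds of very different sizes.
fold_counts = [(8, 2, 2), (1, 3, 1), (90, 10, 30)]
summary = summarize(fold_counts)
```

The two views diverge when folds differ in size or difficulty: the average weights every fold equally, whereas the pooled figure is dominated by the folds with the most predictions.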