The speedy tutorial shows how to build a python script to analyze the dataset of CDK2 from the DUD-E database.
Activate the conda environment
conda activate chemflowpy
Download and extract the dataset (untar the file cdk2 archive and ligand/decoys archives contained)
tar -xvf cdk2.tar
cd cdk2
tar -xvf actives_final.mol2.tar
tar -xvf decoys_final.mol2.tar
ipython
To create a file with 5 random active and 25 random decoys, paste:
import chemflow as cf
import random
# read the ligand and decoys as input
actives_dict = cf.read_ligands('actives_final.mol2')
decoys_dict = cf.read_ligands('decoys_final.mol2')
ligand_dict = dict(actives_dict, **decoys_dict)
# select 10 actives and 30 decoys randomly
random_act = random.sample(list(actives_dict.keys()), k=5)
random_dec = random.sample(list(decoys_dict.keys()), k=25)
random_ligand = random_act + random_dec
# create the compounds.mol2 file
compunds_text= '\n'.join([ligand_dict[key] for key in random_ligand])
with open('compounds.mol2', 'w') as file:
file.write(compunds_text)
To compute docking with all the docking program, use:
import chemflow as cf
for dock_progam in ('autodock', 'smina', 'plants'):
docking = cf.Dock('receptor.pdb', 'compounds.mol2', program=dock_progam)
docking.bounding_shape('crystal_ligand.mol2', shape='both')
docking.setup(overwrite=False)
docking.run(computed=True, keep_poses=3)
ChemFlow_py produce csv files containing the ranking and a 'computed.mol2' file containing the docked molecules, in each directory.
Rescoring relies on different scoring function to evaluate the protein-ligand interaction.
Use smina to evaluate: dkoes_fast, dkoes_scoring, vina, vinardo
Use plants to evaluate: chemplp, plp, plp95
Use autodock to evaluate: autodock
Use amber or gromacs to evaluate: mmpbsa, mmgbsa
import chemflow as cf
rescoring = cf.Rescore('receptor.pdb', 'smina/computed_ligands.mol2', program='plants', protocol='plp_rescoring')
rescoring.bounding_shape('crystal_ligand.mol2', shape='both')
rescoring.data['scoring'] = 'plp'
rescoring.run(setup=True)
Remark:
Amber and Gromacs are very punctilious about the pdb structure of the protein. You may have to fix the missing atoms.
Consensus approach is based on the combination of the different ranking obtained by docking.
import chemflow as cf
protocol_list = ['autodock', 'smina', 'plants']
cf.consensus(protocol_list, methods=['zscore', 'rbv', 'ass'])
If you have the list with all the decoys, you can analyse the results of docking, rescoring and consensus:
import chemflow as cf
decoy_list = cf.ligand_list('decoys_final.mol2')
protocol_list = ['autodock', 'smina', 'plants', 'plp_rescoring']
cf.analysis(protocol_list, decoy_list, consensus_deep=3, methods=['zscore', 'rbv'])