Molecular generation using interpretable substructure — Part 1

Indrajeet Kumar
Published in Bayes Labs
5 min read · Jan 4, 2021

Case study 1: Generating new molecules as NNMT inhibitors.

In Part II we will focus more on generative modeling and the concepts behind it.

Motivation: Higher NNMT expression and MNA concentrations have been associated with obesity and type-2 diabetes. NNMT inhibitors reduce NNMT activity, which lowers MNA levels and drives insulin sensitization, glucose modulation and bodyweight reduction in animal models of metabolic disease.

To date, there are no reports on the feasibility of using small-molecule modulators of NNMT in preclinical animal models of metabolic disease to validate NNMT as a pharmacological drug target.

Introduction: Nicotinamide N-methyltransferase (NNMT) is a cytosolic enzyme that catalyzes the transfer of a methyl group from the cofactor S-adenosyl-L-methionine (SAM) to the substrate nicotinamide (NA), forming 1-methylnicotinamide (MNA).

Role of NNMT in NAD+ metabolism and Methionine cycle.

Challenges and Solutions: Generating lead candidates that satisfy multiple property constraints is challenging for the whole pharmaceutical industry. We are applying AI to drug discovery to address this problem.

Workflow: We use a graph-based generative model and reinforcement learning to generate new molecules, following the fragment-based drug discovery approach used by most medicinal chemists.

Workflow for NNMT inhibitor molecule generation

Generating a new therapeutic candidate for a disease involves many challenges, from lead identification through lead optimization.

Lead identification:- Identifying a lead compound that can bind to a specific protein target is time-consuming, labour-intensive and wasteful of resources. For this task, we use deep generative modelling and reinforcement learning to generate new molecules with specific properties.

Lead optimization:- After generation, we optimize and filter the compounds against many physicochemical, pharmacokinetic and toxicity constraints by applying graph-based predictive modelling.

Virtual Screening:- We run virtual screening to check that the lead compounds can bind the target well and may show therapeutic effect, by applying a 3D convolution-based model trained on molecules and proteins from the ChEMBL and PDB databases.

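Parameters for measuring protein-ligand binding (IC50 vs Ki):-

IC50:- The concentration of a drug that is required for 50% inhibition in vitro. Its value depends on the assay conditions, such as the substrate concentration.

Ki:- The inhibition constant, i.e. the equilibrium dissociation constant of the enzyme-inhibitor complex. It indicates how potent an inhibitor is and, unlike IC50, does not depend on the substrate concentration used in the assay.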

The relation between IC50 and Ki for competitive and uncompetitive inhibition.
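
The standard way to relate the two quantities is the Cheng-Prusoff equation. The sketch below writes it out for the competitive and uncompetitive cases; the function names and the numbers are illustrative only and are not taken from the case study. The substrate concentration [S] and the Michaelis constant Km must be given in the same units.

```python
def ki_competitive(ic50, substrate_conc, km):
    """Cheng-Prusoff, competitive inhibitor: Ki = IC50 / (1 + [S]/Km)."""
    return ic50 / (1.0 + substrate_conc / km)


def ki_uncompetitive(ic50, substrate_conc, km):
    """Cheng-Prusoff, uncompetitive inhibitor: Ki = IC50 / (1 + Km/[S])."""
    return ic50 / (1.0 + km / substrate_conc)


# Illustrative numbers only: IC50 = 100 nM measured at [S] = Km.
print(ki_competitive(100.0, substrate_conc=10.0, km=10.0))    # 50.0
print(ki_uncompetitive(100.0, substrate_conc=10.0, km=10.0))  # 50.0
```

When [S] equals Km both expressions give Ki = IC50 / 2. As [S] grows, the competitive Ki falls further below the measured IC50, while the uncompetitive Ki approaches it.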

Case study: NNMT inhibitor generation:-

Note:- Applied fragment-based molecular generation.

Data collection:- We collected ~100 molecules with known IC50 values against NNMT in the range of 1–12000 nM. The target label is strongly positively skewed, with high variance and some outliers, so we performed outlier filtering and a log transformation to obtain a nearly normal distribution.

Molecules (SMILES) and their NNMT inhibitor log(IC50) values.
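
The article does not say which outlier rule or log base was used, so the following is only a minimal sketch of this kind of preprocessing. It assumes Tukey's IQR rule for the outlier filter and the natural logarithm of IC50 in nM; the IC50 values are made up.

```python
import math
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, an assumed choice)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def log_transform(ic50_nm):
    """Natural log of IC50 values given in nM (the log base is an assumption)."""
    return [math.log(v) for v in ic50_nm]


# Made-up IC50 values in nM, spanning the 1-12000 nM range of the dataset.
ic50_nm = [1.0, 5.0, 20.0, 80.0, 150.0, 400.0, 900.0, 2500.0, 12000.0]

filtered = iqr_filter(ic50_nm)
log_ic50 = log_transform(filtered)
print(f"kept {len(filtered)} of {len(ic50_nm)} values")
print(f"mean={statistics.mean(log_ic50):.2f}, median={statistics.median(log_ic50):.2f}")
```

After the transformation the mean and median sit much closer together than on the raw scale, which is a quick check that the skew has been reduced.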

Data Preparation:- We log-transformed the target label to get a better-behaved distribution, then generated substructures from the ~100 molecules using the Monte Carlo tree search (MCTS) algorithm with a property threshold of log(IC50) < 6.5. This gave ~32 unique rationales (substructures), all in the range of 10–25 atoms.

SMILES and their generated rationales (substructures).
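
The MCTS search itself is beyond a short example, but the selection criteria described above can be sketched. The code below assumes the search has already proposed candidate substructures together with an atom count and a predicted log(IC50); the `Candidate` type, the example SMILES and the predicted values are hypothetical. Deduplicating by string only works if the SMILES are canonical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str           # substructure proposed by the search (assumed canonical)
    n_atoms: int          # heavy-atom count of the substructure
    pred_log_ic50: float  # predicted log(IC50) of the substructure


def select_rationales(candidates, max_log_ic50=6.5, min_atoms=10, max_atoms=25):
    """Keep unique candidates that meet the property threshold and size limits."""
    seen, kept = set(), []
    for c in candidates:
        if c.smiles in seen:
            continue
        if c.pred_log_ic50 < max_log_ic50 and min_atoms <= c.n_atoms <= max_atoms:
            seen.add(c.smiles)
            kept.append(c)
    return kept


# Made-up candidates; the predicted values are not from the case study.
candidates = [
    Candidate("c1ccc2ncccc2c1", 10, 5.8),  # passes both criteria
    Candidate("c1ccc2ncccc2c1", 10, 5.8),  # duplicate, dropped
    Candidate("c1ccccc1", 6, 5.0),         # too small
    Candidate("c1ccc2ccccc2c1", 10, 7.2),  # predicted log(IC50) too high
]
rationales = select_rationales(candidates)
print([c.smiles for c in rationales])
```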

Molecule Generation:- We first fine-tuned an existing generative model (VAE) using the ~32 generated substructures and the property constraints: NNMT inhibition (log(IC50)), synthetic accessibility and drug-likeness. After fine-tuning we generated ~1 million compounds. Below are sample valid compounds generated by the model.

Some of the NNMT inhibitor molecules generated using the generative model (VAE).
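
One common way to combine several property constraints during reinforcement-learning fine-tuning is a binary reward that is 1 only when every constraint is met. The article does not spell out its reward, so the sketch below is an assumption; the property names, the cutoffs for synthetic accessibility and drug-likeness, and the predicted values are all hypothetical.

```python
def constraint_reward(props, constraints):
    """Return 1.0 only if every predicted property meets its constraint."""
    ok = all(check(props[name]) for name, check in constraints.items())
    return 1.0 if ok else 0.0


# Hypothetical property names and cutoffs, for illustration only.
constraints = {
    "log_ic50": lambda v: v < 6.5,   # threshold quoted in the data preparation step
    "sa_score": lambda v: v <= 4.0,  # assumed synthetic accessibility cutoff
    "qed": lambda v: v >= 0.6,       # assumed drug-likeness cutoff
}

# Made-up predictions for two generated molecules.
generated = {
    "mol_1": {"log_ic50": 5.2, "sa_score": 3.1, "qed": 0.71},
    "mol_2": {"log_ic50": 7.4, "sa_score": 2.8, "qed": 0.80},
}
rewards = {name: constraint_reward(p, constraints) for name, p in generated.items()}
print(rewards)
```

Molecules with reward 1 are the ones a fine-tuning loop would reinforce; a graded reward is an alternative when too few samples pass every constraint.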

Molecule Filtering:- To filter for drug-like, non-toxic molecules we applied many ADMET property filters, including LogP, solubility, toxicity, CYP450 inhibition (1A2, 3A4, 2C9, 2C19, 2D6), hERG, Caco-2 permeability, blood-brain barrier penetration and plasma protein binding. In the end, only 2000 molecules satisfied all the filters.
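
A filtering cascade like this can be sketched as an ordered list of predicates, recording how many molecules survive each step. The property names, cutoffs and molecules below are hypothetical placeholders, not the filters or predictions used in the case study.

```python
def apply_filters(molecules, filters):
    """Apply filters in order; return the survivors and the count left after each step."""
    remaining = list(molecules)
    funnel = []
    for name, keep in filters:
        remaining = [m for m in remaining if keep(m)]
        funnel.append((name, len(remaining)))
    return remaining, funnel


# Hypothetical filters over predicted properties.
filters = [
    ("logP <= 5", lambda m: m["logp"] <= 5.0),
    ("no hERG liability", lambda m: not m["herg_blocker"]),
    ("no CYP3A4 inhibition", lambda m: not m["cyp3a4_inhibitor"]),
]

# Made-up predictions for four generated molecules.
molecules = [
    {"id": "mol_1", "logp": 2.3, "herg_blocker": False, "cyp3a4_inhibitor": False},
    {"id": "mol_2", "logp": 6.1, "herg_blocker": False, "cyp3a4_inhibitor": False},
    {"id": "mol_3", "logp": 3.0, "herg_blocker": True, "cyp3a4_inhibitor": False},
    {"id": "mol_4", "logp": 1.8, "herg_blocker": False, "cyp3a4_inhibitor": True},
]

survivors, funnel = apply_filters(molecules, filters)
for name, count in funnel:
    print(f"{name}: {count} left")
```

Keeping the per-step counts shows which filter removes the most candidates, which is useful when a funnel shrinks as sharply as ~1 million to 2000.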

Virtual Screening:- To validate the molecules we applied a 3D convolutional model and AutoDock Vina, an open-source program for molecular docking. The 3D convolutional deep model first converts the molecule and the protein into a 3D format, then applies 3D convolutional and fully connected layers to predict the IC50 values.
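
The article does not describe its 3D format. A common choice for 3D convolutional models is a voxel grid, so the sketch below assumes that: it counts atoms per cell of a cubic grid centred on the origin. The function name, grid size and resolution are hypothetical, and a real model would typically use one channel per atom type rather than a single count.

```python
def voxelize(atoms, grid_size=8, resolution=1.0):
    """Count atoms per voxel in a cubic grid centred on the origin.

    atoms: iterable of (x, y, z) coordinates in angstroms.
    Returns a grid_size^3 nested list; atoms outside the box are skipped.
    """
    half = grid_size * resolution / 2.0
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    for x, y, z in atoms:
        # Floor division keeps coordinates just outside the box at index -1.
        idx = [int((c + half) // resolution) for c in (x, y, z)]
        if all(0 <= i < grid_size for i in idx):
            grid[idx[0]][idx[1]][idx[2]] += 1
    return grid


# Made-up coordinates: two atoms near the centre and one outside the 8 A box.
grid = voxelize([(0.0, 0.0, 0.0), (0.2, 0.3, 0.1), (10.0, 0.0, 0.0)])
total = sum(v for plane in grid for row in plane for v in row)
print(grid[4][4][4], total)
```

The resulting grid is the kind of tensor that a stack of 3D convolutional layers followed by fully connected layers would take as input.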

Indrajeet Kumar
Bayes Labs

Data Scientist | Working on generative models for drug discovery: molecular generation and optimizing the properties (ADMET) of molecules...