Table of Contents
Here you will find my current professional profile, including my career history.
Today, terms such as artificial intelligence (AI; in German, Künstliche Intelligenz, KI) are popular. When I started working in this field in 2005, machine learning was the central term.
Machine Learning
Overview by title (abstracts and, in some cases, full texts are available here)
- Patent: Method and device for the automatic analysis of models
- Ph.D. Dissertation: Machine Learning in Drug Discovery and Drug Design
- Visual Interpretation of Kernel-Based Prediction Models
- Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set
- Kernel learning for ligand-based virtual screening: discovery of a new PPARγ agonist
- Truxillic acid derivatives act as peroxisome proliferator-activated receptor γ activators
- From Machine Learning to Natural Product Derivatives that Selectively Activate Transcription Factor PPARγ
- How to Explain Individual Classification Decisions
- A Benchmark Data Set for In Silico Prediction of Ames Mutagenicity
- How Wrong Can We Get? A Review of Machine Learning Approaches and Error Bars
- Pathway Analysis for Drug Discovery. Edited by Anton Yuryev
- A benchmark data set for in silico prediction of Ames mutagenicity
- Virtual screening for PPAR-gamma ligands using the ISOAK molecular graph kernel and Gaussian processes
- Bias-Correction of Regression Models: A Case Study on hERG Inhibition
- A Probabilistic Approach to Classifying Metabolic Stability
- Estimating the domain of applicability for machine learning QSAR models: A study on aqueous solubility of drug discovery molecules
- Predicting Lipophilicity of Drug‐Discovery Molecules using Gaussian Process Models
- Predicting error bars for QSAR models
- Machine Learning Models for Lipophilicity and Their Domain of Applicability
- Accurate Solubility Prediction with Error Bars for Electrolytes: A Machine Learning Approach
Quantum Mechanics
Overview by title (abstracts and, in some cases, full texts are available here)
- Synthesen bei hohem Druck und hoher Temperatur führen zu neuen Phasen von Tantal(V)-nitrid und Wolfram(VI)-nitrid
- Prediction of Novel Phases of Tantalum(V) Nitride and Tungsten(VI) Nitride That Can Be Synthesized under High Pressure and High Temperature
- Optimierung von CIS- und CAS-SCF-Wellenfunktionen für Quanten-Monte-Carlo-Rechnungen an elektronisch angeregten Molekülen
Machine Learning (abstracts, full texts)
Patent: Method and device for the automatic analysis of models
The invention relates to a method and a device for the automatic analysis of a non-linear model for predicting the properties of an object which is a priori not characterized. According to the method, a) the non-linear model is built from training objects using a machine learning method, especially a kernel-based learning method, in such a manner that it allows a statement regarding at least one property for at least one object, b) at least one measure is automatically determined by means of an analytical element using the representer theorem, said measure indicating which training object or which training objects that have become part of the non-linear model have the strongest influence on the predictions of the non-linear model, and c) a prioritized data set is automatically produced in which the measures of the influencing factors are ordered according to a predetermined condition.
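The idea of ranking training objects by their influence on a kernel model's prediction can be illustrated with a minimal sketch. It assumes kernel ridge regression with a Gaussian kernel, where the representer theorem gives the prediction as a weighted sum over training objects, f(x) = Σ_i α_i k(x_i, x). The function names, the kernel choice, the hyperparameters and the toy data are assumptions made for this illustration and are not taken from the patent.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def fit_kernel_ridge(X_train, y_train, gamma=1.0, ridge=1e-3):
    """Solve (K + ridge * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X_train, X_train, gamma)
    return np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)

def rank_training_influence(X_train, alpha, x_query, gamma=1.0):
    """Return training indices ordered by |alpha_i * k(x_i, x_query)|, largest first."""
    k = rbf_kernel(X_train, x_query[None, :], gamma)[:, 0]
    contributions = alpha * k  # one additive term per training object
    order = np.argsort(-np.abs(contributions))
    return order, contributions

# Toy data (hypothetical): 20 training objects with 3 descriptors each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=20)

alpha = fit_kernel_ridge(X_train, y_train)
x_query = X_train[4] + 0.01
order, contributions = rank_training_influence(X_train, alpha, x_query)
prediction = contributions.sum()  # the model's prediction for x_query
```

Here `order` plays the role of a prioritized list: its first entries are the training objects whose terms contribute most, in absolute value, to the prediction for `x_query`.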
Ph.D. Dissertation: Machine Learning in Drug Discovery and Drug Design
This thesis presents seven studies about constructing predictive models for application in drug discovery and drug design. Three new algorithms have been developed to improve the accuracy of predictions, explain individual predictions and elicit hints for compound optimization.
Visual Interpretation of Kernel-Based Prediction Models
Statistical models are frequently used to estimate molecular properties, e.g., to establish quantitative structure-activity and structure-property relationships. For such models, interpretability, knowledge of the domain of applicability, and an estimate of confidence in the predictions are essential. We develop and validate a method for the interp…
Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set
The estimation of accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed parameter of "distance to model" (DM) is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, w…
Kernel learning for ligand-based virtual screening: discovery of a new PPARγ agonist
We demonstrate the theoretical and practical application of modern kernel-based machine learning methods to ligand-based virtual screening by successful prospective screening for novel agonists of the peroxisome proliferator-activated receptor γ (PPARγ).
Truxillic acid derivatives act as peroxisome proliferator-activated receptor γ activators
In previous studies, we identified a truxillic acid derivative as a selective activator of the peroxisome proliferator-activated receptor gamma, which is a member of the nuclear receptor family and acts as a ligand-activated transcription factor of genes involved in glucose metabolism. Herein we present the structure-activity relationships of 16 truxil…
From Machine Learning to Natural Product Derivatives that Selectively Activate Transcription Factor PPARγ
Advanced kernel-based machine learning methods enable the identification of innovative bioactive compounds with minimal experimental effort. Comparative virtual screening revealed that nonlinear models of the underlying structure-activity relationship are necessary for successful compound picking. In a proof-of-concept study a novel truxillic acid…
How to Explain Individual Classification Decisions
After building a classifier with modern tools of machine learning we typically have a black box at hand that is able to predict well for unseen data. Thus, we get an answer to the question what is the most likely label of a given unseen data point. However, most methods will provide no answer why the model predicted the particular label for a singl…
A Benchmark Data Set for In Silico Prediction of Ames Mutagenicity
Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biolo…
How Wrong Can We Get? A Review of Machine Learning Approaches and Error Bars
A large number of different machine learning methods can potentially be used for ligand-based virtual screening. In our contribution, we focus on three specific nonlinear methods, namely support vector regression, Gaussian process models, and decision trees. For each of these methods, we provide a short and intuitive introduction. In particular, we…
Pathway Analysis for Drug Discovery. Edited by Anton Yuryev
Abstract not available
A benchmark data set for in silico prediction of Ames mutagenicity
Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biolo…
Virtual screening for PPAR-gamma ligands using the ISOAK molecular graph kernel and Gaussian processes
For a virtual screening study, we introduce a combination of machine learning techniques, employing a graph kernel, Gaussian process regression and clustered cross-validation. The aim was to find ligands of peroxisome proliferator-activated receptor gamma (PPAR-γ). The receptors in the PPAR family belong to the steroid-thyroid-retinoid superfamily of nuclear receptors and act as transcription factors. They play a role in the regulation of lipid and glucose metabolism in vertebrates and are linked to various human processes and diseases.
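Clustered cross-validation, as mentioned above, keeps similar compounds together so that a model is never tested on near-duplicates of its training data. The following is a minimal sketch under stated assumptions: cluster labels are taken as already given (the abstract does not say how the clusters were formed), and the greedy fold balancing and the function name `clustered_folds` are illustrative choices, not the procedure used in the study.

```python
import numpy as np

def clustered_folds(cluster_ids, n_folds):
    """Assign each sample to a fold so that no cluster is split across folds.

    Clusters are placed largest first, each into the currently smallest fold,
    which keeps fold sizes roughly balanced.
    """
    cluster_ids = np.asarray(cluster_ids)
    clusters, counts = np.unique(cluster_ids, return_counts=True)
    fold_of_cluster = {}
    fold_sizes = np.zeros(n_folds, dtype=int)
    for idx in np.argsort(-counts):
        fold = int(np.argmin(fold_sizes))
        fold_of_cluster[clusters[idx]] = fold
        fold_sizes[fold] += counts[idx]
    return np.array([fold_of_cluster[c] for c in cluster_ids])

# Hypothetical cluster labels for ten compounds.
cluster_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 3])
folds = clustered_folds(cluster_ids, n_folds=3)

# Each fold in turn serves as the test set; the rest is used for training.
splits = [(np.where(folds != f)[0], np.where(folds == f)[0]) for f in range(3)]
```

Because whole clusters move together, every test fold contains only compounds whose cluster is absent from the corresponding training set.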
Bias-Correction of Regression Models: A Case Study on hERG Inhibition
In the present work we develop a predictive QSAR model for the blockade of the hERG channel. Additionally, this specific end point is used as a test scenario to develop and evaluate several techniques for fusing predictions from multiple regression models. hERG inhibition models which are presented here are based on a combined data set of roughly 5…
A Probabilistic Approach to Classifying Metabolic Stability
Metabolic stability is an important property of drug molecules that should, optimally, be taken into account early on in the drug design process. Along with numerous medium- or high-throughput assays being implemented in early drug discovery, a prediction tool for this property could be of high value. However, metabolic stability is inherently diffic…
Estimating the domain of applicability for machine learning QSAR models: A study on aqueous solubility of drug discovery molecules
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applic…
Predicting Lipophilicity of Drug‐Discovery Molecules using Gaussian Process Models
The lipophilicity of 14 556 library compounds at Bayer Schering was modeled using Gaussian process methodology. In a blind test with 7013 new drug-discovery molecules from the last few months, 81 % were predicted correctly within one log unit, compared with only 44 % achieved by commercial software. Predicted error bars exhibit close to ideal stati…
Predicting error bars for QSAR models
Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D7 models built using Gaussian Process regression, Support Vector Machines, decision trees and ridge regression algorithms based on 14556 drug di…
Machine Learning Models for Lipophilicity and Their Domain of Applicability
Unfavorable lipophilicity and water solubility cause many drug failures; therefore these properties have to be taken into account early on in lead discovery. Commercial tools for predicting lipophilicity usually have been trained on small and neutral molecules, and are thus often unable to accurately predict in-house data. Using a modern Bayesian m…
Accurate Solubility Prediction with Error Bars for Electrolytes: A Machine Learning Approach
Accurate in silico models for predicting aqueous solubility are needed in drug design and discovery and many other areas of chemical research. We present a statistical modeling of aqueous solubility based on measured data, using a Gaussian Process nonlinear regression model (GPsol). We compare our results with those of 14 scientific studies and 6 c…
Quantum Mechanics (abstracts, full texts)
Synthesen bei hohem Druck und hoher Temperatur führen zu neuen Phasen von Tantal(V)-nitrid und Wolfram(VI)-nitrid
New polymorphs of Ta3N5 and WN2 are predicted by combining quantum-chemical and thermochemical calculations. Starting from available thermochemical data, the fugacity of nitrogen at very high pressures is estimated and used to calculate the required synthesis conditions. The modifications Ta3N5-II and WN2-II contain eight- and nine-fold coordinated metal atoms.
Prediction of Novel Phases of Tantalum(V) Nitride and Tungsten(VI) Nitride That Can Be Synthesized under High Pressure and High Temperature
A combination of quantum-chemical and thermochemical calculations leads to the prediction of novel polymorphs of Ta3N5 and WN2. Based on thermochemical data, the fugacity of nitrogen at very high pressures is estimated, which facilitates a detailed assessment of the synthesis conditions. The modifications Ta3N5-II and WN2-II have metal centers that are eight- and nine-fold coordinated.
Optimierung von CIS- und CAS-SCF-Wellenfunktionen für Quanten-Monte-Carlo-Rechnungen an elektronisch angeregten Molekülen
Abstract not available