Proteins and Wave Functions: August 2016

Monday, August 22, 2016

Finding the reference molecule for a pKa calculation using RDKit

This is prototype code related to this project. I use histamine as an example, which has two ionizable sites: a primary amine and an imidazole ring.

The code figures out that imidazole is the best reference for the imidazole ring, while ethylamine is the best reference for the primary amine. It does this by figuring out which atom is being deprotonated, computing the Morgan fingerprint around this atom, and comparing it to the Morgan fingerprints of imidazole and ethylamine.
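The fingerprint comparison can be sketched roughly as follows. This is not the prototype code itself: the atom indices of the ionizable sites are hard-coded here rather than detected, the neutral forms are used for simplicity, and the radius of 2 and the Dice similarity are my assumptions. The names site_fingerprint, best_reference, HISTAMINE and REFERENCES are made up for illustration.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Morgan fingerprint generator; radius=2 is an assumed, commonly used value.
_morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2)

# Hypothetical inputs: SMILES plus the index of the ionizable atom.
# RDKit numbers atoms in the order they appear in the input SMILES.
HISTAMINE = "NCCc1c[nH]cn1"  # atom 0: amine N, atom 7: ring N without H
REFERENCES = {
    "ethylamine": ("CCN", 2),        # atom 2: amine N
    "imidazole": ("c1c[nH]cn1", 4),  # atom 4: ring N without H
}


def site_fingerprint(smiles, atom_idx):
    """Morgan fingerprint built only from environments centered on one atom."""
    mol = Chem.MolFromSmiles(smiles)
    return _morgan.GetSparseCountFingerprint(mol, fromAtoms=[atom_idx])


def best_reference(query_smiles, site_idx, references):
    """Return the most similar reference name and all similarity scores."""
    query_fp = site_fingerprint(query_smiles, site_idx)
    scores = {
        name: DataStructs.DiceSimilarity(query_fp, site_fingerprint(smi, idx))
        for name, (smi, idx) in references.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    for label, idx in [("primary amine", 0), ("imidazole ring", 7)]:
        name, scores = best_reference(HISTAMINE, idx, REFERENCES)
        print(label, "->", name, scores)
```

Restricting the fingerprint to the ionizable atom means that only the local chemical environment of the site is compared, not the rest of the molecule.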

Thursday, August 11, 2016

Drug design: My latest paper explained without the jargon

Our latest paper has just appeared in the open access journal PeerJ. It's ultimately related to making better drugs so first some background.

Background

Designing new drugs currently involves a lot of trial-and-error, so you have to pay a lot of smart scientists a lot of money for a long time to design new drugs - a cost that is ultimately passed on to you and me as consumers. There are many, many reasons why drug design is so difficult. One of them is that we often don't know fundamental properties of drug-candidates such as the charge of the molecule at a given pH. Obviously, it is hard to figure out whether or how a drug-candidate interacts with the body if you don't even know whether it is positive, negative or neutral.

It is not too difficult to measure the charge at a given pH, but modern day drug design involves the screening of hundreds of thousands of molecules and it is simply not feasible to measure them all. Besides, you have to make the molecules to do the measurement, which may be a waste of time if they turn out to have the wrong charge. There are several computer programs that can predict the charge at a given pH very quickly but they have been known to fail quite badly from time to time. The main problem is that these programs rely on a database of experimental data, and if the molecule of interest doesn't resemble anything in the database this approach will fail. The paper that just got published is a first step towards coming up with an alternative.

The New Study

We present a "new" method for predicting the charge of a molecule that relies less on experimental data but is fast enough to be of practical use in drug design. The paper shows that the basic approach works reasonably well for small prototypical molecules and we even test one drug-like molecule where one of the commercial programs fails and show that our new method performs better (but not great). However, we have to test this new method for a lot more molecules and in order to do this we need to automate the prediction process, which currently requires some "manual" labor, so this is what we're working on now.

This work is licensed under a Creative Commons Attribution 4.0

Sunday, August 7, 2016

Conformer search with RDKit

I'm teaching myself how to use RDKit. Here is code for conformer search using RDKit that also computes the energy of each conformer using the MMFF94 force field.
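A minimal sketch of such a conformer search is shown below. It is not necessarily the code from the post: the function name conformer_energies, the number of conformers, the random seed and the iteration limit are assumptions chosen for illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def conformer_energies(smiles, num_confs=10, seed=42):
    """Embed conformers, optimize them with MMFF94, and rank them by energy.

    Returns a list of (conformer_id, energy in kcal/mol), lowest energy first.
    """
    # Hydrogens are needed for sensible 3D geometries and force-field energies.
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))

    # Generate starting geometries by distance geometry.
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, randomSeed=seed))

    # Each result is (status, energy): 0 = converged, 1 = not yet converged,
    # -1 = the force field could not be set up for this molecule.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=1000, mmffVariant="MMFF94")

    ranked = [
        (conf_id, energy)
        for conf_id, (status, energy) in zip(conf_ids, results)
        if status != -1
    ]
    return sorted(ranked, key=lambda pair: pair[1])


if __name__ == "__main__":
    for conf_id, energy in conformer_energies("NCCc1c[nH]cn1", num_confs=5):
        print(f"conformer {conf_id}: {energy:.2f} kcal/mol")
```

Several starting geometries can optimize to the same minimum, so a real search would probably also remove duplicates, for example by RMSD.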

Comments welcome

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Monday, August 1, 2016

Thoughts from the Gordon Research Conference on Computational Chemistry

Here are some of the things I took away from attending the GRC on Computational Chemistry.

Tweeting
The GRC had very strict "off-the-record" rules to encourage the presentation of unpublished results. However, most speakers devoted at least half their talks to published results and I and others - especially Marc van der Kamp - tweeted some of these papers under the hashtag #compchemGRC.

Furthermore, I also explicitly waived my "off-the-record" rights at the beginning of my talk and encouraged tweeting. I also shared my slides on Twitter - before the conference and immediately before my talk. Seeing these slides on Twitter, FX Coudert alerted me to the fact that PM6 is now fully implemented in CP2K, which could be very useful for our work.

Open Access
I talked to a few people about my OA philosophy. Here is what I put on my CV:

"My publication policy since 2012: If a paper has a shot at high impact journals such as JACS or PNAS then I will submit there. However, the majority of my papers are method development papers, which will be submitted to open access journals such as PLoS ONE or PeerJ as I fail to see a difference in impact between these journals and journals such as Journal of Chemical Theory and Computation and Journal of Computational Chemistry where I used to publish before."

However, it really doesn't have to be an all or nothing decision. My best advice is one paper at a time. Just try it once and see what happens.

For me "impact neutrality" has become just as important as OA. It is so very liberating to just write down what I did and what I found rather than trying to put everything in the best possible light with elaborately constructed "technically-correct-but-not-really-telling-the-whole-story" paragraphs.

Reproducibility
Speakers usually show their "best" work at conferences and precious speaker time is generally not wasted on pitfalls and caveats. It is easy to get the impression that everything is going great for everyone else, while you are struggling with your own projects. Furthermore, you sometimes see something potentially wonderful that you want to try, but you just know from experience that it won't be as easy as the speaker makes it sound and, in fact, will be hard to reproduce from the published papers alone. (This is no reflection on any one particular speaker at the conference.)

This general sentiment was shared by a number of people I talked to. It's not a new problem but I do believe it is a growing one, in part because research projects are getting more complex, making it nearly impossible to describe all steps in sufficient detail to make the work reproducible. The only solution is, in my opinion, to make everything available as supplementary material. Tar the whole thing - input files, output files, submission and analysis scripts, spreadsheets, etc. - and put it on a server such as Figshare.
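As a rough illustration of the "tar the whole thing" step, here is a small sketch using only the Python standard library. The function name archive_project and the example paths are hypothetical, and uploading the resulting file to Figshare is left as a manual step.

```python
import tarfile
from pathlib import Path


def archive_project(project_dir, archive_path):
    """Pack a whole project directory into a gzipped tar file.

    The archive should be written outside project_dir so that it does not
    try to include itself. Returns the sorted list of archived paths.
    """
    project_dir = Path(project_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths relative to the project folder name.
        tar.add(project_dir, arcname=project_dir.name)

    # Re-open the archive to report exactly what was stored.
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Hypothetical paths: replace with the real project folder.
    for name in archive_project("my_project", "my_project_supplementary.tar.gz"):
        print(name)
```

Listing the contents afterwards is a cheap check that input files, output files and scripts all made it into the archive.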

Funding
The usual conference conversation starts with "Hey X, how are things going?", "Oh, fine, and you?", "Oh, fine." But one person responded "Writing a lot of proposals and getting them rejected." I really appreciated this honesty, and it makes me feel less bad about my own rejections. A few weeks ago I had a similar talk with another colleague about the possibility of having no PhD students in the not-too-distant future and how this affects the choice of research projects one can take on. I think a lot of scientists are going through the same type of thing and it is important to be open about it.

Co-vice chair election (This will only make sense to people who were at the meeting, and that's fine)
A few people asked me why I effectively withdrew from consideration just before the vote. The short answer is that I didn't know for sure who else was running, nor that the candidates would be split up in two groups, until that morning. Had I thought a little faster, I probably could have gotten my name removed just in time. But I am not a fast thinker at the best of times and certainly not at 8:30 am.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.