## Sunday, September 18, 2016

### Why is there no standard state temperature?

The standard state pressure is 1 bar, so why is there no standard temperature?

The standard state pressure is not an experimental condition, while the temperature is.

The main reason the standard state is defined is that it leads to this very useful equation (Equation 1):
$$K_p = e^{-\Delta G^\circ/RT} \tag{1}$$
Say you have this reaction: $A \rightleftharpoons B + C$. One way to use this equation is to compute the free energy of 1 mol of $A$, $B$, and $C$ at 1 bar using equations derived for an ideal gas, compute $\Delta G^\circ = G^\circ (B) + G^\circ (C) - G^\circ (A)$, and use that value to predict $K_p$.
If the gases behave like ideal gases "in real life" then the measured $K_p$ will match the $K_p$ computed from $\Delta G^\circ$.  You can do the measurement at any pressure you want, not just at 1 bar.* The standard state refers to the pressure you use when computing $\Delta G^\circ$.  The only thing it has to do with the experimental measurement is that it defines the units you should use for your partial pressures when computing $K_p$.
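
For the example reaction, and assuming ideal-gas behaviour, the role of the standard pressure can be written out explicitly. Here $p^\circ$ denotes the 1 bar standard pressure and $p_A$, $p_B$, $p_C$ are the equilibrium partial pressures (this notation is introduced only for illustration):
$$K_p = \frac{(p_B/p^\circ)\,(p_C/p^\circ)}{p_A/p^\circ} = \frac{p_B\,p_C}{p_A\,p^\circ}, \qquad p^\circ = 1\ \text{bar}$$
So if the partial pressures are expressed in bar, dividing by $p^\circ$ changes nothing numerically and simply makes $K_p$ unitless.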

$\Delta G^\circ$ does also depend on temperature, but the temperature you choose should be the same as the experimental conditions.  So the temperature is not part of the standard state definition.
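
To make this concrete, here is a minimal Python sketch of the prediction step. The function name and the $\Delta G^\circ$ values are hypothetical and chosen only for illustration; each value is assumed to have been computed at 1 bar and at the temperature it is paired with.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def equilibrium_constant(delta_g_standard, temperature):
    """Return K_p from Delta G standard (J/mol, 1 bar) and temperature (K)."""
    return math.exp(-delta_g_standard / (R * temperature))


# Hypothetical Delta G standard values (J/mol) for A <=> B + C. Each value is
# assumed to be computed at 1 bar AND at the temperature it is paired with.
for temperature, delta_g_standard in [(298.15, 5000.0), (500.0, -2000.0)]:
    k_p = equilibrium_constant(delta_g_standard, temperature)
    print(f"T = {temperature:.2f} K, dG0 = {delta_g_standard:.1f} J/mol, K_p = {k_p:.3g}")
```

The pressure never appears as an argument: it is fixed at 1 bar by the standard state, while the temperature has to be supplied to match the experiment.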

But what about "Standard temperature and pressure (STP)?"
Standard temperature and pressure (STP) refers to a particular set of experimental conditions under which $K_p$ is measured, not to the pressure used to compute $\Delta G^\circ$. I know, they couldn't have made it more confusing if they tried when they named these things.

*Of course, if you do the measurements at very high pressures or low temperatures, then the assumption that the gases behave ideally will be less valid and the measured $K_p$ will differ more from the $K_p$ computed from Equation 1.  However, that is a separate issue unrelated to the standard state because the $K_p$ in Equation 1 refers to the $K_p$ you would measure if the gases behaved ideally at the pressure and temperature used in the experiment.

## Saturday, September 17, 2016

### Why I tweet and blog

Update: here is the audio.  Something went wrong with Google Hangouts so the slides are missing. Despite the fact that I did a few practice runs yesterday ... and have a PhD in theoretical quantum chemistry.  WTH Google!

Update 2: the trick (in addition to this tip) is to start the Powerpoint slideshow before you start streaming.

On Tuesday I am giving presentations on tweeting and blogging in a Scientific Writing course.  Here are my slides and the message to the students

Dear Scientific Writing students

On Tuesday I will give two presentations: one on tweeting and one on blogging.  You can find the slides below.

You'll also do some writing so please bring a laptop and make sure you can get on Eduroam.

In preparation for Tuesday, please find one science-related blog post and one Twitter account you think look interesting and share them on the discussion forum I created on Absalon.

Finally, I may try to live broadcast my talk using Google's Hangout On Air.  I've never tried it, so I am not sure if I can get it to work by Tuesday.  If you are uncomfortable with this, just send me an email and I won't do it.

See you Tuesday!

Slides

## Monday, August 22, 2016

### Finding the reference molecule for a pKa calculation using RDKit

This is prototype code related to this project.  As an example I use histamine, which has two ionizable sites: a primary amine and an imidazole ring.

The code figures out that imidazole is the best reference for the imidazole ring, while ethylamine is the best reference for the primary amine.  The code does this by figuring out which atom is being deprotonated, computing the Morgan fingerprint around this atom, and comparing it to the Morgan fingerprints of imidazole and ethylamine.

## Thursday, August 11, 2016

### Drug design: My latest paper explained without the jargon

Our latest paper has just appeared in the open access journal PeerJ. It's ultimately related to making better drugs so first some background.

Background
Designing new drugs currently involves a lot of trial-and-error, so you have to pay a lot of smart scientists a lot of money for a long time to design new drugs - a cost that is ultimately passed on to you and me as consumers.  There are many, many reasons why drug design is so difficult. One of them is that we often don't know fundamental properties of drug-candidates such as the charge of the molecule at a given pH. Obviously, it is hard to figure out whether or how a drug-candidate interacts with the body if you don't even know whether it is positive, negative or neutral.

It is not too difficult to measure the charge at a given pH, but modern-day drug design involves the screening of hundreds of thousands of molecules and it is simply not feasible to measure them all. Besides, you have to make the molecules to do the measurement, which may be a waste of time if they turn out to have the wrong charge. There are several computer programs that can predict the charge at a given pH very quickly but they have been known to fail quite badly from time to time.  The main problem is that these programs rely on a database of experimental data and if the molecule of interest doesn't resemble anything in the database this approach will fail. The paper that just got published is a first step towards coming up with an alternative.

The New Study
We present a "new" method for predicting the charge of a molecule that relies less on experimental data but is fast enough to be of practical use in drug design. The paper shows that the basic approach works reasonably well for small prototypical molecules and we even test one drug-like molecule where one of the commercial programs fails and show that our new method performs better (but not great).  However, we have to test this new method on a lot more molecules and in order to do this we need to automate the prediction process, which currently requires some "manual" labor, so this is what we're working on now.

## Sunday, August 7, 2016

### Conformer search with RDKit

I'm teaching myself how to use RDKit.  Here is code for conformer search using RDKit that also computes the energy of each conformer using the MMFF94 force field.

## Monday, August 1, 2016

### Thoughts from the Gordon Research Conference on Computational Chemistry

Here are some of the things I took away from attending the GRC on Computational Chemistry.

Tweeting
The GRC had very strict "off-the-record" rules to encourage the presentation of unpublished results. However, most speakers devoted at least half their talks to published results and I and others - especially Marc van der Kamp - tweeted some of these papers under the hashtag #compchemGRC.

Furthermore, I also explicitly waived my "off-the-record" rights at the beginning of my talk and encouraged Tweeting.  I also shared my slides on Twitter - before the conference and immediately before my talk.  Seeing these slides on Twitter, FX Coudert alerted me to the fact that PM6 is now fully implemented in CP2K, which could be very useful for our work.

Open Access
I talked to a few people about my OA philosophy.  Here is what I put on my CV

"My publication policy since 2012:  If a paper has a shot at high impact journals such as JACS or PNAS then I will submit there. However, the majority of my papers are method development papers, which will be submitted to open access journals such as PLoS ONE or PeerJ as I fail to see a difference in impact between these journals and journals such as Journal of Chemical Theory and Computation and Journal of Computational Chemistry where I used to publish before."

However, it really doesn't have to be an all or nothing decision.  My best advice is one paper at a time.  Just try it once and see what happens.

For me "impact neutrality" has become just as important as OA.  It is so very liberating to just write down what I did and what I found rather than trying to put everything in the best possible light with elaborately constructed "technically-correct-but-not-really-telling-the-whole-story" paragraphs.

Reproducibility
Speakers usually show their "best" work at conferences and precious speaker time is generally not wasted on pitfalls and caveats. It is easy to get the impression that everything is going great for everyone else, while you are struggling with your own projects. Furthermore, you often see something potentially wonderful that you want to try, but you just know from experience that it won't be as easy as the speaker makes it sound and, in fact, will be hard to reproduce from the published papers alone. (This is no reflection on any particular speaker at the conference.)

This general sentiment was shared by a number of people I talked to.  It's not a new problem but I do believe it is a growing one, in part because research projects are getting more complex, making it nearly impossible to describe all steps in sufficient detail to make them reproducible. The only solution is, in my opinion, to make everything available as supplementary material. Tar the whole thing - input files, output files, submission and analysis scripts, spreadsheets, etc - and put it on a server such as Figshare.

Funding
The usual conference conversation starts with "Hey X, how are things going?", "Oh, fine, and you?", "Oh, fine."  But one person responded "Writing a lot of proposals and getting them rejected."  I really appreciated this honesty, and it makes me feel less bad about my own rejections. A few weeks ago I had a similar talk with another colleague about the possibility of having no PhD students in the not-too-distant future and how this affects the choice of research projects one can take on.  I think a lot of scientists are going through the same type of thing and it is important to be open about it.

Co-vice chair election (This will only make sense to people who were at the meeting, and that's fine)
A few people asked me why I effectively withdrew from consideration just before the vote. The short answer is that I didn't know for sure who else was running, nor that the candidates would be split up in two groups, until that morning. Had I thought a little faster, I probably could have gotten my name removed just in time. But I am not a fast thinker at the best of times and certainly not at 8:30 am.

## Thursday, July 21, 2016

### Finding disordered residues in an NMR ensemble

Note to self: here's how you identified disordered residues in the NMR ensemble 2KCU.pdb

1. In PyMOL: "fetch 2kcu"

2. Action > align > states (*/CA)
2016.08.07 update: the above command also aligns the tails.  Use "intra_fit (2kzn///6-158/CA)"

3. "save 2kcu_aligned.pdb, state=0"

4. In terminal: grep CA 2kcu_aligned.pdb > lis

5. python disorder.py

disorder.py (given below) calculates the standard deviation of the x, y, and z coordinates of each CA atom $(\sigma_{x,i}, \sigma_{y,i}, \sigma_{z,i})$. It then averages these three standard deviations for each CA atom $(\sigma_i)$.  To find outliers, it averages these values for the entire protein $(\langle \sigma_i \rangle)$ and computes the standard deviation of the $\sigma_i$ values about this average $(\sigma_{\langle \sigma_i \rangle})$. Any residue for which $\sigma_i > \langle \sigma_i \rangle + \sigma_{\langle \sigma_i \rangle}$ is identified as disordered.
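
As a rough illustration of the procedure just described (this is not the original disorder.py), here is a minimal sketch that uses only the Python standard library. It assumes standard fixed-column PDB ATOM records, a single chain, and population standard deviations. The function name `disordered_residues` and these choices are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def disordered_residues(pdb_lines):
    """Return (sigma_by_residue, flagged_residue_numbers) for CA atoms."""
    coords = defaultdict(list)  # residue number -> [(x, y, z), ...], one per state
    for line in pdb_lines:
        # "grep CA" can also match non-CA lines, so filter on the PDB columns.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resnum = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            coords[resnum].append(xyz)

    # sigma_i: average of the x, y and z standard deviations of residue i.
    sigma = {}
    for resnum, xyz_list in coords.items():
        per_axis = [pstdev(axis) for axis in zip(*xyz_list)]
        sigma[resnum] = mean(per_axis)

    # Cutoff: protein-wide mean of sigma_i plus one standard deviation of sigma_i.
    values = list(sigma.values())
    cutoff = mean(values) + pstdev(values)
    flagged = sorted(r for r, s in sigma.items() if s > cutoff)
    return sigma, flagged
```

With the file from Step 4, this could be called as `disordered_residues(open("lis"))`; the second return value is the list of residue numbers above the cutoff.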

Here I've colored the disordered residues red (I haven't updated the picture to reflect the change to Step 2 yet).

Yes, I know: "the 1970's called and want their Fortran code back". How very droll.