Friday, May 15, 2015

#twitterquiz: an educational experiment

I recently came across this tweet, which makes very innovative use of pictures to ask a multiple-choice question.  After some experimentation I made this quiz yesterday.



So far, the tweet has received 2,568 impressions and 741 engagements, which is Twitter-speak for the number of people who saw the tweet and the number who clicked on it, respectively.  This is in large part thanks to retweets by RealTimeChem, A-Level Chemistry and EiC, each with several thousand followers.

The question is taken straight from my teaching material, and 741 is far more students than I reach in a year of teaching at the University of Copenhagen.  Did I just teach my first (Nano)MOOC?

Anyway, since I use peer instruction in all my courses, I have tons more questions, so this won't be the last #twitterquiz I post.

If you would like to make your own #twitterquiz, you can find the images here.


This work is licensed under a Creative Commons Attribution 4.0

Sunday, May 10, 2015

My teaching statement from 1996

Here is the teaching statement I wrote when I was applying for tenure-track positions in the US in 1996.  Glad to see the WWW is still around today; otherwise it would have spoiled all my teaching plans :).  What this would look like if I had to write it today is a topic for another blog post.

TEACHING PHILOSOPHY
Jan H. Jensen

Lecturing.  Even though I am a theoretical chemist, I strongly believe that doing experimental demonstrations is one of the best ways of teaching, and I have used them whenever I could.  They are a great way to show chemical concepts at work, and they make everything less abstract.  I have used experiments both as a way of introducing a topic and as a summary.  In both cases I generally ask the students to predict and explain the outcome to me first.  Ideally, I would like to perform one short demonstration per lecture, but an appropriate demonstration for a given topic can be hard to find.  So I would like to develop some new demonstrations as part of my teaching efforts.

Invariably we as chemists must explain these experiments and other phenomena in terms of atoms and molecules, and this can be very hard.  For example, it is one thing to say that heat is the random motion of molecules, and quite another to make the students imagine what we think this really looks like.  I think the easiest solution is to use computer graphics to generate animations that describe these things.  Computer animations are routinely made in computational research, and there is no reason why one could not create and show them during lecture.  Furthermore, additional demonstrations could easily be made available outside of class if a computer lab is available.  Many of the programs used to calculate and display chemical results are easy enough to use that students can be given access to them outside class.

In addition, I would like to explore an attractive and increasingly viable alternative to the traditional computer labs, namely the World Wide Web.  The development of the Java language will soon make it possible to embed 3-dimensional objects and movies, which can be rotated and manipulated interactively, in Web pages.  In effect, this will enable instructors to make interactive course notes or textbooks available to students in a format most students will already be familiar with.  Furthermore, this material can be accessed from almost anywhere and at any time.

I have intentionally kept my comments fairly general because I believe they apply not only to introductory chemistry (where I have the most experience) but also to physical chemistry and graduate classes.  Suitable experimental demonstrations may become harder to perform in class for higher-level courses, but these classes are generally small enough that the demonstrations could be performed even in a research laboratory.

Research.  Science is best taught through research.  Based on personal experience I believe that undergraduate students should become involved in research as early as possible, and I plan to actively encourage that.  Basic computational chemistry has a fairly easy learning curve and students can quickly get started.  The subsequent interpretation of results will naturally lead the student to learn basic chemical and quantum mechanical principles.

It is hard to say how one would teach graduate students in general, since that clearly depends on the individual student.  However, I will address one general expectation.  Students will be required to take on both computational and theoretical/programming projects, though not necessarily an equal amount of both.  Students whose main interest is computational chemistry should have at least some modest ability to alter the computer programs with which they work, so that their research is not totally determined by the capabilities of the programs they employ.  Theory/programming-oriented students have to learn how to effectively apply the tools they develop.  Planning and performing a thorough theoretical study of a particular chemical problem is a non-trivial task that must be taught.

Students shared between experimental and theoretical groups are becoming increasingly common, and I think this is a positive development.  Such students would not be required to take on a theoretical/programming problem.




Python peer instruction questions



I am teaching a molecular simulations/intro to Python course and have just finished drafting the last sets of peer instruction questions. Here's the last question. The idea is that they have to write a small Python program on the spot, but this might be more of a take-home question.  Can you do it?
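Since the question itself lives in the image above, here is a purely hypothetical stand-in for the kind of small on-the-spot program I have in mind (the coordinates and variable names are invented for illustration):

```python
import math

# Hypothetical exercise (the actual quiz question is in the image above):
# compute the distance between two atoms from their Cartesian coordinates.
atom1 = (0.0, 0.0, 0.0)
atom2 = (3.0, 4.0, 0.0)
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(atom1, atom2)))
print(distance)  # 5.0
```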

I'm not always this "evil".  Here's a more typical one.







Saturday, May 2, 2015

Koding.com: Installing numpy and matplotlib and transferring files

note to self:

kpm install pip
sudo apt-get install python-dev
sudo pip install numpy  (takes a while)
sudo apt-get install libfreetype6-dev libxft-dev
sudo easy_install matplotlib

To transfer files from Koding.com to your computer:
1. Move the file into the web directory (e.g. coordinates_end.png).
2. Go to the VM settings modal and find the assigned URL (e.g. http://ujkkbe932623.jhjensen.koding.io/).
3. The file can now be accessed at http://ujkkbe932623.jhjensen.koding.io/coordinates_end.png.

If you want to transfer a .py file, change the extension to .txt first.
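The whole workflow can be sketched as a terminal session (hypothetical: the public directory is assumed to be called Web here, placeholder files stand in for real output, and the download URL will of course differ per VM):

```shell
# Placeholder files standing in for real output (keeps the sketch self-contained)
touch coordinates_end.png analysis.py

# 1. Copy files into the VM's public web directory (name assumed to be "Web")
mkdir -p Web
cp coordinates_end.png Web/

# 2. A .py file gets a .txt extension so it is served as plain text
cp analysis.py Web/analysis.txt

# 3. On your own machine, fetch the files from the assigned URL, e.g.
#    curl -O http://ujkkbe932623.jhjensen.koding.io/coordinates_end.png
ls Web
```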




Saturday, April 25, 2015

The main reason I use OA? It makes my research better

When I started publishing OA the answer was "The people who paid for this research should be able to read about the results".  Now the answer is more complex and difficult to fit into 140 characters. Hence this blog post.

The OA movement has three important "side effects":
1. Pioneered by PLoS ONE, many OA journals have removed perceived "impact" as a review criterion.
2. Pioneered by PLoS ONE, many OA journals are mega-journals, where the appropriateness of the manuscript's topic to the journal is not an acceptance criterion.
3. Most OA journals allow you to make your manuscript public prior to submission.

I have found that points 1-2 have made my research much less risk-averse.  I can focus on truly challenging and long-term research questions without worrying about whether or where I will be able to publish.  Before, it was "In order to do X I have to do Y and Z first, but where will I publish Y and Z?" or "If I manage to do X and Y, this will sail into Journal Z". Now it is "It's important to find out about X; let's try it and publish what we find".

I can share our work at any stage in any way I see fit.  We put all our manuscripts and MS and PhD theses on preprint servers such as arXiv, and we get a lot of great feedback long before the "official reviews" arrive.

It still puzzles me when I see tweets like "Manuscript submitted to X!" and "Paper finally accepted in Y!!!" without a link to a preprint.  What information is really being communicated here?  "Hope to score a publication point soon!" and "Another line on my CV!!"?

Of course "The people who paid for this research should be able to read about the results" is still a major factor, but it has become so much more.




Wednesday, April 15, 2015

Megajournals: answers to six troubling questions

Jillian Buriak, editor-in-chief of the ACS journal Chemistry of Materials, recently asked six troubling questions related to the "explosive growth" of manuscripts published in megajournals, which consider all technically sound papers irrespective of the originality or novelty of the proposed research.  Even more troubling, she did not provide any answers. No wonder: they are tough and often initially confusing questions. Either that, or rhetorical. Anyway, I have given them some thought and have come up with some answers.

"(i) How will the community of peer reviewers (all practicing scientists) handle the onslaught of review requests?"

I was initially confused by this question. In my experience, papers rejected at one journal are simply resubmitted to another until they are published, so my initial impression was that the high rejection rate of some journals actually leads to more reviewing overall.  And then it hit me! I should only review for select journals like Chemistry of Materials.  That should keep my review pile nice and low.  I know, I know, someone will eventually point out that this way I only get to see papers deemed important by a single person at one of these journals.  But no, at Chemistry of Materials this important decision is made by two people. And because "importance" is a completely subjective decision, this can be done in as little as 5-7 days!

"(ii) How do we as scientists manage, and sort through, the vast increase in number of published journal pages, let alone read them?"

Yes, this is a tricky one.  Computers and crowd-sourcing are no good at dealing with vast amounts of information. I think the only solution is to agree on a list of journals we, as chemists, will review for and submit to.  Also, there should be a "one-strike-and-you're-out" rule: if one of the journals on the list rejects your paper, that's it.  As long as the list is relatively short, this can be strictly enforced.  I mean, if two people say "this is not important", what are the odds that a third person will say any different?  This should keep the number of published papers down to a manageable number.  Well, or at least stop the increase.

"(iii) Some authors may be tempted by the apparent ease of publishing their work in a reports journal, but will these published reports make any lasting impact, or even be noticed?"

Of course it's tempting.  As an author I spend loads of time writing and re-writing the lasting-impact parts of my manuscript.  It takes ages to craft sentences that make these speculations as spectacular as possible without being demonstrably untrue!  Pro tip: a slap-dash literature search makes the novelty section so much easier to write, and if "caught" you can honestly say that these key previous studies had escaped your attention.  But remember, at this stage you have already gotten past the editor - sorry, editors - so it's well worth the "risk".  Anyway, it's worth it.  Everyone knows that anything of lasting impact is only published in the most selective journals.  And what self-respecting scientist discovers papers by search algorithms these days?  Especially now that one can easily peruse the tables of contents - now with delightfully quirky graphics - online!

"(iv) Is the goal of serving the public good, of doing high quality science with taxpayer funds, being diluted by the ambition of increasing one’s own publication metrics (i.e., sheer number of publications)?"

Yes, a paper is not primarily a way to share results; it's primarily a way to keep score.  Why would the taxpayer care about papers that suggest a promising idea doesn't work?  Or that an important published study can't be replicated? The taxpayer cares first and foremost about the individual careers of scientists, and this is best judged by the number of papers an individual has published - but only if a handful of people have said they're important.  This is why it is so important where they are published. So when we make the list I propose in (ii), we should be sure to rank the journals in order of importance! I suggest a combination of collective gut feeling + some kind of average based on select citations.

"(v) Can the scientific community, which is already over-burdened, maintain high scientific standards under these conditions?"

No! While some people include "reproducibility" under "scientific standards", it is hardly a "high" scientific standard like "importance" or "novelty".  So a scientific community that tolerates the publication of numerous replication studies - positive as well as negative - cannot be said to maintain a "high" scientific standard.  Importance and novelty are key, and this can only be ensured by publishing in a few select journals with that as a focus. Remember, these are subjective standards best left to a few "pros".

"(vi) How will the explosion of citations, including self-citations, skew existing metrics?"

Again, I was initially confused by this question.  But some Googling revealed that the primary function of citations is not really to refer to previous studies, but rather to be another way to score the impact of a particular paper or researcher.  Citation count is incredibly field-dependent, and chemists have worked long and hard to establish a gut feeling for citation count vs impact; it would be a shame to lose this as more people start to share un-important information.  This is why the rules I have outlined under (i) and (ii) must be adhered to by all and strictly enforced by a select few.  Also, with some clever statistics and moderate data massaging, citations are a wonderful tool for ranking journals!




Friday, April 3, 2015

ProCS: Quantum Mechanics-based chemical shift predictions of backbone and CB atoms in proteins

We (Anders Larsen, Lars Bratholm, and I) are in the process of turning Anders Larsen's MS thesis into a paper. This is a progress report. Note that any numbers in here are likely to change somewhat.

What is ProCS?
ProCS takes a protein structure and computes isotropic chemical shielding values for all backbone atoms, as well as CB, in less than a second.  ProCS is basically a look-up and interpolation function based on about 2.35 million PCM/OPBE/6-31G(d,p)//PM6 calculations on tripeptides.  In addition, we correct select atoms for hydrogen bonding to H and HA, and apply a ring-current correction for H atoms.
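The look-up-and-interpolation idea can be sketched like this (a toy illustration only: the grid, the shielding values, and the function name are all made up, and the real ProCS table is built from the tripeptide QM calculations and has more dimensions than the two backbone dihedrals used here):

```python
import numpy as np

# Hypothetical table of precomputed isotropic shieldings (ppm) on a coarse
# (phi, psi) backbone-dihedral grid, standing in for the real tripeptide table.
phi_grid = np.array([-180.0, -90.0, 0.0, 90.0, 180.0])
psi_grid = np.array([-180.0, -90.0, 0.0, 90.0, 180.0])
table = np.random.default_rng(0).uniform(100.0, 120.0, (5, 5))  # fake CA shieldings

def interpolate_shielding(phi, psi):
    """Bilinear interpolation in the precomputed (phi, psi) table."""
    i = np.searchsorted(phi_grid, phi) - 1  # grid cell containing (phi, psi)
    j = np.searchsorted(psi_grid, psi) - 1
    # fractional position inside the grid cell
    t = (phi - phi_grid[i]) / (phi_grid[i + 1] - phi_grid[i])
    u = (psi - psi_grid[j]) / (psi_grid[j + 1] - psi_grid[j])
    return ((1 - t) * (1 - u) * table[i, j] + t * (1 - u) * table[i + 1, j]
            + (1 - t) * u * table[i, j + 1] + t * u * table[i + 1, j + 1])

print(interpolate_shielding(-60.0, -45.0))  # one residue's shielding, instantly
```

Because each prediction is just a table look-up plus a few multiplications, an entire protein takes a fraction of a second, as opposed to hours for the QM calculation itself.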

What is ProCS good for?
Using a previous incarnation of ProCS, which only predicted H chemical shifts, we have shown that one can improve hydrogen-bond geometries in protein structures by finding structures that match the measured chemical shifts. We further showed that using empirical chemical shift predictors did not result in improved hydrogen-bond geometries.  We want to extend this to other aspects of the protein structure by including NMR chemical shifts for more nuclei.  Note that this chemical shift-based structure refinement requires many thousands of chemical shift predictions, so doing QM calculations on the entire protein is not feasible.

How well does ProCS reproduce QM calculations?
As an example, we optimize ubiquitin (1UBQ) with PCM/PM6-D3H+ and use the resulting structure to compute PCM/OPBE/6-31G(d,p)//PM6 chemical shielding values, which we then compare to the corresponding ProCS values computed using the same structure.

For the CA, CB, and C atoms we get RMSD values of 2.0, 2.5, and 2.3 ppm with R values of 0.94, 0.98, and 0.74.  The RMSD for CA can be reduced to 1.6 ppm by subtracting an offset of 1.1 ppm, but the accuracy for the other two atom types is not greatly improved by an offset or by scaling + offset. For H and HA the RMSDs are 0.9 and 0.7 ppm and the R values are 0.86 and 0.84. The result for H can be improved somewhat (to 0.6 ppm and R = 0.89) by removing a clear outlier and applying scaling + offset. Finally, the RMSD and R for N are 7.2 ppm and 0.85.  The RMSD can be improved considerably by adding an offset (5.2 ppm and R = 0.88) or scaling + offset (4.4 ppm) after removing a clear outlier.
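The offset and scaling + offset corrections are just linear regression; here is a minimal sketch with made-up numbers standing in for the QM and ProCS shieldings:

```python
import numpy as np

rng = np.random.default_rng(1)
qm = rng.uniform(50.0, 70.0, 76)                   # fake QM shieldings (ppm)
procs = 1.05 * qm - 2.0 + rng.normal(0, 0.5, 76)   # fake predictions with slope/offset error

def rmsd(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

r = np.corrcoef(procs, qm)[0, 1]                   # Pearson R

# offset correction: subtract the mean deviation
offset_corrected = procs - np.mean(procs - qm)

# scaling + offset: least-squares fit procs = slope*qm + intercept, then invert
slope, intercept = np.polyfit(qm, procs, 1)
scaled_corrected = (procs - intercept) / slope

print(rmsd(procs, qm), rmsd(offset_corrected, qm), rmsd(scaled_corrected, qm), r)
```

With these fake numbers the offset correction removes the systematic 1 ppm shift and the scaling + offset correction additionally removes the slope error, which is exactly the pattern seen for N above.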

From additional analysis we know that, with the exception of HA, more than 50% of the error comes from the way we approximate the effect of the side chains on the residues immediately next to the residue of interest in the sequence.

Much of the variation in some of the chemical shifts comes from the nature of the side chain itself and the side chains before and after it in the sequence, which can lead to inflated R values. To separate the contributions of the sequence and the structure, we subtract the measured sequence-corrected random-coil values from both the ProCS and QM results. The R values for CA, CB, C, HA, H, and N are now 0.76, 0.69, 0.71, 0.83, 0.89, and 0.75, respectively, while the RMSD values are much the same (after an offset correction).  It is interesting to note that CB went from being the most to the least predictive (in terms of R values), that the amide proton is now the most predictive, and that N and CA are roughly equally predictive even though the former has a larger RMSD.  The reason is that the corrected carbon chemical shifts span a range of about 10 ppm, while the corrected N chemical shifts span a range of about 20 ppm.  For comparison, the H and HA corrected chemical shifts span a range of about 4 ppm.
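The R-value inflation is easy to reproduce with toy numbers (everything below is made up purely to illustrate the point): a large shared, sequence-like baseline added to two weakly correlated "structural" contributions drives R toward 1, and subtracting the baseline - the random-coil analogue - reveals the much weaker structural correlation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 76
baseline = rng.uniform(-10.0, 10.0, n)              # sequence-dependent part (random-coil-like)
struct_a = rng.normal(0, 1.0, n)                    # "QM" structural contribution
struct_b = 0.5 * struct_a + rng.normal(0, 1.0, n)   # weakly correlated "predictor"

raw_r = np.corrcoef(baseline + struct_a, baseline + struct_b)[0, 1]
corrected_r = np.corrcoef(struct_a, struct_b)[0, 1]
print(raw_r, corrected_r)  # raw R is inflated by the shared baseline
```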

How well does ProCS reproduce experiment?
Before I try to answer that, there are a few points to consider. One is that we are comparing computed chemical shieldings to experimental chemical shifts, so it only makes sense to compare results that have been corrected by scaling + offset using linear regression (or possibly just an offset). Another is that deviations from experiment result not only from deficiencies in the model, but also from deficiencies in the structure, including the fact that several conformations could contribute to the measured chemical shifts.

The R (RMSD) values for CA, CB, C, HA, H, and N for ProCS vs Exp are:

0.90 (2.0)   0.99 (2.2)   0.38 (1.8)   0.66 (0.4)   0.33 (0.6)   0.75 (4.5)

While the corresponding PCM/OPBE/6-31G(d,p)//PM6-D3H+ values are

0.90 (2.1)   0.98 (3.0)   0.48 (2.7)   0.61 (0.8)   0.38 (1.25)  0.81 (5.5)

According to these R values, ProCS performs roughly the same as QM for CA, CB, and HA and could be improved a bit for C, H, and N, but a greater limitation appears to be PCM/OPBE/6-31G(d,p)//PM6 itself for C and H and, to some extent, HA.  Here it is worth noting that Zhu, He, and Zhang have shown that R for H atoms can be greatly improved by including a shell of explicit water molecules in addition to the continuum.

The corresponding ProCS R values for the random-coil-corrected chemical shifts are 0.56, 0.42, 0.29, 0.69, 0.32, and 0.41, which are quite comparable to the QM values: 0.55, 0.46, 0.39, 0.63, 0.36, and 0.49.  Notice that the R value for H is no longer inordinately lower than those for some of the other atoms. The ProCS R value for C is close to being statistically insignificant (R < 0.23).

How sensitive is the agreement with experiment to the structure?  
A lot. Here are the R values for ProCS vs experiment for the random-coil-corrected values, for different choices of energy function used in the geometry optimization, for CA, CB, C, HA, H, and N (with the corresponding QM values next to them):

PM6-D3H+   0.56  0.42  0.29  0.69  0.32  0.41  |  0.55  0.46  0.39  0.63  0.36  0.49
PM6-DH+    0.61  0.40  0.37  0.71  0.33  0.44  |  0.62  0.39  0.42  0.64  0.42  0.51
AMBER      0.61  0.38  0.09  0.60  0.24  0.42  |  0.58  0.46  0.39  0.71  0.17  0.44
CHARMM     0.71  0.36  0.39  0.81  0.34  0.44  |  0.71  0.48  0.58  0.80  0.45  0.60
AMOEBA     0.50  0.36  0.34  0.56  0.44  0.36  |  0.48  0.39  0.54  0.65  0.51  0.47
X-ray      0.71  0.30  0.35  0.83  0.30  0.49  |  0.58  0.46  0.39  0.71  0.17  0.44

The value for C for AMBER is not a typo! I am not sure what is going on there; the corresponding QM value is 0.39, so it looks like a bug.  Anyway, the structural dependence is a positive in the sense that experimental chemical shifts can potentially be used to improve the structure. In all cases the ProCS and QM calculations lead to quite similar R values, though the R values tend to be consistently higher for the QM-predicted C chemical shifts.

How does ProCS compare to other chemical shift predictors?
I don't have all the data I need yet, so consider this a preview of a subsequent blog post. CheShift2, which is also a QM-based chemical shift predictor for CA and CB chemical shifts, gives the following R values:

AMBER      0.53  0.38
CHARMM     0.68  0.61
AMOEBA     0.33  0.44
X-ray      0.59  0.47

So based on this preliminary data it appears that ProCS tends to do better for CA, while CheShift2 does better for CB.

Open questions
How does ProCS compare to empirical shift predictors such as CamShift, SHIFTX, and SPARTA?
How do the numbers presented in this blog post compare to the corresponding numbers for other proteins?
What's the effect of averaging over different (side-chain) conformers?
Is it possible to find a "universal" set of scaling and offset parameters for each atom type for a given choice of optimized structure?  In this way, ProCS could predict chemical shifts instead of chemical shieldings.
What's the best way to use ProCS (or chemical shieldings in general) to judge the accuracy of a protein structure?  If this turns out to be the R value instead of the RMSD, then we only need chemical shieldings.

Comments welcome, as always.

