Saturday, April 12, 2014

Kids today and the good old days when lectures worked

A lecture boring students 600 years ago (source)

A few days ago the flipped classroom concept made the front page of the major Danish newspaper Politiken. The occasion was the appointment of a new president of Roskilde University, Hanne Leth Andersen, a professor of education. Among other things, the interview motivated the use of the flipped classroom approach by arguing that incoming students have changed in recent years and that they lose patience with hour-long lectures.

Having given interviews myself, I know it's hard to separate what Leth Andersen actually said from how it is presented in the article, but I think the article is being unfair to students in that respect. Lectures never really worked well, and we have known that for a long time.

Perhaps the most celebrated lectures in the natural sciences are the Feynman Lectures on Physics, which Richard Feynman gave to Caltech students more than 50 years ago. Here is what Feynman wrote in the preface to the lecture notes in 1963 (emphasis mine):
The question, of course, is how well this experiment has succeeded. My own point of view—which, however, does not seem to be shared by most of the people who worked with the students—is pessimistic. I don't think I did very well by the students. When I look at the way the majority of the students handled the problems on the examinations, I think that the system is a failure.
I think, however, that there isn't any solution to this problem of education other than to realize that the best teaching can be done only when there is a direct individual relationship between a student and a good teacher—a situation in which the student discusses the ideas, thinks about the things, and talks about the things. It's impossible to learn very much by simply sitting in a lecture, or even by simply doing problems that are assigned. 
One of the most common components of the flipped classroom approach is peer instruction, pioneered by Eric Mazur at Harvard University, in which the lecture is replaced by in-class discussion and voting using clickers. Mazur switched to this approach in the early 1990s because he found that lecturing led to rote memorization and little conceptual understanding among his elite Harvard pre-med students. He was led to this realization by a series of papers published in 1985 by David Hestenes, which demonstrated this general trend based on test results from ca. 1000 students taught by 7 different instructors at two different universities in the early 1980s.


Carl Wieman relates this story about another pioneer in science education, Joe Redish, who came to the same realization in the late 1970s:
Even though the students thought his lectures were wonderful, Joe wondered how much they were actually learning. So he hired a graduate student to grab students at random as they filed out of class at the end of the lecture and ask, “What was the lecture you just heard about?” It turned out that the students could respond only with the vaguest of generalities.
I'm not saying students haven't changed, and the way we teach students does need to change. But when it comes to "the lecture" as the primary teaching tool, these two things are largely unrelated.


This work is licensed under a Creative Commons Attribution 4.0

Friday, April 4, 2014

New manuscript: A third-generation dispersion and third-generation hydrogen bonding corrected PM6 method: PM6-D3H+


After a long fight, I can finally say that I've submitted the paper describing the method I've been working on: PM6-D3H+. Well, the method is really just 'stolen' pieces of other great work, implemented in GAMESS.

The manuscript has been submitted to PeerJ and is therefore also available as a preprint:
https://peerj.com/preprints/353v1

Abstract:
We present new dispersion and hydrogen bond corrections to the PM6 method, PM6-D3H+, and its implementation in the GAMESS program. The method combines the DFT-D3 dispersion correction by Grimme et al. with a modified version of the H+ hydrogen bond correction by Korth. Overall, the interaction energy of PM6-D3H+ is very similar to PM6-DH2 and PM6-DH+, with RMSD and MAD values within 0.02 kcal/mol of one another. The main difference is that the geometry optimizations of 88 complexes result in 82, 6, 0, and 0 geometries with 0, 1, 2, and ≥ 3 imaginary frequencies using PM6-D3H+ implemented in GAMESS, while the corresponding numbers for PM6-DH+ implemented in MOPAC are 54, 17, 15, and 2. The PM6-D3H+ method as implemented in GAMESS offers an attractive alternative to PM6-DH+ in MOPAC in cases where the LBFGS optimizer must be used and a vibrational analysis is needed, e.g. when computing vibrational free energies. While the GAMESS implementation is up to 10 times slower for geometry optimizations of proteins in bulk solvent, compared to MOPAC, it is sufficiently fast to make geometry optimizations of small proteins practically feasible.

The method is implemented in GAMESS and will be made publicly available as soon as possible.

Sunday, March 23, 2014

A general chemistry course meets Wolfram Alpha

Even the most thoughtful, dedicated teachers spend enormously more time worrying about their lectures than they do about their homework assignments, which I think is a mistake. Extended, highly focused mental processing is required to build those little proteins that make up the long-term memory. No matter what happens in the relatively brief period students spend in the classroom, there is not enough time to develop the long-term memory structures required for subject mastery. 
To ensure that the necessary extended effort is made, and that it is productive, requires carefully designed homework assignments, grading policies, and feedback. 


In a previous post I showed how some chemistry problems posted on Reddit and Yahoo can be solved simply by typing them into Wolfram Alpha (WA), and suggested that we should revisit the general chemistry curriculum in light of new tools such as WA.

To get more data I signed up for the Introduction to Chemistry MOOC offered by Coursera. In this course the homework consists of 8 quizzes, each with between 11 and 15 questions, plus a Pre-Course Concept Assessment Quiz. Here's what I found.

Pre-Course Concept Assessment Quiz
7 out of 15 questions could be done by typing them into WA. For example: Solve the following system of two equations with two unknowns: x + y = 1 and 5x + y = 2. This question tests a manual skill that is no longer really needed, much like "what is the square root of 2?".

This question was much better: An architect presents a 3 inch wide by 4 inch deep by 3 inch tall model of a new central campus dorm. If the final building foundation is 126 feet wide, then how tall will the building be? Of course this can also be solved with WA, but the student must reformulate the question first. I usually don't count questions such as this as square-root-of-2 problems.
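For comparison, here is a minimal Python sketch of what the tool saves the student in these two questions. The function name `solve_2x2` is hypothetical; it simply applies Cramer's rule with exact fractions. The second part is the reformulation the architect question asks for: turn the two widths into a scale factor, then apply it to the height.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the system has no unique solution")
    return Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)

x, y = solve_2x2(1, 1, 1, 5, 1, 2)  # x + y = 1 and 5x + y = 2
print(x, y)

# Scale model: 3 in wide model vs. 126 ft wide building; the model is 3 in tall.
scale_ft_per_in = 126 / 3  # feet of building per inch of model
building_height_ft = 3 * scale_ft_per_in
print(building_height_ft)
```

This gives x = 1/4, y = 3/4 and a 126 ft tall building. Note that the 4 inch depth is never used.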

Week 1 Introduction
4 out of 15 questions could be done by typing them into WA. For example: Wavelength of orange light is 0.00000060 m, scientific notation is ______ m. A conceptual question on scientific notation would be much more useful. Questions like this are also trivial in WA: A Boeing 747 carries 1.834 x 10^5 liters of jet fuel. Convert this volume to cm3. These are "wasted" questions.
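To illustrate just how mechanical these two are, here is a short Python sketch, assuming only that 1 L = 1000 cm³. A format string does all of the work.

```python
wavelength_m = 0.00000060
print(f"{wavelength_m:.1e} m")  # scientific notation, 2 significant figures

fuel_L = 1.834e5
CM3_PER_L = 1000  # 1 L = 1000 cm^3
fuel_cm3 = fuel_L * CM3_PER_L
print(f"{fuel_cm3:.3e} cm^3")
```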

4 questions concerned significant figures, which WA does not handle. For example: Perform the following calculation and input the answer expressed to the correct number of significant figures: 80720 ÷ (15.3 – 7.009) × 1.86. This is an important skill that must be taught, even though it's not used in the rest of the course :).
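Since WA doesn't do this, here is a rough Python sketch of the usual textbook rules applied to the question above: subtraction keeps the fewest decimal places, and multiplication and division keep the fewest significant figures. The helper names are hypothetical, and the sketch assumes that trailing zeros in a number without a decimal point (the final 0 in 80720) are not significant, a convention textbooks don't all agree on.

```python
import math

def decimal_places(value: str) -> int:
    """Digits after the decimal point in a measured value written as text."""
    return len(value.split(".")[1]) if "." in value else 0

def sig_figs(value: str) -> int:
    """Significant figures in a positive measured value written as text."""
    digits = value.replace(".", "").lstrip("0")
    if "." not in value:
        # Assumption: trailing zeros without a decimal point are not significant.
        digits = digits.rstrip("0")
    return len(digits)

def round_sig(x: float, n: int) -> float:
    """Round x to n significant figures."""
    if x == 0:
        return 0.0
    return round(x, n - 1 - math.floor(math.log10(abs(x))))

# 80720 / (15.3 - 7.009) * 1.86
a, b = "15.3", "7.009"
dp = min(decimal_places(a), decimal_places(b))          # subtraction: fewest decimal places (1)
diff = round(float(a) - float(b), dp)                   # 8.291 -> 8.3
diff_sf = math.floor(math.log10(abs(diff))) + 1 + dp    # 8.3 has 2 significant figures
n = min(sig_figs("80720"), diff_sf, sig_figs("1.86"))   # multiply/divide: fewest sig figs (2)
# Carry the unrounded difference and round only once, at the end.
result = round_sig(80720 / (float(a) - float(b)) * 1.86, n)
print(f"{result:.{n - 1}e}")
```

This prints 1.8e+04, i.e. 1.8 × 10^4. Counting the trailing zero in 80720 as significant would not change the answer, because the difference (8.3, two significant figures) is what limits the result.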

Week 2 Matter and Energy
10 out of 13 questions could be done by typing them into WA. For example: Name the following compound: CaF2. Even this one: Which of the following neutral atoms has the smallest first ionization energy? Si, Sc, Sr, B, N. A much better question is why the order is what it is, but how do you phrase that as a multiple-choice question?

Here's one that WA couldn't answer: Identify each of the following ions with their correct chemical symbol: Species with 8 protons and 10 electrons; Species with 30 protons and 28 electrons.
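The reasoning WA couldn't do here is short: the element follows from the proton count, and the charge is protons minus electrons. A minimal Python sketch, with a hypothetical two-entry lookup table standing in for the periodic table:

```python
# Hypothetical two-entry excerpt of the periodic table: atomic number -> symbol.
SYMBOLS = {8: "O", 30: "Zn"}

def ion_symbol(protons: int, electrons: int) -> str:
    """Element from the proton count; charge = protons - electrons."""
    charge = protons - electrons
    if charge == 0:
        return SYMBOLS[protons]
    sign = "+" if charge > 0 else "-"
    magnitude = "" if abs(charge) == 1 else str(abs(charge))
    return f"{SYMBOLS[protons]}{magnitude}{sign}"

print(ion_symbol(8, 10))   # oxide ion
print(ion_symbol(30, 28))  # zinc ion
```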

Week 3 Chemical Composition, Solutions, and Dissolution Equations

Here's one that requires some thought: A sample of sodium dichromate, Na2Cr2O7, is placed into a container by itself. The sample of material in the container is analyzed, and it is found to contain exactly 0.67 moles of sodium atoms. How many moles of oxygen atoms are in this sample?
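The "some thought" here is reading the mole ratio off the formula: Na2Cr2O7 contains 7 O atoms for every 2 Na atoms. As a short Python sketch:

```python
# Atoms per formula unit of sodium dichromate, Na2Cr2O7.
formula = {"Na": 2, "Cr": 2, "O": 7}

moles_na = 0.67
moles_o = moles_na * formula["O"] / formula["Na"]  # 7 O atoms for every 2 Na atoms
print(f"{moles_o:.2g} mol O")
```

This gives 2.3 mol of oxygen atoms to two significant figures.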

Week 4 Chemical Composition, Solutions, Dissolution and Precipitation
1 out of 11 questions could be done by typing it into WA: Determine the oxidation state of the nitrogen in each of the following molecules or ions: NO2^-1

WA is simply not yet equipped to handle questions like: A solution is known to contain only one type of anion. Addition of Tl1+ ion to the solution had no apparent effect (all ions remained in solution), but addition of Ba2+ ion resulted in a precipitate. Which anion is present? SO4^2-, Cl1-, I1-, NO3^1-
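Both Week 4 questions come down to short bookkeeping. Here is a minimal Python sketch, assuming oxygen's usual oxidation state of -2 and a small hand-entered excerpt of the standard solubility rules (TlCl and TlI are poorly soluble, BaSO4 is insoluble, and the other salts involved are soluble):

```python
# Oxidation state of N in NO2^-: the oxidation states must add up to the ion's charge.
OXYGEN = -2  # assumption: oxygen takes its usual -2
ion_charge, n_oxygen = -1, 2
n_oxidation_state = ion_charge - n_oxygen * OXYGEN  # -1 - 2*(-2) = +3
print(n_oxidation_state)

# Precipitation puzzle: anions that form a poorly soluble salt with each cation.
INSOLUBLE_WITH = {
    "Tl+": {"Cl-", "I-"},   # TlCl and TlI
    "Ba2+": {"SO4^2-"},     # BaSO4
}
candidates = {"SO4^2-", "Cl-", "I-", "NO3-"}
candidates -= INSOLUBLE_WITH["Tl+"]   # Tl+ gave no precipitate: rule these out
candidates &= INSOLUBLE_WITH["Ba2+"]  # Ba2+ did precipitate: keep only these
print(candidates)
```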

Week 5 no quiz

Week 6 Atomic Structure
5 out of 14 questions could be done by typing them into WA.  For example: Use the periodic table to write the electron configuration for the following element: Ba. By itself it is a pointless question. What you really want to know is: "How many core and valence electrons are in the following neutral atom? Se" which is also readily available from WA.  

And just because a question cannot be answered easily with WA doesn't mean it's a good question. For example, what's the point of this question: Give the number of s, p, d, and f electrons in the following neutral atom when it is in the ground state.
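Both of these questions are mechanical once you know the filling order. Here is a minimal Python sketch that fills subshells in Madelung order (increasing n + l, ties broken by n). It is a simplification: it ignores the known exceptions (Cr and Cu, for instance), and its core/valence split, which counts the electrons in the highest occupied shell as valence, only makes sense for main-group elements such as Se. The function names are hypothetical.

```python
LETTERS = "spdf"

def aufbau_order(max_n=7):
    """Subshells (n, l) in Madelung order: increasing n + l, ties broken by n."""
    subshells = [(n, l) for n in range(1, max_n + 1) for l in range(min(n, 4))]
    return sorted(subshells, key=lambda nl: (nl[0] + nl[1], nl[0]))

def configuration(z):
    """Ground-state configuration as (n, l, electrons) tuples."""
    config, remaining = [], z
    for n, l in aufbau_order():
        if remaining == 0:
            break
        k = min(remaining, 2 * (2 * l + 1))  # a subshell holds 2(2l + 1) electrons
        config.append((n, l, k))
        remaining -= k
    return config

def as_text(config):
    return " ".join(f"{n}{LETTERS[l]}{k}" for n, l, k in config)

def count_by_type(config):
    """Number of s, p, d, and f electrons."""
    counts = {letter: 0 for letter in LETTERS}
    for _, l, k in config:
        counts[LETTERS[l]] += k
    return counts

def core_and_valence(config):
    """Main-group elements only: valence = electrons in the highest occupied shell."""
    n_max = max(n for n, _, _ in config)
    valence = sum(k for n, _, k in config if n == n_max)
    return sum(k for _, _, k in config) - valence, valence

ba = configuration(56)
print(as_text(ba))
print(count_by_type(ba))
print(core_and_valence(configuration(34)))  # Se: (core, valence)
```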

I liked this one though: 
Below is the energy level diagram (not drawn to scale) representing the transitions made by an electron in a hydrogen atom that result in the observed lines of both the absorption and emission spectra. Some are in the visible region, and some are not.  4 different energy photons are represented (approximate wavelengths are given in parentheses): infrared (~ 10^-4 m), red (~ 10^-6 m), blue (~ 10^-7 m), ultraviolet (~ 10^-8 m).  Match the transition (a - h) with the photon described (approximate wavelengths are given in parentheses.) Your answer input should be a single, lower case letter. (Please note: This is not a problem for which a calculator is required. Your knowledge of the Bohr model of the atom and the relative energies of transitions is all that is needed.)
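The problem rightly says no calculator is needed, but for readers who want to check the ordering, here is a short Python sketch using the Rydberg formula, 1/λ = R_H (1/n_lower² - 1/n_upper²), with an approximate value of R_H. The transitions are just examples, since the (a - h) diagram isn't reproduced here.

```python
R_H = 1.0968e7  # Rydberg constant for hydrogen in m^-1 (approximate value)

def wavelength(n_upper: int, n_lower: int) -> float:
    """Wavelength (m) of the photon emitted or absorbed in an n_upper <-> n_lower transition."""
    return 1 / (R_H * (1 / n_lower**2 - 1 / n_upper**2))

# Bigger energy gaps give shorter wavelengths.
for upper, lower in [(2, 1), (4, 2), (3, 2), (4, 3)]:
    print(f"n = {upper} <-> {lower}: {wavelength(upper, lower):.2e} m")
```

The 2 <-> 1 line lands in the ultraviolet, 4 <-> 2 in the blue-green, 3 <-> 2 in the red, and 4 <-> 3 in the infrared.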

Week 7 Molecular Structures and Shapes
4 out of 11 questions could be done by typing them into WA.  Drawing Lewis structures is becoming a square-root-of-2 problem: Select the correct Lewis structure for the following ion. N3-. But most of the questions in this section are quite good and not immediately answerable by WA.

Week 8 Ideal Gas Law and Intermolecular Forces
2 out of 11 questions could be done by typing them into WA.  For example: A container of 8.03 x 10^-3 moles of hydrogen gas has a volume of 20.9 mL and a temperature of 20.8 degrees C.  You could argue that this example requires some processing of the information given, but certainly all the "heavy lifting" in terms of units and conversions is done.
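The quoted question is cut off before it says what to find. Assuming it asks for the pressure, here is the "heavy lifting" as a short Python sketch: convert mL to L and °C to K, then solve PV = nRT for P.

```python
R = 0.082057       # gas constant, L·atm/(mol·K)
n = 8.03e-3        # mol of H2
V = 20.9 / 1000    # mL -> L
T = 20.8 + 273.15  # °C -> K

P = n * R * T / V  # PV = nRT solved for P
print(f"P = {P:.3g} atm")
```

Under that assumption the pressure comes out at roughly 9.3 atm.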

But, again, most of the questions in this section are quite good and not immediately answerable by WA.

Week 9 Solution Calculations
7 out of 12 questions could be done by typing them into WA. For example, imagine being completely stuck on "What is the mass of fructose, also known as fruit sugar (C6H12O6), in a 127 mL sample of glucose solution that has a concentration of a 1.44 M?". Simply type "127 mL 1.44 M fructose" into WA.

Surprisingly, WA can't handle "What is the mass percent concentration of the solution if 11.9 g of ethanol is dissolved in 67.4 g of water?" directly.
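For reference, both Week 9 questions are one-liners once set up. Here is a minimal Python sketch using standard atomic masses. The question mixes up fructose and glucose, but both are C6H12O6, so the molar mass is the same.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # g/mol

# Molar mass of C6H12O6
molar_mass = 6 * ATOMIC_MASS["C"] + 12 * ATOMIC_MASS["H"] + 6 * ATOMIC_MASS["O"]
mass_g = (127 / 1000) * 1.44 * molar_mass  # volume (L) x molarity (mol/L) x molar mass (g/mol)
print(f"{mass_g:.1f} g")

ethanol_g, water_g = 11.9, 67.4
mass_percent = ethanol_g / (ethanol_g + water_g) * 100  # solute mass / solution mass
print(f"{mass_percent:.1f} %")
```

This gives about 32.9 g of sugar and a 15.0% ethanol solution by mass.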

Summary
So 42% (48/113) of the questions can easily be done with WA, and at least a third are what I would call square-root-of-2 questions: questions that are no longer really meaningful in and of themselves. In many weeks the majority of the questions are like that. That's a waste of very valuable student time and attention.

I should mention that there are also advanced problems sets that "need to be completed in order to achieve a statement of accomplishment with distinction."  Many of these are quite interesting problems that I think could be assigned to all students if they are taught to use WA effectively.  This is what we should be aiming for:
1. An 8-year-old child who weighs 66 pounds needs to be treated for a novel influenza A (H1N1) infection. For a child of this size, the total daily recommended amount of the antiviral drug Oseltamivir is 4.0 mg of drug per kg of body weight. The total daily amount of medication should be divided into two equal doses. (Source: Clinical Infectious Diseases 2009; 48:1003–1032.) A liquid suspension of this medication contains 12 mg Oseltamivir per mL. How many mg of the antiviral drug should be given to the child for her first dose?
   1. Look for the chemistry terms and unfamiliar words. Do you understand all of the terms?
   2. What is the question asking for?
   3. Write down in words a short sketch for how you would solve this problem. What are the steps to solve this problem? Is any necessary information missing? Which information is provided that you do not need to answer the question?
   4. Solve Problem 1: the answer is _______ mg.
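Step 4 of this problem, worked as a short Python sketch using the exact definition 1 lb = 0.45359237 kg:

```python
KG_PER_LB = 0.45359237
weight_kg = 66 * KG_PER_LB  # about 29.9 kg
daily_mg = 4.0 * weight_kg  # 4.0 mg of drug per kg of body weight per day
dose_mg = daily_mg / 2      # divided into two equal doses
print(f"first dose: {dose_mg:.0f} mg")
```

This gives roughly 60 mg for the first dose. The 12 mg/mL concentration is exactly the kind of information step 3 asks about: it isn't needed unless the question asks for a volume of suspension.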


This work is licensed under a Creative Commons Attribution 4.0

Friday, March 21, 2014

Citations: some numbers from Denmark

Just out of curiosity I checked Web of Science (WOS) and found:

The most cited paper with a co-author working in Denmark is "Improved methods for building protein models in electron-density maps and the location of errors in these models" published in 1991 in Acta Crystallographica Section A with 12,625 citations.  The second most cited paper has 6,303 citations.  The Acta Cryst. A paper is the 7th most cited paper on the topic of "chemistry" (as defined by WOS) worldwide.

If we restrict the search to "chemistry" (as defined by WOS), the most cited paper is "Improved prediction of signal peptides: SignalP 3.0", published in Journal of Molecular Biology in 2004 with 4,260 citations.

I would classify the latter paper as bioinformatics, and for some reason the Acta Cryst. A paper didn't show up, so let's add "dept chem" to the search instead of restricting the search by subject. One of the co-authors on the Acta Cryst. A paper is from the Department of Chemistry at the University of Aarhus, so that's the top one. The next one is "Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites", published in Protein Engineering in 1997 with 4,368 citations.

That still smells like bioinformatics to me, but it just shows how versatile chemists are. Anyway, the third most cited paper is definitely in the realm of traditional chemistry: "Peptidotriazoles on solid phase: [1,2,3]-triazoles by regiospecific copper(I)-catalyzed 1,3-dipolar cycloadditions of terminal alkynes to azides", published in Journal of Organic Chemistry in 2002 with 3,251 citations.


This work is licensed under a Creative Commons Attribution 4.0

Tuesday, March 18, 2014

ROC curves and picking cutoffs

We just got the second round of reviews for Luca De Vico's latest PLoS ONE paper. In the paper we try to predict HIV protease mutants that will cleave a particular peptide sequence, using peptide-protein interaction energies as a measure of cleavability. How well does this work? The reviewer suggested ROC curves to quantify this. Here's how it works.

We have 11 naturally occurring peptides that we know are cleavable (there are also some non-natural peptides that I'll ignore in this post) and 42 that we know are non-cleavable. Here are the computed interaction energies (in kcal/mol) for all the cleavable peptides and for the non-cleavable peptides whose interaction energies are < -40 kcal/mol:

| Cleavable (11) | Non-cleavable (42) |
|---|---|
| -72 | |
| -68 | -68 |
| -68 | -63 |
| -64 | -54 |
| -63 | -49 |
| -62 | -45 |
| -62 | -45 |
| -57 | -44 |
| -52 | -42 |
| -47 | |
| -41 | |

If we say that peptides with interaction energies < -40 kcal/mol are cleavable then we will have correctly predicted that all 11 cleavable peptides are cleavable, but also that 8 non-cleavable peptides will be cleavable.  Put another way, our "true positive" rate is 100% (11/11) and our "false positive" rate is 19% (8/42).

If we pick -45 kcal/mol as the cutoff, the numbers are 91% and 10%: we have fewer false positives, but we miss one true positive. The plot of the true positive rate against the false positive rate is an ROC curve:


In a perfect world our true positive rate would be 100% and our false positive rate 0%, so we are looking for the point closest to (0, 1), which happens to be the -45 kcal/mol cutoff.

We can also quantify how good this approach is in general by computing the area under the curve (AUC), which is 1 for a perfect predictor and 0.5 for one that is no better than random guessing. This single number also makes it easy to compare, for example, two different methods for calculating the interaction energies.
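Here is a minimal Python sketch of the whole procedure, using the energies from the table. Only 8 of the 42 non-cleavable energies are listed, so the sketch assumes the other 34 lie at or above -40 kcal/mol and uses -40 as a placeholder; because every cleavable peptide is below -40, their exact values don't affect the rates or the AUC. The cutoff convention is "energy < cutoff means predicted cleavable", which reproduces the 19% and 10% false positive rates above. The AUC is computed as the fraction of (cleavable, non-cleavable) pairs that are ranked correctly (the Mann-Whitney statistic, with ties counting one half), which equals the trapezoidal area under the ROC curve.

```python
import math

# Interaction energies (kcal/mol) from the table.
cleavable = [-72, -68, -68, -64, -63, -62, -62, -57, -52, -47, -41]
listed_noncleavable = [-68, -63, -54, -49, -45, -45, -44, -42]
# Assumption: the 34 unlisted non-cleavable peptides lie at or above -40 kcal/mol.
# Only their rank matters here, so -40 serves as a placeholder value.
noncleavable = listed_noncleavable + [-40] * (42 - len(listed_noncleavable))

def rates(cutoff):
    """(true positive rate, false positive rate) when energy < cutoff means 'cleavable'."""
    tpr = sum(e < cutoff for e in cleavable) / len(cleavable)
    fpr = sum(e < cutoff for e in noncleavable) / len(noncleavable)
    return tpr, fpr

def distance_to_corner(cutoff):
    """Distance of the ROC point (FPR, TPR) from the perfect classifier at (0, 1)."""
    tpr, fpr = rates(cutoff)
    return math.hypot(fpr, 1 - tpr)

def auc():
    """Fraction of (cleavable, non-cleavable) pairs ranked correctly; ties count 1/2."""
    score = sum(1.0 if c < n else 0.5 if c == n else 0.0
                for c in cleavable for n in noncleavable)
    return score / (len(cleavable) * len(noncleavable))

for cutoff in (-40, -45):
    tpr, fpr = rates(cutoff)
    print(f"cutoff {cutoff}: TPR {tpr:.0%}, FPR {fpr:.0%}")

candidates = sorted(set(cleavable + noncleavable))
best = min(candidates, key=distance_to_corner)
print(f"cutoff closest to (0, 1): {best} kcal/mol, AUC = {auc():.3f}")
```

With these assumptions the point closest to (0, 1) is the -45 kcal/mol cutoff, and the AUC comes out at about 0.95.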


This work is licensed under a Creative Commons Attribution 4.0

Friday, March 7, 2014

Open access and proposal review: two data points

One worry often expressed online about publishing in open access journals is how it will affect one's chances of getting funded. Here are two data points in that regard.

I just got two reviews back on a proposal I submitted to the Danish National Research Foundation, entitled "Quantum Biochemistry: New methods for computer aided design of new enzymes and drugs". The reviews are non-anonymous: one reviewer is from the US and the other from Finland, and neither was suggested by me nor appears as an author on the papers I reference.

Here's the relevant section of the proposal (note to self: include eLife next time)
Publication and Dissemination 
All theoretical developments and applications will be published in peer-reviewed journals. As far as possible we will publish in open access journals or journals with an open access option, to allow access to as many people as possible. However, any successful application to enzyme or drug design will be submitted to Nature or Science. All new theoretical methods will be incorporated into the GAMESS program, which is distributed free of charge to both academia and industry, and is the most popular non-commercial quantum chemistry program in the world.
Also, during the last two years I have published mostly in PLoS ONE and PeerJ, including all the results pertaining to this proposal.

Here's what the review form asks the reviewers to comment on with respect to this point:
Please write your comments on the overall considerations in the proposal with regard to the publication/dissemination/patenting of research results, briefly explaining both the strengths and weaknesses.  
Here's what the reviewers wrote.

Reviewer 1
I’m glad that the researchers demonstrate a commitment to publishing their work and in open access journals and their software freely. I was somewhat disheartened to see the suggestion that the work will be submitted to Nature and Science only if the work is extremely successful (see Randy Sheckman’s thoughts on this, insightful even though they are not that they are all valid from my perspective).
Reviewer 2
I personally very much favour the open-access model, which is nicely taken shown in this application. Especially, since the created computational method will be freely available I am really happy with this part. 
I don't know yet if the proposal will be funded, but if it isn't, it won't be because of the reviewers' views on open access.


This work is licensed under a Creative Commons Attribution 4.0

Wednesday, March 5, 2014

Top 10 reasons to not share your data (and why you should anyway)

Much has been made of the recently announced data policy at PLoS (see this post for a summary of sorts, or Google #plosfail). Reading some of this, I was reminded of this excellent piece of writing by Randall J. LeVeque. It is entitled "Top 10 reasons to not share your code (and why you should anyway)", but in my opinion most of it applies equally well to data. Some excerpts follow.

Before discussing computer code, I'd like you to join me in a thought experiment. Suppose we lived in a universe where the standards for publication of mathematical theorems are quite different: papers present theorems without proofs, and readers are expected to simply believe the author when it is stated that the theorem has been proved.
In this alternative universe the reputation of the author would play a much larger role in deciding whether a paper containing a theorem could be published. ...  Eventually some agitators might come along and suggest that it would be better if mathematical papers contained proofs. Many arguments would be put forward for why this is a bad idea. Here are some of them ... 
1. The proof is too ugly to show anyone else. It would be too much work to rewrite it neatly so others could read it. And anyway it's just a one-off proof for this particular theorem, and not intended for others to see, or to use the ideas for proving other theorems. My time is much better spent proving another result and publishing more papers rather than putting more effort into this theorem, which I've already proved
2. I didn't work out all the details. Some tricky cases I didn't want to deal with, but the proof works fine for most cases, such as the ones I used in the examples in the paper. (Well, actually I discovered that some cases don't work, but they will probably never arise in practice.) 
3. I didn't actually prove the theorem, my student did.  And the student has since moved to Wall Street, and thrown away the proof, since of course dissertations also need not include proofs.  But the student was very good, so I am sure it was correct.
4. Giving the proof to my competitors would be unfair to me. It took years to prove this theorem, and the same idea can be used to prove other theorems. I should be able to publish at least 5 more papers before sharing the proof. If I share it now my competitors can use the ideas in it without having to do any work, and perhaps without even giving me credit since they won't have to reveal their proof technique in their papers. 
5. The proof is valuable intellectual property. The ideas in this proof are so great that I might be able to commercialize them some day, so I'd be crazy to give them away. 
6. Including proofs would make math papers much longer. Journals wouldn't want to publish them and who would want to read them? 
7. Referees will never agree to check proofs. It would be too hard to check correctness of long proofs and finding referees would become impossible.  It's already hard enough to find good referees and get them to submit reviews in finite time.  Requiring them to certify the correctness of proofs would bring the whole mathematical publishing business crashing down. 
8. The proof uses sophisticated mathematical machinery that most readers/referees don't know. Their wetware cannot fully execute the proof, so what's the point in making it available to them? 
9. My proof invokes other theorems with unpublished (proprietary) proofs. So it won't help to publish my proof - readers still will not be able to fully verify correctness. 
10. Readers who have access to my proof will want user support. Anyone who can't figure out all the details will send email requesting that I help them understand it, and asking how to modify the proof to prove their own theorem. I don't have time or staff to provide such support.