## Thursday, July 21, 2016

### Finding disordered residues in an NMR ensemble

Note to self: here's how you identified disordered residues in the NMR ensemble 2KCU.pdb

1. In Pymol: "fetch 2kcu"

2. Action > align > states (*/CA)
2016.08.07 update: the above command also includes the tails in the fit. Use "intra_fit (2kzn///6-158/CA)" instead.

3. "save 2kcu_aligned.pdb, state=0"

4. In terminal: grep CA 2kcu_aligned.pdb > lis

5. python disorder.py

disorder.py (given below) calculates the standard deviation of the x, y, and z coordinates of each CA atom across the ensemble $(\sigma_{x,i}, \sigma_{y,i}, \sigma_{z,i})$. It then averages these three standard deviations for each CA atom $(\sigma_i)$. To find outliers, it averages the $\sigma_i$ values over the entire protein $(\langle \sigma_i \rangle)$ and computes their standard deviation $(\sigma_{\langle \sigma_i \rangle})$. Any residue for which $\sigma_i > \langle \sigma_i \rangle + \sigma_{\langle \sigma_i \rangle}$ is identified as disordered.
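For reference, the criterion just described can be sketched in a few lines of plain Python. This is not the original disorder.py. It is a minimal sketch that assumes fixed-column PDB `ATOM` records, a single chain with one CA per residue per model, and the population standard deviation (`statistics.pstdev`); the function names are made up for illustration.

```python
import statistics
from collections import defaultdict


def ca_coordinates(pdb_lines):
    """Collect the CA coordinates of every residue across all models.

    Assumes standard fixed-column PDB ATOM records and a single chain,
    so the residue number alone identifies a residue.
    """
    coords = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resnum = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            coords[resnum].append(xyz)
    return coords


def disordered_residues(coords):
    """Return (sigma_i per residue, cutoff, residues with sigma_i > cutoff)."""
    sigma = {}
    for resnum, points in coords.items():
        # Standard deviation of x, y and z separately, then their average.
        per_axis = [statistics.pstdev(axis) for axis in zip(*points)]
        sigma[resnum] = statistics.mean(per_axis)
    values = list(sigma.values())
    # Cutoff: mean of sigma_i plus one standard deviation of the sigma_i values.
    cutoff = statistics.mean(values) + statistics.pstdev(values)
    flagged = sorted(r for r, s in sigma.items() if s > cutoff)
    return sigma, cutoff, flagged
```

With the file from step 4, something like `disordered_residues(ca_coordinates(open("lis")))[2]` would list the flagged residue numbers. Whether the sample or population standard deviation is the better choice is a judgment call; swapping `pstdev` for `stdev` changes the cutoff slightly.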

Here I've colored the disordered residues red (I haven't updated the picture to reflect the change to Step 2 yet).

Yes, I know: "the 1970's called and want their Fortran code back". How very droll.

## Tuesday, July 12, 2016

### Reproducing stats or verbose output from LINEST command in Excel or Google Sheet in Python

The Python code above reproduces the output produced by the LINEST(y;x;true;true) command in Excel [LINEST(y,x,true,true) in Google Sheets] with a CSV file as input. In the CSV file I have assumed that the x and y columns are labelled "x" and "y", respectively. This page has a good explanation of the output (pdf).
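For a single x column, the statistics that LINEST reports can also be sketched with the standard library alone. This is a minimal sketch, not the script referred to in this post: the function names `read_xy` and `linest_stats` and the dictionary keys are assumptions made for illustration, and it covers only a fit with an intercept (third argument true) and at least three data points.

```python
import csv
import math


def read_xy(handle):
    """Read the columns labelled "x" and "y" from an open CSV file."""
    xs, ys = [], []
    for row in csv.DictReader(handle):
        xs.append(float(row["x"]))
        ys.append(float(row["y"]))
    return xs, ys


def linest_stats(xs, ys):
    """Least-squares line y = slope*x + intercept with LINEST-style statistics."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_resid = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    ss_reg = ss_total - ss_resid
    df = n - 2                       # degrees of freedom for one x column
    se_y = math.sqrt(ss_resid / df)  # standard error of the y estimate
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_y / math.sqrt(sxx),
        "se_intercept": se_y * math.sqrt(1.0 / n + x_mean ** 2 / sxx),
        "r2": ss_reg / ss_total,
        "se_y": se_y,
        "F": ss_reg / (ss_resid / df),  # undefined for a perfect fit
        "df": df,
        "ss_reg": ss_reg,
        "ss_resid": ss_resid,
    }
```

The keys are listed in the order LINEST lays out its 5 x 2 block, row by row: slope and intercept, their standard errors, R-squared and the standard error of y, F and the degrees of freedom, and the regression and residual sums of squares. A call such as `linest_stats(*read_xy(open("data.csv")))` would process a file, where `data.csv` is a placeholder name.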

## Monday, July 11, 2016

### 2nd Reviews for Prediction of pKa values using the PM6 semiempirical method

2016.07.15: Update: our rebuttal can be found here. Manuscript resubmitted.

The 2nd round of reviews on our latest PeerJ submission came in on July 7th. The first round of reviews and a link to our response can be found here.

MINOR REVISIONS
Thank you for your efforts at addressing the reviewers' comments. In spite of that, the reviewers (and myself) still think that additional data should be moved from the Supporting Information to the main text as tables/graphs. Specifically:

- per reviewer 1's request, please include the ref. pKa data in table 1. The Supporting Material deposited in figshare is very complete, but it will enormously help the reader (and make your paper much more persuasive at first reading) if the most salient pieces were included in the paper itself.

- contra reviewer 1's comment, I do acknowledge that p.799 of the quoted Stewart (2008) reference describes the pKa computation procedure which generated the data present in http://openmopac.net/pKa_table.html. This method (also used by Rayne) as well as the method by Juranic, however, do not compute pKa from the energy difference itself, but from an empirical fit of the O-H bond distances and approximate charges (or N and H charges, plus a dummy variable stating whether the amine is primary, secondary or tertiary, for Juranic, 2014). These pKa computation approaches are therefore fundamentally different from the one used in your paper. Your references to this literature in the introduction, however, do not make this clear enough. Please improve this to clearly compare the competing methods for PM6-based pKa computations to your approach.

- Do include the statistical data regarding slope, R-squared and outliers. A motivated reader may easily graph the data you have computed (and which are present in the spreadsheet referred to in your figshare area), but your explanation and discussion would be much more readable, and certainly more persuasive, if you included those graphs, slopes and correlation coefficients in the paper. That analysis shows more clearly than the aggregate tables exactly where PM6 affords better correlation/slope than even CBS-4B3 (pyridines), the identity of the outliers, how poorly all methods (even CBS-4B3) correlate to experimental pKa in amines (in spite of a seemingly low 0.2 MAD for CBS-4B3), etc.

- per reviewer 2's request (and also related to my previous request which I may not have worded clearly) please add data regarding the likely origin of the errors in the outliers: do they come from gas phase energies or the solvation? A simple comparison of the B3LYP gas-phase energy changes (on PM6-optimized geometries, to reduce computational effort) and solvation effects might be enough to tell whether the gas-phase acidities (and/or solvation) of PM6 generally track the DFT results.

Reviewer 1 (Anonymous)

The authors have not adequately responded to any of the concerns raised in my original review. My original comments are shown first. The authors’ responses are shown next; and my further responses to them are shown below.

(1) Basic reporting
This is an interesting manuscript, but a frustrating aspect is that the experimental pKa values used for comparison are not included for most compounds. These could easily be added to Table 1. In fact, the best solution would be to modify Table 2 to give the calculated pKa values from the various methods along with the experimental values.

Authors: The values are already provided in Supplementary Materials

The copy I received contained no reference at all to “Supplementary Materials”. If these materials are available, directions for accessing them should be clearly presented in the normal position just before the References.

(2) Also, the authors should refer somewhere to the very relevant PM6 pKa calculations by Jimmy Stewart given in http://openmopac.net/pKa_table.html.

Authors: We already refer to this approach in the introduction (Stewart 2008).

The reference Stewart (2008) concerns proteins and has nothing at all to do with pKa estimates. As clearly indicated, the relevant Stewart study is not a formal publication, but has been made widely available to workers by Stewart on the web page as indicated. Apparently the authors didn’t even bother to look at it.

(3) Validity of the findings
It would be very helpful if the authors would provide a figure comparing the calculated and experimental values, and include in the text the relevant equation with proper statistics (n, r2, s, F) along with the uncertainties for the slope & intercept. (See, e.g., the book by Shields & Seybold on this topic, or their WIRES article.)

Authors: The statistical analysis the reviewer refers to is done in the context of a QSAR prediction of pKa from QM data, i.e. to gauge the accuracy of a linear fit to be used in the prediction of unknown pKa values. The statistics used in this paper are just aimed at gauging the accuracy of the predicted values and, in our opinion, are more than adequate for the task. If the reviewer can explain how the requested statistics are to be used in the context of the current paper we will be happy to reconsider the request.

This is a standard way to compare not just QSAR results, but any studies in this field. It would be helpful, and I don’t understand the authors’ reluctance to include it.

Annotated manuscript
The reviewer has also provided an annotated manuscript as part of their review:

Reviewer 2 (Anonymous)