Proteins and Wave Functions

Wednesday, January 11, 2023

Affordable open access chemistry publishing options in 2023

Here is an updated list of affordable (APC < $2000) impact neutral and other select OA publishing options for chemistry

Impact neutral journals
\$1195 PeerJ Chemistry journals. Open peer review. (Disclaimer I am Editor-in-chief for PeerJ Physical Chemistry). PeerJ also has a membership model, which may be cheaper than the APC.

\$1210 Results in Chemistry. Closed peer review

\$1350 F1000Research. Open peer review. Bio-related

\$1595 PeerJ - Life and Environment. Open peer review. Bio-related. PeerJ also has a membership model, which may be cheaper than the APC.

\$1680. Royal Society Open Science. Open peer review.

(The RSC manages "the journal’s chemistry section by commissioning articles and overseeing the peer-review process")

\$1685 ACS Omega. Closed peer review.

$1805 PLoS ONE. Closed peer review.

Free or reasonably priced journals that judge perceived impact

$0 Chemical Science Closed peer review

$0 CCS Chemistry Closed peer review

$0 Beilstein Journal of Organic Chemistry. Closed peer review.

$0 Beilstein Journal of Nanotechnology. Closed peer review.

$0 SciPost Chemistry Open peer review. (Disclaimer: I am an editor for SciPost)

$0 SciPost Chemistry Core Open peer review. (Disclaimer: I am an editor for SciPost)

$0 Digital Discovery Open peer review. (Disclaimer: I am on the advisory board)

\$0 ACS Central Science. Open peer review.

$100 Living Journal of Computational Molecular Science. Closed peer review

€100 Chemistry². Closed peer review.

£1000 RSC Advances. Closed peer review.

Let me know if I have missed anything.

This work is licensed under a Creative Commons Attribution 4.0

Tuesday, January 4, 2022

Open access chemistry publishing options in 2022

Here is an updated list of affordable impact neutral and other select OA publishing options for chemistry

Impact neutral journals
\$1000 Results in Chemistry. Closed peer review

\$1195 PeerJ Chemistry journals. Open peer review. (Disclaimer I am an editor for PeerJ Physical Chemistry). PeerJ also has a membership model, which may be cheaper than the APC.

\$1195 PeerJ - Life and Environment. Open peer review. Bio-related. PeerJ also has a membership model, which may be cheaper than the APC.

\$1350 F1000Research. Open peer review. Bio-related

$1350 PLoS ONE. Closed peer review.

$1680. Royal Society Open Science. Open peer review.

(The RSC manages "the journal’s chemistry section by commissioning articles and overseeing the peer-review process")

\$1685 ACS Omega. Closed peer review.

$1990 Scientific Reports. Closed peer review

Free or reasonably priced journals that judge perceived impact

$0 SciPost Chemistry Open peer review. (Disclaimer: I am an editor for SciPost)

$0 SciPost Chemistry Core Open peer review. (Disclaimer: I am an editor for SciPost)

$0 Digital Discovery Open peer review.

\$0 ACS Central Science. Open peer review. ($1000 for CC-BY)

$100 Living Journal of Computational Molecular Science. Closed peer review

€500 Chemistry². Closed peer review.

£750 RSC Advances. Closed peer review.

Let me know if I have missed anything.

This work is licensed under a Creative Commons Attribution 4.0

Monday, March 22, 2021

Can machine learning regression extrapolate?

I recently developed a ML model to predict pIC50 values for molecules and used it together with a genetic algorithm code to search for molecules with large pIC50 values. However, the GA searches never found molecules with pIC50 values that where larger than in my training set.

This brought up the general question of whether ML models are capable of outputting values that are larger than those found in the training set. I made a simple example to investigate this issue for different ML models.

$\mathbf{X}_1 = (1, 0, 0)$ corresponds to 1, $\mathbf{X}_2 = (0, 1, 0)$ corresponds to 2, and $\mathbf{X}_3 = (0, 0, 1)$ corresponds to 3.

The code can be found here. If you are new to ML check out this site

Linear Regression
This training set can be fit by a linear regression model $y = \mathbf{wX}$ with the weights $\mathbf{w} = (1, 2, 3)$. Clearly this simple ML can extrapolate in the sense that, for example, $\mathbf{X} = (1, 0, 1)$ will yield 4, which is larger than max value in the training set (3). Similarly, $\mathbf{X} = (0, 0, 2)$ will yield 6.

Neural Network
Next I tried a NN with one hidden layer with 2 nodes and the sigmoid activation function. For this model $\mathbf{X} = (1, 0, 1)$ yields 1.6 and $\mathbf{X} = (0, 0, 2)$ yields 3.2, which is only slightly larger than 3.

The output of the NN is given by $\mathbf{O}_h\mathbf{w}_{ho}+\mathbf{b}_{ho}$, where $\mathbf{O}_h$ is the output of the hidden layer. Using the sigmoid function, the maximum value of $\mathbf{O}_h = (1, 1)$, for which $\mathbf{O}_h\mathbf{w}_{ho}+\mathbf{b}_{ho}$ = 3.3. So this is the maximum value this NN can output. For comparison, $\mathbf{O}_h = (0.99, 0.65)$ for $\mathbf{X}_3 = (0, 0, 1)$.

If I instead use the ReLU activation function (which doesn't have an upper bound), $\mathbf{X} = (1, 0, 1)$ yields 2.2 and $\mathbf{X} = (0, 0, 2)$ yields 4.2, which is somewhat larger than 3.

So, NNs that exclusively use ReLU can in principle yield values that are larger than those found in the training set. But if one layer uses bounded activation functions such as sigmoid, then it depends on how close the outputs of that layer are to 1 when predicting the largest value in the training set.

Random Forest
The example is so simple that it can be fit with a single decision tree (RF outputs the mean prediction of a collection of such decision trees):

Clearly the RF can only output values in the range of the training set.

This work is licensed under a Creative Commons Attribution 4.0

Friday, January 8, 2021

Open access chemistry publishing options in 2021

Here is an updated list of affordable impact neutral and other select OA publishing options for chemistry

Impact neutral journals
\$1000 Results in Chemistry. Closed peer review

\$1250 ACS Omega. Closed peer review.

\$1350 F1000Research. Open peer review. Bio-related

$1680. Royal Society Open Science. Open peer review.

(The RSC manages "the journal’s chemistry section by commissioning articles and overseeing the peer-review process")

$1695 PLoS ONE. Closed peer review.

$1990 Scientific Reports. Closed peer review

Free or reasonably priced journals that judge perceived impact

$0 SciPost Chemistry Open peer review. (Disclaimer: I am an editor for SciPost)

$0 SciPost Chemistry Core Open peer review. (Disclaimer: I am an editor for SciPost)

$0 Digital Discovery Open peer review.

\$0 ACS Central Science. Open peer review. ($1000 for CC-BY)

$100 Living Journal of Computational Molecular Science. Closed peer review

€500 Chemistry². Closed peer review.

£750 RSC Advances. Closed peer review.

Let me know if I have missed anything.

This work is licensed under a Creative Commons Attribution 4.0

Tuesday, December 22, 2020

Generating publication quality figures of molecules using RDKit

With RDKit version 2020.09 the drawing code got a major overhaul and, IMO, it can now be used to make figures for publications. Based on Greg Landrum's show_mol function I made a simpler version for multiple molecules that'll also give you png and, especially, pdf files.

Blogger doesn't allow you to upload pdf files, so the picture above really doesn't do the corresponding pdf file justice.

This work is licensed under a Creative Commons Attribution 4.0

Friday, November 6, 2020

Mapping atoms to reactants in products made with reaction SMARTS

Reaction SMARTS is an RDKit feature that is very useful for generating reactant products pairs for a given reaction. Unfortunately the algorithm changes the atom order between reactants and products, which creates problems one tries to locate the reaction paths using an interpolation-based algorithm such as nudged elastic band (NEB).

Fortunately, RDKit keeps track of the the change in atom order (thanks to my MS student Julius Seumer for the tip!) and it's easy to reorder the atoms:

def reorder_product(product):
  reorder_inverse = [int(atom.GetProp('react_atom_idx')) for atom in product.GetAtoms()]
  reorder = len(reorder_inverse)*[0]

  for i in range(len(reorder_inverse)):
    reorder[reorder_inverse[i]] = i

  product = Chem.RenumberAtoms(product, reorder)

  return product

If the 3D structures are to be used for interpolation it is important to embed the reactant structure before converting it to product. This keeps the "label-chirality" of groups such as CH2 the same in reactant and products.

Here is a demo notebook

This work is licensed under a Creative Commons Attribution 4.0

Saturday, October 31, 2020

atom_mapper: matching all atoms in reactants and products

It's almost 3 years ago that I wrote about atom mapper and now the code has received a major overhaul thanks to my PhD student Mads Koerstz. The 2D atom mapping part is basically left untouched, but the main new thing is a general implementation of the 3D mapping problem.

The 3D mapping problem is that if labels (i.e. atom orders) are considered then all tetrahedral centers are chiral and the chirality of centers with equivalent atoms, such as CH2 groups, generated by RDKit's embed function will be arbitrary and unlikely to match in reactants and products. This creates problems for methods such as nudged elastic band (NEB) that try to determine the reaction path by interpolation.

Mads found a clever approach using chiral atom tags, where he generates arbitrary tags for the reactant and makes sure the tags match in the product. If there are true chiral centers he also generates all enantiomers.

The old version had some code that tried to align the coordinates, but that has been removed since that can be done much better with xTBs reaction path method.

Here is a demo notebook

This work is licensed under a Creative Commons Attribution 4.0