Proteins and Wave Functions: Using Molecular Overlap to Select Approximations in the Fragment Molecular Orbital Method

Thursday, January 17, 2013

Using Molecular Overlap to Select Approximations in the Fragment Molecular Orbital Method

A former student (who did the initial work) and I investigated here whether using overlaps between fragments to select the appropriate form of approximation in the Fragment Molecular Orbital method was any good. While not a complete manuscript in any form yet, the work-in-progress will be updated here to reflect new tendencies. In the end, it should be submitted to arXiv and eventually a peer-reviewed publication.

Here is the outline of what would be the paper - sprinkled with some text here and there along with a long list of TODO's.

1 Introduction

Until now, every fragment based method which aims to be linearly scaling needs approximations for largely separated n-mers. These approximations are invoked based on distances which are empirically determined to give results which are good enough but still ensure that the method scales linearly. This rather blunt instrument leaves much to be desired

2 Background and Theory

2.1 Relevant FMO equations

In the FMO method, the total energy of a system of $N$ fragments (monomers) is given as

$$E = \sum_I E_I + \sum_{I>J}^{close} \Delta E_{IJ} + \sum_{I>J}^{separated} \Delta \tilde{E}_{IJ} + \sum_{I>J>K}^{close} \Delta E_{IJK}.$$

Here, $E_I$ is the energy of monomer $I$ evaluated in the ESP of all other fragments. Pairs $IJ$ (dimers) are evaluated based on the shortest interfragment monomer distance $R_{I,J}$ (see (Steinmann2010effective) for definition). Dimers in close proximity ($R_{I,J} < R_\mathrm{resdim} $ where $R_\mathrm{resdim}$ is a user-defined distance) are evaluated self-consistently in the fixed ESP of the monomers

$$\Delta E_{IJ} = E_{IJ} - E_I - E_J.$$

Here, $E_{IJ}$ is the energy of dimer $IJ$ made from monomers $I$ and $J$. Separated dimers ($R_{I,J} > R_\mathrm{resdim} $) are evaluated using a Coulomb expression

$$\Delta \tilde{E}_{IJ} = \mathrm{Tr}(\mathcal{D}^Iu^J) + \mathrm{Tr}(\mathcal{D}^Ju^I) + \sum\sum \mathcal{D}^I\mathcal{D}^J(\mu\nu|\lambda\sigma) + \Delta E^\mathrm{NR}_{IJ},$$

where $\mathcal{D}^I$ is the density of monomer $I$, $u^J$ is the potential from the nuclei of fragment $J$ and $\Delta E^\mathrm{NR}_{IJ}$ is the nuclear repulsion interaction between nuclei in $I$ and $J$. The final term in Eq.1 is the trimer correction for trimer $IJK$. It is evaluated self-consistently only for close trimers (see [fedorov2004importance] for definition) in the fixed ESP of all monomers and given as

$$\Delta E_{IJK} = E_{IJK} - E_I - E_J - E_K - \Delta E_{IJ} - \Delta E_{IK} - \Delta E_{JK}.$$

Here, $E_{IJK}$ is the energy of trimer $IJK$. There is no corresponding approximate expression for trimer energies as in Eq. 3.

The default values for the distance based cutoffs for dimers and trimers are empirically determined to provide a reasonable mix of dimers and trimers with fortuitous error cancellation. We do note that the original formulation for the FMO3 method, no approximated dimers enter the equation due to the way the thresholds are set up. However as the number of trimers increases rapidly with system size it becomes unfeasible for large systems to carry out an FMO3 single point energy calculation. We propose an additional cutoff besides the distance cutoff based on the maximum overlap between dimers within a distance of $R_\mathrm{resdim}$ of each other. This helps to remove those $n$-mers that contribute little to the total energy with a noticeable speedup in computational effort.

2.2 Approximate Overlap Matrix

In this new approach, we want to calculate the maximum in the overlap matrix

$$\max_{i\in I,j\in J}\left\{S_{ij}\right\},$$

between two monomers $I$ and $J$ which make up dimer $IJ$ and use that overlap to determine whether we can ignore an SCF calculation on that dimer. Here $i$ and $j$ refer to MOs. Traditionally this requires the following transformation from atomic orbitals (AO)

$$S_{ij} = \langle \psi_i | \psi_j \rangle = \sum_i^\mathrm{occ} \sum_j^\mathrm{occ} C_{IJ,i\mu} C_{IJ,j\nu} \langle \chi_\mu | \chi_\nu \rangle = \sum_i^\mathrm{occ} \sum_j^\mathrm{occ} C_{IJ,i\mu} C_{IJ,j\nu} S_{\mu\nu},$$

where the matrices $\mathbf{C}_{IJ}$ are MO coefficient matrices for dimer $IJ$, $\chi_\mu$ is an atomic orbital and $S_{\mu\nu}$ is the overlap matrix in AO basis. However, this requires the full SCF calculation to already have been carried out on dimer $IJ$. Instead, we choose to approximate the dimer MOs by combining monomer MOs

$$C_{IJ} \approx C_I \oplus C_J.$$

Using the above, we can now calculate the maximum overlap (Eq.~\ref{eqn:maxoverlap}) through the transformation (Eq.5) \emph{before} we evaluate the SCF of the dimer. This enables us to do an approximate dimer (Eq.3) if the overlap is not large enough.

2.3 Considerations for Dimers and Trimers

2.4 Implementation and Methodology

The overlap based rejection of dimers and trimers was implemented in a locally modified version of GAMESS using the distributed data interface [25]. For all calculations SCF convergence was increased (\$CONV=1E-7 in \$SCF) to match that of FMO and integral accuracy increased (ICUT=12 in \$CONTRL).

For all FMO calculations we used the default values for cutoffs based on distances (RESDIM=2.0 in \$FMO for dimers, RITRIM(1)=1.24, -1.0, 2.0, 2.0 in \$FMO for trimers). To obtain approximate dimer MO coefficients we used MODORB=3 in $FMOPRP. All other setting were default value unless specified directly.

3 Results

TODO: Compile the results and write about them. Benchmarking water clusters from previous FMO applications should be OK.

TODO: There is a problem with the test-systems of the ALA-helices and sheets, perhaps it is because they are too simple and repetitive to actually mean anything