Motivation

Perhaps the best-known and most widely used algorithm for calculating free energies on the fly from molecular simulation is metadynamics (1) and its flavors (2). It really was a game changer, as evidenced by the 2500+ citations the original paper has amassed since its publication in 2002. It also helped that the method was elegant in its simplicity and that a public implementation was made available through PLUMED, which greatly improved its adoption; free energy calculations have since become an indispensable tool for the molecular modeler. Lurking in the shadows, however, was another algorithm which, though proposed earlier than metadynamics (3) and in my opinion superior in nearly every way, is not nearly as popular. This of course is the adaptive biasing force method, or ABF for short.

Now there are a number of contributing factors as to why I believe ABF never really caught on. But before I proceed, I want to make clear my expectations of the reader: I assume in this blog post that you are generally familiar with advanced free energy sampling and its concepts/jargon and with basic statistical mechanics, and that you are at least moderately comfortable with multivariable calculus and linear algebra. If you’re not, feel free to read on and pick up what you can, or wait until I write a future post introducing free energy sampling methods (which I plan on doing). Anyways, where were we? Here are some of the reasons I think kept ABF from reaching rock-star status: First, ABF is not as simple to implement as metadynamics. Second, I feel that the authors shot themselves in the foot by not writing the paper in a simpler, “implementation-oriented” fashion. Third, ABF requires second derivatives of the collective variables with respect to the atomic coordinates, which gets nasty very quickly (more on this later). Finally, it just wasn’t available in a nice portable package like PLUMED.

This post will focus on a later development of the original ABF method (4). That paper solves problems 2 and 3, and various implementations of ABF are now available in many packages. As lead developer of SSAGES, an advanced sampling suite, and co-author of its ABF module, I want to step through various aspects of Darve’s 2008 paper (4) and discuss certain implementation choices. I’ll wrap up by using the method to resolve the free energy surface of a simple system. Hopefully I will make a strong case for why ABF is currently my go-to method for estimating free energies, with some minor exceptions. Let’s begin!

Preliminaries

Advanced sampling methods such as adaptive umbrella sampling, metadynamics and Wang-Landau sampling can be categorized as adaptive biasing potential methods. That is, there is an underlying biasing potential V(\mathbf{\xi}) which acts on a set of collective variables (CVs), \mathbf{\xi}. This bias is propagated to the atomic coordinates via the chain rule (summation over the repeated index j implied),

    \[ \frac{\partial V}{\partial x_i} = \frac{\partial V}{\partial \xi_j} \frac{\partial \xi_j}{\partial x_i}. \]
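As a concrete illustration, here is a minimal NumPy sketch (all names and values are made up) of propagating a bias on a single CV down to per-atom forces via this chain rule:

```python
import numpy as np

# Illustrative setup: N atoms, a single collective variable xi.
N = 4
rng = np.random.default_rng(0)

dV_dxi = -1.7                           # dV/dxi at the current value of xi (made up)
grad_xi = rng.standard_normal((N, 3))   # d(xi)/dx_i for each atom, from the CV definition

# Chain rule: the biasing force on atom i is -(dV/dxi) * d(xi)/dx_i.
bias_forces = -dV_dxi * grad_xi         # shape (N, 3), added to the atomic forces
```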

All of these methods adjust the shape of the bias V(\mathbf{\xi}) over the course of the simulation in order to achieve uniform sampling along \mathbf{\xi}. You can watch the video below, which illustrates this principle.

ABF, on the other hand, is an adaptive biasing force (duh) method. What this means is that instead of adapting a potential to the underlying free energy surface, it seeks to estimate the mean force \langle F_\xi | \xi^* \rangle at specific points \xi^* along the order parameter. Practically speaking, the order parameter is divided into discrete bins, and a running estimate of the average force in each bin is refined over time; this estimate converges to the negative of the gradient of the free energy A(\xi),

    \[ \langle F_\xi | \xi^* \rangle = -\frac{dA(\xi^*)}{d\xi}. \]

The consequence of converging to the mean force (and applying its negative as the bias) is that the energetic barriers along the collective variable are canceled; the system should then be able to diffuse freely along \xi.

So why bother at all with ABF? Isn’t metadynamics good enough? Well, in adaptive biasing potential methods, the quantity being estimated is the free energy, A(\xi)=-k_BT\log{P(\xi)}, which is essentially the underlying probability distribution. This is an inherently global measure. A probability at a given point \xi^* is only meaningful if it is known relative to another point along the collective variable. In other words, the free energy is arbitrary up to an additive constant, hence it only makes sense to discuss it in relative terms. Consequently, repeated sampling of the entire interval (or at least of the region over which relative free energies are desired) is required to converge the free energy surface. Additionally, if for some reason there are sampling issues at any point along the collective variable, they may affect the rest of the surface; this type of problem is especially prominent at the edges of bounded intervals.

Because ABF estimates a force, or gradient, it deals with an inherently local property. The derivative at a single point can be estimated without visiting other parts of the collective variable. This leads to faster convergence and makes the method less prone to boundary problems. Furthermore, since the estimation occurs on a discrete basis over a series of bins, one does not have to worry about the many parameters that go into metadynamics-type simulations, such as Gaussian widths, heights, deposition rates, etc.

Derivation

So far we’ve just been discussing things at a high level. At the end of the day, what exactly is ABF? What mathematical expression(s) do we plan on representing? We begin with the concept of thermodynamic integration. Given two states, \xi_0 and \xi_1, the difference in free energy between them can be expressed as,

    \[ A(\xi_1) - A(\xi_0) = \int_{\xi_0}^{\xi_1} \frac{\mathrm{d}A}{\mathrm{d}\xi} \, \mathrm{d}\xi \]

where

    \[ \frac{\mathrm{d}A}{\mathrm{d}\xi} = \left\langle \frac{\partial \mathcal{H}}{\partial \xi} \right\rangle_\xi. \]

This is a classical result. Here \mathcal{H} is the Hamiltonian of the system (and U, below, is the potential energy), and the derivative is evaluated in a constrained ensemble (\xi = \xi^*). The limitation in using this expression is that the presence of multiple reactive pathways along the collective variable may give rise to quasi-nonergodicity. Another statistical mechanical result is the expression for the mean force in an unconstrained ensemble,

    \[ \frac{\mathrm{d}A}{\mathrm{d}\xi} = \left\langle \frac{\partial U}{\partial \xi}  - k_BT \frac{\partial \ln{|J|}}{\partial \xi}\right\rangle. \]

Here we can think of the first term on the RHS as the mechanical force acting along \xi, and the second term as the entropic contribution. The Jacobian J appears in this expression because of the transformation from a set of generalized coordinates to the desired collective variable. This next bit is really where the magic happens; as a reminder, you can look back to the references below, in particular (3, 4), for more details. First, let’s take \mathbf{w} to be an arbitrary vector field – literally any vector field we like. Subject to the condition that \mathbf{w} \cdot \nabla \xi \neq 0, it can be shown that

    \[ \frac{\mathrm{d}A}{\mathrm{d}\xi} = \left\langle \nabla U \cdot \frac{\mathbf{w}}{\mathbf{w}\cdot \nabla \xi} - k_BT \nabla \cdot \frac{\mathbf{w}}{\mathbf{w}\cdot \nabla \xi} \right\rangle. \]

Let’s take this another step. If we make a choice of \mathbf{w} such that \mathbf{w}\cdot \nabla \xi = 1, then the above expression simplifies to

    \[ \frac{\mathrm{d}A}{\mathrm{d}\xi} = \left\langle \nabla U \cdot \mathbf{w} - k_BT \nabla \cdot \mathbf{w} \right\rangle. \]

At this point I’ve just thrown a bunch of equations at you, so let me break things down a bit. First, \nabla \xi is just the gradient of the collective variable with respect to our atomic coordinates, \partial \xi / \partial x_i. Having this gradient is a standard requirement for the definition of any collective variable we wish to sample along in molecular dynamics. \nabla U is just the negative of the atomic force vector. It may seem that we now have a perfectly usable expression for an algorithm, but there’s somewhat of an issue. To satisfy the condition \mathbf{w}\cdot \nabla \xi = 1 we can take w_i = \partial x_i / \partial \xi. However, in the equation for the (negative of the) mean force, we have a divergence term which necessitates second-order spatial derivatives – yuck!
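As a tiny numerical illustration of the normalization condition, one simple choice (the single-CV analog of the pseudoinverse discussed later in this post) is \mathbf{w} = \nabla\xi / |\nabla\xi|^2; the gradient below is made up:

```python
import numpy as np

rng = np.random.default_rng(0)
grad_xi = rng.standard_normal(12)            # made-up CV gradient over 3N = 12 DOFs

w = grad_xi / np.dot(grad_xi, grad_xi)       # one choice satisfying w . grad_xi = 1
print(np.isclose(np.dot(w, grad_xi), 1.0))   # True
```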

The main development of Darve (4) was to derive an equivalent expression that is first order in space but introduces an additional first-order derivative in time,

    \[ \frac{\mathrm{d}A}{\mathrm{d}\xi} = -\left\langle \frac{d}{dt}(\mathbf{w} \cdot \mathbf{p}) \middle| \xi \right\rangle. \]

Here \mathbf{p} is the vector of atomic momenta. Cool, huh? That’s really it. No matter what our choice of instantaneous vector field \mathbf{w} is, subject to the condition given, we will eventually converge to the mean force. One final point: ABF is one of the few methods for which (exponential) convergence has actually been proven. Though in practice many adaptive biasing methods were in wide use well before proofs of their convergence were published (and they worked just fine), it’s still an added bonus. Armed with this final equation, let’s proceed to implementing the ABF algorithm in practice; hopefully that will clear up any lingering questions.

Implementation

The first step in implementing ABF is to define a grid over the set of collective variable(s) which will be sampled. The number of bins should be chosen to accommodate the width of the finest features you are interested in resolving. Two quantities are stored on the grid: the total accumulated force, and the number of hits at each grid point. The ratio of these two quantities converges to the mean force of interest (a minimal code sketch of this bookkeeping appears after the algorithm below). Within a molecular dynamics simulation, we can define an iterative procedure for the algorithm.

  1. The interatomic forces -\nabla U are computed by the MD engine at timestep t_i.
  2. The collective variable(s) and their gradients \nabla \xi are calculated from the atomic coordinates at timestep t_i.
  3. Compute the arbitrary vector field \mathbf{w} from \nabla \xi. This is where we have to take things a bit slowly. Recall the condition \mathbf{w}\cdot \nabla \xi = 1 which needs to be satisfied? Well, there are a number of ways of finding a vector (or matrix) that fulfills it. Let’s go over some of those.

First let’s establish some notation. \mathbf{J} will represent the Jacobian matrix of the collective variable(s), containing the gradient of each respective CV along its rows. Thus the entries of \mathbf{J} are J_{ij} = \partial \xi_i / \partial x_j, where the x_j are the 3N degrees of freedom, typically the x, y, z coordinates of the atoms. Our arbitrary vector field matrix \mathbf{W} will then have N_\xi columns and 3N rows such that \mathbf{J}\mathbf{W} = \mathbf{I}. The first thing that came to my mind when I saw that condition was the Moore-Penrose pseudoinverse.

Linear algebra interlude

Suppose we have a matrix \mathbf{J} which is square (not necessarily the case above) and we are interested in finding a matrix \mathbf{W} such that \mathbf{J}\mathbf{W} = \mathbf{I}. The clear answer would be to compute the inverse of \mathbf{J}: \mathbf{W} = \mathbf{J}^{-1}. The Moore-Penrose pseudoinverse generalizes the notion of an inverse to non-square matrices. In our case we are looking for a right inverse. We first form the symmetric matrix \mathbf{A} = \mathbf{J}\mathbf{J}^{T} (all entries are real) and take \mathbf{W} = \mathbf{J}^{T}\mathbf{A}^{-1}. In other words, \mathbf{W} = \mathbf{J}^T\left(\mathbf{J}\mathbf{J}^T\right)^{-1}.
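As a quick sanity check, here is a short NumPy sketch (the dimensions and the Jacobian itself are made up) that computes this right pseudoinverse and verifies \mathbf{J}\mathbf{W} = \mathbf{I}:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cv, n_dof = 2, 12                      # e.g. two CVs over N = 4 atoms (3N = 12)

J = rng.standard_normal((n_cv, n_dof))   # stand-in Jacobian; rows are CV gradients

# Right pseudoinverse: W = J^T (J J^T)^{-1}, valid when J has full row rank.
W = J.T @ np.linalg.inv(J @ J.T)

print(np.allclose(J @ W, np.eye(n_cv)))  # True: J W = I, as required
```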

Darve (4) suggests the choice \mathbf{W}^T = \mathbf{M}_\xi \mathbf{J} \mathbf{M}^{-1}, where \mathbf{M}_{\xi}^{-1}=\mathbf{J} \mathbf{M}^{-1}\mathbf{J}^T. Here \mathbf{M} is the 3N \times 3N diagonal mass matrix, containing the atomic masses. This can be thought of as a “mass-weighted” pseudoinverse. Making this choice actually results in another ABF equation which can also be used, but I prefer to compute \mathbf{W} in the manner described above. Notice as well that if the mass matrix were the identity, this choice would reduce exactly to the pseudoinverse. The reason for preferring the pseudoinverse (SSAGES implements both) is that when a simulation contains virtual sites, as is the case in many water models, the mass matrix contains zeros along the diagonal, rendering it singular. In either case, the user is free to choose whichever they please.
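Here is the analogous sketch for the mass-weighted choice (again, the Jacobian and masses are made up; note that the construction needs \mathbf{M}^{-1}, which is exactly why zero-mass virtual sites cause trouble):

```python
import numpy as np

rng = np.random.default_rng(2)
n_cv, n_atoms = 2, 4
J = rng.standard_normal((n_cv, 3 * n_atoms))         # stand-in Jacobian

masses = rng.uniform(1.0, 16.0, n_atoms).repeat(3)   # one mass per atom, repeated for x, y, z
M_inv = np.diag(1.0 / masses)                        # inverse of the diagonal mass matrix

# Mass-weighted choice: W = M^{-1} J^T M_xi with M_xi = (J M^{-1} J^T)^{-1}.
M_xi = np.linalg.inv(J @ M_inv @ J.T)
W = M_inv @ J.T @ M_xi

print(np.allclose(J @ W, np.eye(n_cv)))              # True: J W = I still holds
```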

Will that choice make a difference? I previously mentioned that no matter the choice of \mathbf{W}, we will always converge to the mean force. This is true, but our choice of \mathbf{W} affects the rate at which we converge. Choosing it wisely means we converge very rapidly, while a poor choice results in slow convergence. In the numerical experiments I’ve run, I did not notice a significant difference in performance between the mass-weighted and pure pseudoinverse. One last note before returning to the program: \mathbf{W} can also be computed using the expression \mathbf{W} = \mathbf{J}^{T}/||\mathbf{J}||^2, but only if the collective variables are orthogonal; otherwise, \mathbf{J} \mathbf{W} \neq \mathbf{I}. However, it’s usually not too much trouble to compute \mathbf{W} in the manner I’ve previously described, so we just stick with that. Now, back to the program.

  4. Once \mathbf{W} is computed, we calculate \mathbf{W}^{T} \mathbf{p}, where \mathbf{p} is the 3N vector of atomic momenta.
  5. The time derivative is calculated using finite differences. This is perfectly fine since the MD engine is taking finite steps in time anyway. So long as the finite difference scheme is of the same or greater order of accuracy than the integrator, things should be fine. In SSAGES we choose a second-order accurate scheme, so d/dt(\mathbf{W} \cdot \mathbf{p}) \approx \left(1.5[\mathbf{W}\cdot\mathbf{p}]_t - 2[\mathbf{W}\cdot\mathbf{p}]_{t-1} + 0.5 [\mathbf{W}\cdot\mathbf{p}]_{t-2}\right)/\delta t + \mathcal{O}(\delta t^2). This means we need to store the two previous dot products and start biasing only after the third iteration, which is fine (see the sketch after this list).
  6. Subtract the bias force we previously applied at the current point \xi^* from the time derivative d/dt(\mathbf{W} \cdot \mathbf{p}) we just computed. This is important: as the simulation proceeds, the measured dynamics include the external force we have been applying to the system, and we must remove that contribution to recover an unbiased sample.
  7. Sum the resulting sample into our grid at \mathbf{F}(\xi^*), and increment the number of hits at that grid point.
  8. Update the stored dot products used for the finite difference.
  9. Compute the external bias that we are going to sum into the atomic forces as -\mathbf{F}(\xi^*) \nabla \xi / N(\xi^*). Here arises another practical implementation issue: early on, this force estimate will fluctuate wildly and can easily cause the simulation to blow up. So what is suggested (and what we do) is to require a minimum number of hits in each bin, such that the force is turned on along a linear ramp. To do this, simply divide by \max(N_{\mathrm{min}},N(\xi^*)) instead of just N(\xi^*) (again, see the sketch below).
  10. Add the bias to the atomic forces and let the MD engine integrate forward.
  11. Return to step 1.
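Putting the loop together, below is a minimal, self-contained sketch of the per-step bookkeeping for a single CV. This is not the SSAGES implementation; the names, the bin mapping, and the sign conventions are illustrative choices consistent with the discussion above:

```python
import numpy as np

# Grid bookkeeping: accumulated force and hit counts along a single CV.
n_bins, xi_min, xi_max = 100, 0.0, 1.0
F_sum = np.zeros(n_bins)               # running sum of force samples per bin
N_hits = np.zeros(n_bins, dtype=int)   # number of samples per bin
N_min = 200                            # ramp parameter: full bias only after N_min hits

wdotp_hist = [0.0, 0.0]                # [W.p] at t-1 and t-2 for the finite difference
bias_prev = 0.0                        # bias (in CV space) applied at the previous step

def abf_step(xi, wdotp, dt):
    """One ABF update for a single CV; returns the bias to apply along xi."""
    global bias_prev
    b = min(int((xi - xi_min) / (xi_max - xi_min) * n_bins), n_bins - 1)

    # Step 5: second-order backward difference for d/dt(W.p).
    dwdotp_dt = (1.5 * wdotp - 2.0 * wdotp_hist[0] + 0.5 * wdotp_hist[1]) / dt

    # Steps 6-7: remove the previously applied bias and accumulate the sample.
    # (In practice, skip accumulation for the first two steps while the history fills.)
    F_sum[b] += dwdotp_dt - bias_prev
    N_hits[b] += 1

    # Step 8: shift the finite difference history.
    wdotp_hist[1] = wdotp_hist[0]
    wdotp_hist[0] = wdotp

    # Step 9: bias in CV space, ramped on linearly via max(N_min, N).
    bias_prev = -F_sum[b] / max(N_min, N_hits[b])

    # Step 10 (done by the caller): add bias_prev * grad_xi to the atomic forces.
    return bias_prev
```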

And there you have it: a breakdown of ABF’s mechanics. I do recommend you take a look at the source file of our implementation if you’re a bit more interested, because we implement a scalable strategy for multiple walkers that I did not discuss in this post. That wraps things up with respect to the ABF algorithm itself. What still remains, however, is the post-processing.

Analysis

Once a simulation is complete (or you simply run out of CPU time on your cluster), we need to extract the free energy corresponding to the mean force we just estimated. For a one-dimensional surface, this is as simple as integrating the force along the collective variable, e.g. by calling “cumtrapz(x,F)” in MATLAB or scipy.integrate.cumulative_trapezoid in Python. However, for multi-dimensional surfaces, things get a bit more complicated. It turns out that, due to statistical fluctuations in the estimated force field, the path chosen to integrate along affects the resulting free energy surface. In other words, the estimated force field is no longer conservative! We all know that, as a state function, the free energy difference between two points should be independent of the path chosen. Dealing with this issue is the subject of another post entirely – or you can just go read the discussion in Darve (4) if you’d like.
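For the 1D case, here is a minimal sketch in Python (the force profile below is synthetic stand-in data, not real ABF output):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Synthetic stand-in for converged ABF output: bin centers and mean forces.
xi = np.linspace(0.0, 1.0, 200)
A_true = np.sin(4 * np.pi * xi)      # a made-up free energy profile
F = -np.gradient(A_true, xi)         # the mean force converges to -dA/dxi

# Integrate the mean force and negate to recover the free energy.
A = -cumulative_trapezoid(F, xi, initial=0.0)
A -= A.min()                         # free energy is defined up to an additive constant
```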

Conclusion

I’d like to leave you with a video I made using ABF to estimate the potential of mean force for an NaCl ion pair in water. It gives you a feel for how quickly the method converges, and it is nice to watch. I hope I’ve been able to demystify ABF a bit for you; if I haven’t, then I apologize. In either case, enjoy!

1. Laio A, Parrinello M (2002) Escaping free-energy minima. Proceedings of the National Academy of Sciences 99(20):12562–12566.
2. Barducci A, Bussi G, Parrinello M (2008) Well-Tempered Metadynamics: A Smoothly Converging and Tunable Free-Energy Method. Physical Review Letters 100(2):020603. doi: 10.1103/physrevlett.100.020603
3. Darve E, Pohorille A (2001) Calculating free energies using average force. The Journal of Chemical Physics 115(20):9169–9183.
4. Darve E, Rodríguez-Gómez D, Pohorille A (2008) Adaptive biasing force method for scalar and vector free energy calculations. The Journal of Chemical Physics 128(14):144120.