The post Adaptive Biasing Force appeared first on Hythem Sidky.

Perhaps the most well-known and widely-used algorithm for calculating free energies on the fly from molecular simulation is metadynamics (1) and its flavors (2). It really was a game changer, as evidenced by the 2500+ citations the original paper has amassed since its publication in 2002. It also helped that it was elegant in its simplicity and a public implementation was made available through PLUMED which greatly improved its adoption; free energy calculations have now become an indispensable tool for the molecular modeler. Lurking in the shadows however was another algorithm which, though proposed *earlier* than metadynamics (3) and in my opinion superior in nearly every way, is not nearly as popular. This of course is the adaptive biasing force method, or ABF for short.

Now there are a number of contributing factors as to why I believe ABF never really caught on. But before I proceed, I want to make clear my expectations of the reader: I assume in this blog post that you are generally familiar with advanced free energy sampling and its concepts/jargon, basic statistical mechanics, and are at least moderately comfortable with multivariable calculus and linear algebra. If you're not, feel free to read on and pick up what you can, or wait until I write a future post introducing free energy sampling methods (which I plan on doing). Anyways, where were we? Some reasons I think kept ABF from reaching rock-star status: First, ABF is not as simple to implement as metadynamics. Second, I also feel that the authors shot themselves in the foot by not writing the paper in a simpler, more "implementation-oriented" fashion. Third, ABF requires second derivatives of the collective variables with respect to the atomic coordinates, which gets nasty very quickly (more on this later). Finally, it just wasn't available in a nice portable package like PLUMED.

This post will focus on describing a later development on the original ABF method (4). This paper solves problems 2 and 3, and various implementations of ABF are now available in many packages. As lead developer of SSAGES, an advanced sampling suite, and co-author of its ABF module, I wanted to step through various aspects of Darve’s 2008 paper (4) and discuss certain implementation choices. I’ll wrap up by using it to resolve the free energy surface of a simple system. Hopefully I will make a strong case for why ABF is currently my go-to method for estimating free energies, with some minor exceptions. Let’s begin!

Advanced sampling methods such as adaptive umbrella sampling, metadynamics and Wang-Landau sampling can be categorized as *adaptive biasing potentials*. That is, there is an underlying biasing potential $V(\xi)$ which acts on a set of collective variables, $\xi(\mathbf{x})$. This bias is propagated to the atomic coordinates via the chain rule,

$$\mathbf{F}_i^{\mathrm{bias}} = -\nabla_{\mathbf{x}_i} V = -\frac{\partial V}{\partial \xi}\,\nabla_{\mathbf{x}_i}\xi.$$

All of these methods adjust the shape of the bias over the course of the simulation in order to achieve uniform sampling along $\xi$. You can watch the video below which illustrates this principle.

ABF on the other hand is an *adaptive biasing force* (duh) method. What this means is that instead of adapting a potential to the underlying free energy surface, it seeks to estimate the *mean force* $\langle F_\xi \rangle$ at specific points along the order parameter. Practically speaking, the order parameter is divided into discrete bins and a running estimate of the average force in each bin is refined over time, which converges to the negative gradient of the free energy $A(\xi)$,

$$\langle F_\xi \rangle = -\frac{dA}{d\xi}.$$

The consequence of converging to the mean force is that it eliminates the energetic barrier along the collective variable; the system should be able to freely diffuse along $\xi$.

So why bother at all with ABF? Isn't metadynamics good enough? Well, in adaptive biasing potential methods, the quantity being estimated is the free energy, $A(\xi)$, which is essentially the underlying probability distribution. This is an inherently *global* measure. A probability at a given point is only meaningful if it is known relative to another point along the collective variable. In other words, the free energy is arbitrary up to an additive constant, hence it only makes sense to discuss it in relative terms. Consequently, repeated sampling of the entire interval is required to converge the relative free energy surface. Additionally, if for some reason there are sampling issues at any point along the collective variable, it may affect the rest of the surface; this type of problem is especially prominent at the edges of bounded intervals.

Because ABF estimates a force, or gradient, it is an inherently *local* property. The derivative at a single point can be estimated locally without visiting other parts of the collective variable. This leads to faster convergence and makes the method less prone to boundary problems. Furthermore, since the estimation occurs on a discrete basis over a series of bins, one does not have to worry about the many parameters that go into metadynamics-type simulations such as Gaussian widths, heights, deposition rate, etc.

So far we've just been discussing things at a high level. At the end of the day, what exactly is ABF? What mathematical expression(s) do we plan on representing? We begin with the concept of thermodynamic integration. Given two states, $a$ and $b$, the difference in free energy between them can be expressed as,

$$\Delta A = \int_{\xi_a}^{\xi_b} \frac{dA}{d\xi}\,d\xi,$$

where

$$\frac{dA}{d\xi} = \left\langle \frac{\partial H}{\partial \xi} \right\rangle_{\xi}.$$

This is a classical result. Here $H$ is the Hamiltonian of the system, $U$ is the potential energy, and the derivatives are evaluated in a *constrained* ensemble at fixed $\xi$. The limitation in using this expression is that the presence of multiple reactive pathways along the collective variable may give rise to quasi-nonergodicity. Another statistical mechanical result is the expression for the mean force in an unconstrained ensemble,

$$\frac{dA}{d\xi} = \left\langle \frac{\partial U}{\partial \xi} - k_B T \frac{\partial \ln |J|}{\partial \xi} \right\rangle_{\xi}.$$

Here we can think of the first term on the RHS as the mechanical force acting along $\xi$, and the second term as the entropic contribution. The appearance of the Jacobian $|J|$ in this expression is because of the transformation from a set of generalized coordinates to the desired collective variable. This next bit is really where the magic happens; as a reminder you can look back to the references below, in particular (3, 4) for more details. First let's take $\mathbf{w}$ to be an arbitrary vector field – literally any arbitrary vector. Subject to the condition that $\mathbf{w} \cdot \nabla\xi \neq 0$, it can be shown that

$$\frac{dA}{d\xi} = \left\langle \frac{\mathbf{w}\cdot\nabla U}{\mathbf{w}\cdot\nabla\xi} - k_B T\, \nabla\cdot\!\left(\frac{\mathbf{w}}{\mathbf{w}\cdot\nabla\xi}\right) \right\rangle_{\xi}.$$

Let's take this another step. If we make a choice of $\mathbf{w}$ such that $\mathbf{w}\cdot\nabla\xi = 1$, then the above expression simplifies to

$$\frac{dA}{d\xi} = \left\langle \mathbf{w}\cdot\nabla U - k_B T\, \nabla\cdot\mathbf{w} \right\rangle_{\xi}.$$

At this point I've just thrown a bunch of equations at you. I'll try to break things down a bit. First, $\nabla\xi$ is just the gradient of the collective variable with respect to our atomic coordinates, $\mathbf{x}$. This is a standard requirement for the definition of any collective variable if we wish to sample along it in molecular dynamics. $-\nabla U$ is just the vector of atomic forces. It may seem that we have a fine expression that we can use in an algorithm, but there's somewhat of an issue. To satisfy the condition $\mathbf{w}\cdot\nabla\xi = 1$ we can take $\mathbf{w} = \nabla\xi/\|\nabla\xi\|^2$. However, in the equation for (negative) the mean force, we have a divergence term $\nabla\cdot\mathbf{w}$ which necessitates second-order spatial derivatives – yuck!

The main development of Darve (4) was to derive an equivalent expression that is first order in space, but introduces an additional first-order time derivative,

$$\frac{dA}{d\xi} = -\left\langle \frac{d}{dt}\left(\mathbf{w}\cdot\mathbf{p}\right) \right\rangle_{\xi}.$$

Here $\mathbf{p}$ is the vector of atomic momenta. Cool huh? That's really it. No matter what our choice of instantaneous vector field $\mathbf{w}$ is, subject to the condition given, we will eventually converge to the mean force. One final point is that the ABF method is one of the few methods for which there is proven (exponential) convergence. Though in practice many adaptive biasing methods were in wide use well before proofs of their convergence were published (and they worked just fine), it's still an added bonus. Armed with this final equation, let's proceed to implementing the ABF algorithm in practice and hopefully that will help clear up any lingering questions.

The first step in implementing ABF is to define a grid over the set of collective variable(s) which will be sampled. The number of bins should be chosen to accommodate the width of the finest features you are interested in resolving. There are two quantities which will be stored on the grid: the total accumulated force, and the number of hits at each grid point. The ratio of these two quantities will converge to the mean force of interest. Within a molecular dynamics simulation, we can define an iterative procedure for the algorithm.

- The interatomic forces are computed by the MD engine at timestep $t$.
- The collective variable(s) are computed and the gradients $\nabla\xi$ are calculated from the atomic coordinates at timestep $t$.
- Compute the arbitrary vector field $\mathbf{w}$ from $\nabla\xi$. This is where we have to take things a bit slow. Recall the condition $\mathbf{w}\cdot\nabla\xi = 1$ which needs to be satisfied? Well, there are a number of ways of finding a vector (or matrix) to fulfill that condition. Let's go over some of those.

First let's establish some notation. $J$ will represent the Jacobian matrix of the collective variable(s), containing the gradient of each respective CV along the rows. Thus the entries of $J$ are $J_{ij} = \partial\xi_i/\partial x_j$, where $x_j$ are the degrees of freedom, typically the x, y, z coordinates of the atoms. For $m$ collective variables and $n$ degrees of freedom, our arbitrary vector field matrix $W$ will then have $m$ columns and $n$ rows such that $JW = I$. The first thing that came to my mind when I saw that condition was the Moore-Penrose pseudoinverse.

**Linear algebra interlude.** Suppose we have a matrix $A$ which is square (not necessarily the case above) and we are interested in finding a matrix $B$ such that $AB = I$. The clear answer would be to compute the inverse of $A$, $B = A^{-1}$. The Moore-Penrose pseudoinverse is a generalized notion of an inverse for non-square matrices. In our case we are looking for a *right* inverse. We first form the symmetric matrix $JJ^T$ (all entries are real), and take $W$ as $W = J^T(JJ^T)^{-1}$. In other words, $JW = JJ^T(JJ^T)^{-1} = I$.

Darve (4) suggests the choice $W = MJ^T(JMJ^T)^{-1}$, where $M$ is the diagonal mass matrix containing the atomic masses. This can be thought of as a "mass-weighted" pseudoinverse. Making this choice actually results in another ABF equation which can also be used, but I prefer to compute $W$ in the manner described above. Notice as well how, if the mass matrix were the identity matrix, it would actually be equivalent to the pseudoinverse. The reason for choosing the pseudoinverse (SSAGES implements both) is that in the case where a simulation contains virtual sites, as is often the case in many water models, the mass matrix will contain a zero along the diagonal rendering it singular. In either case, the user is free to choose whichever they please.
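Both choices are easy to verify numerically. Here's a quick NumPy sanity check of my own (the Jacobian is random and the dimensions/masses are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 9  # Say, 2 collective variables and 9 degrees of freedom (3 atoms).
J = rng.standard_normal((m, n))  # Jacobian: one CV gradient per row.

# Plain Moore-Penrose right pseudoinverse: W = J^T (J J^T)^{-1}.
W_pinv = J.T @ np.linalg.inv(J @ J.T)

# Mass-weighted variant: W = M J^T (J M J^T)^{-1}.
masses = np.repeat([16.0, 1.0, 1.0], 3)  # x, y, z entries per atom (e.g. O, H, H).
M = np.diag(masses)
W_mass = M @ J.T @ np.linalg.inv(J @ M @ J.T)

# Both satisfy the required condition J W = I.
print(np.allclose(J @ W_pinv, np.eye(m)))  # True
print(np.allclose(J @ W_mass, np.eye(m)))  # True
```

With all masses equal, the two expressions coincide exactly; with a zero-mass virtual site, $M$ becomes singular, which is one reason to keep the plain pseudoinverse around.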

Will that choice make a difference? I previously mentioned that no matter the choice of $\mathbf{w}$, we will always converge to the mean force. This is true, but our choice of $\mathbf{w}$ affects the rate at which we converge. Choosing it wisely means we converge very rapidly, while a poor $\mathbf{w}$ results in slow convergence. In the numerical experiments I've run, I did not notice a significant difference in performance between the mass-weighted and pure pseudoinverse. One last note before returning to the program is that $W$ can also be computed column-wise as $\mathbf{w}_i = \nabla\xi_i/\|\nabla\xi_i\|^2$ *if and only if* the collective variables are orthogonal. Otherwise, $JW \neq I$. However, it's usually not too much trouble to compute it in the manner I've previously described, so we just stick with that. Now, back to the program.

- Once $W$ is computed, we calculate the product $W^T\mathbf{p}$, where $\mathbf{p}$ is the vector of atomic momenta.
- The time derivative $\frac{d}{dt}(W^T\mathbf{p})$ is calculated using finite differences. This is perfectly fine since the MD engine is taking finite steps in time anyways. So long as the finite difference scheme is of the same or greater order of accuracy as the integrator, then things should be fine. In SSAGES we choose a second-order accurate backward finite difference scheme, so $\frac{d}{dt}(W^T\mathbf{p})_t \approx \left[3(W^T\mathbf{p})_t - 4(W^T\mathbf{p})_{t-\Delta t} + (W^T\mathbf{p})_{t-2\Delta t}\right]/(2\Delta t)$. This means we need to store the two previous dot products and start biasing only after the third iteration, which is fine.
- Subtract the previous mean force estimate at the current point from the time derivative we just computed. This is important since as the simulation proceeds, we need to account for the bias due to the external force we have been applying to the system.
- Sum the result into our grid at the current value of $\xi$, and increment the number of hits at that grid point.
- Update the finite difference time derivatives.
- Compute the external bias that we are going to sum into the atomic forces as $\mathbf{F}^{\mathrm{bias}} = -\hat{F}(\xi)\,\nabla\xi$, where $\hat{F}(\xi)$ is the current mean force estimate in the bin. Here arises another practical implementation issue. Early on, this force estimate will fluctuate wildly and can easily cause the simulation to blow up. So what is suggested (and what we do) is to require a minimum number of hits $N_{\min}$ in each bin such that the force is turned on in a linear ramp. To do this, simply divide the accumulated force by $\max(N, N_{\min})$ instead of just the hit count $N$.
- Update the atomic forces and let the MD engine integrate the forces.
- Return to step 1.
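To make the loop concrete, here is a deliberately minimal one-dimensional toy of my own (not the SSAGES implementation): an overdamped Langevin particle in the double well $U(x) = (x^2-1)^2$ with $\xi = x$. Since $\nabla\xi = 1$, the vector field is trivially $\mathbf{w} = 1$, and for brevity the instantaneous force is accumulated directly instead of through the finite difference of $\mathbf{w}\cdot\mathbf{p}$; all parameters are invented.

```python
import numpy as np

def U_force(x):
    """Bare force -U'(x) for the double well U(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(1)
nbins, lo, hi = 50, -1.5, 1.5
F_sum = np.zeros(nbins)  # Accumulated force in each bin.
hits = np.zeros(nbins)   # Number of hits in each bin.
n_min = 100              # Minimum hits before the bias is fully on (linear ramp).
dt, kT, gamma = 1e-3, 1.0, 1.0

x = -1.0
for step in range(500_000):
    b = min(nbins - 1, max(0, int((x - lo) / (hi - lo) * nbins)))
    f = U_force(x)   # With xi = x and w = 1, the instantaneous force
    F_sum[b] += f    # estimate is just the bare force on the particle.
    hits[b] += 1
    # Apply the negative running mean force, ramped at low hit counts.
    bias = -F_sum[b] / max(hits[b], n_min)
    # Overdamped Langevin step under the total (bare + biasing) force.
    x += (f + bias) * dt / gamma + np.sqrt(2.0 * kT * dt / gamma) * rng.standard_normal()
    x = min(max(x, lo), hi)  # Keep the particle on the grid.

centers = lo + (np.arange(nbins) + 0.5) * (hi - lo) / nbins
mean_force = F_sum / np.maximum(hits, 1)
```

After enough steps, `mean_force` in each interior bin approaches $-U'(x)$ near the bin center, and because the applied bias cancels it, the particle ends up diffusing almost freely across the barrier.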

And there you have it, a breakdown of ABF’s mechanics. I do recommend you take a look at the source file for our implementation if you’re a bit more interested, because we implement a scalable strategy for multiple walkers that I did not discuss in this post. That wraps things up with respect to the ABF algorithm itself. What still remains however, is the post-processing.

Once a simulation is complete (or you simply run out of CPU time on your cluster) we need to extract the free energy corresponding to the mean force we just estimated. For a one-dimensional surface, this is as simple as integrating the force along the collective variable, e.g. by calling `cumtrapz` in MATLAB or `scipy.integrate.cumulative_trapezoid` in Python. However, for multi-dimensional surfaces, things get a bit more complicated. It turns out that due to statistical fluctuations in the force vector field, the path chosen to integrate along affects the resulting free energy surface. In other words, the estimated mean force is no longer a conservative vector field! We all know that as a state function, the free energy difference between two points should be independent of the path chosen. Dealing with this issue is the subject of another post entirely – or you can just go on and read the discussion in Darve (4) if you'd like.
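For the one-dimensional case, here is what that integration looks like in Python (a sketch; I substitute an analytical mean force for the binned estimate so the result can be checked against the exact free energy):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Bin centers and a toy mean force F = -dA/dx for A(x) = (x^2 - 1)^2.
x = np.linspace(-1.5, 1.5, 200)
F = -4.0 * x * (x**2 - 1.0)

# The free energy is the negative integral of the mean force.
A = -cumulative_trapezoid(F, x, initial=0.0)
A -= A.min()  # A is only defined up to an additive constant.
```

Shifting by the minimum is just a convention for plotting; any additive offset is equally valid.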

I’d like to leave you with a video I made using ABF to estimate the potential of mean force between an NaCl ion pair in water. It just gives you a feel for how the method converges (quickly!) and is nice to watch. I hope I’ve been able to demystify ABF a bit for you. If I haven’t then I apologize. In either case, Enjoy!

1. Laio A, Parrinello M (2002) Escaping free-energy minima. *Proceedings of the National Academy of Sciences* 99(20):12562–12566. [Source]

2. Barducci A, Bussi G, Parrinello M (2008) Well-Tempered Metadynamics: A Smoothly Converging and Tunable Free-Energy Method. *Phys Rev Lett* 100(2). doi: 10.1103/physrevlett.100.020603

3. Darve E, Pohorille A (2001) Calculating free energies using average force. *The Journal of Chemical Physics* 115(20):9169–9183. [Source]

4. Darve E, Rodríguez-Gómez D, Pohorille A (2008) Adaptive biasing force method for scalar and vector free energy calculations. *The Journal of Chemical Physics* 128(14):144120. [Source]


The post Diffusion Maps, Part 2 appeared first on Hythem Sidky.

In the previous post, we discussed the general concepts behind manifold learning and some key assumptions in their development. We then looked at the Swiss roll example and showed how PCA, a linear method, was unable to successfully unroll the sheet. Now we will talk about diffusion maps specifically, and see if it does a better job at unrolling the sheet. Most of my discussion below follows this publication (1). Both the original diffusion mapping method proposed by Coifman and Lafon (2) and later developments are a bit more general than what we will be covering. My focus again is on the approach as it pertains to the analysis of trajectories from molecular simulation, and the description below should suffice.

The objective of diffusion mapping, as with other manifold learning algorithms, is to construct the best low-dimensional embedding of high-dimensional input data. Given some measure of similarity between observations which offers a good representation of local connectivity, the diffusion map reconstructs a global representation of the intrinsic manifold. For something like the Swiss roll, similarity can be described using the Euclidean distance. For molecular trajectories, this may be the rotationally and translationally minimized root mean square deviation (RMSD) – we will discuss this in our next post.

Once a similarity metric is chosen, pairwise distances between all observations are computed and thresholded using a Gaussian kernel. Therefore, given a pairwise distance function $d(\mathbf{x}_i, \mathbf{x}_j)$, we can construct a matrix $A$ with elements,

$$A_{ij} = \exp\left(-\frac{d(\mathbf{x}_i, \mathbf{x}_j)^2}{2\epsilon}\right).$$

Applying the Gaussian function has the effect of only retaining points that are "close", a length scale which is set by $\epsilon$. This parameter is sometimes referred to as the "kernel bandwidth". Recall in the previous post our discussion of the "locally Euclidean" assumption. Here, setting the value of $\epsilon$ and applying the Gaussian kernel precisely controls the maximum distance along the yet-to-be-discovered manifold which is well captured by our distance measure $d$. Points which are farther away than this distance are deemed too distant to meaningfully characterize the manifold, and are discarded. To drive this point home, I've plotted below the Gaussian kernel as a function of a few values of $\epsilon$.

As you can see, the length at which the kernel decays is on the order of $\sqrt{\epsilon}$. Now the first question that comes up is: how do we choose $\epsilon$? Conveniently, Coifman et al. proposed a heuristic, which is to generate a log-log plot of $\sum_{ij} A_{ij}$ vs. $\epsilon$ and take the extent of the linear regime as a good choice for $\epsilon$. We will look at how to do this once we tackle the Swiss roll example. This process of selection ensures that each data point is well connected by a series of "small hops" of order $\sqrt{\epsilon}$ in the distance metric. Ideally, each data point or snapshot can be connected to every other snapshot through hops along entries of $A$, which will allow diffusion mapping to synthesize a single global approximation of the underlying manifold. If the data is *not* well connected, only isolated embeddings of each distinct region will be produced.

The next step is to normalize each row of the matrix $A$ to yield a new matrix $M$, which is a right-stochastic Markov transition matrix. The elements of $M$ are then,

$$M_{ij} = \frac{A_{ij}}{\sum_k A_{ik}}.$$

By doing this we are basically saying that the process we are trying to describe is Markovian. The last step is to obtain the eigenvectors/values of the matrix $M$. Note that the first eigenpair will be trivial, with eigenvalue $\lambda_1 = 1$ and a constant eigenvector $\mathbf{v}_1 = \mathbf{1}$. The remaining top eigenvectors represent the slowest diffusive modes of the system, or the principal components of the manifold along which the long-time dynamics of the system evolve. There are a few interesting questions that we have not answered in this brief overview of diffusion maps, such as estimating the intrinsic dimensionality of the manifold. In the case of the Swiss roll we know the sheet is two dimensional, but in general we do not have any *a priori* knowledge of how many dimensions we should consider. I'll let the answers to these questions emerge naturally from the more complex example we will address in the next post. For now, we will conclude the basic outline of diffusion maps.
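Stripped of any application specifics, the whole construction fits in one short function. This is my own condensed sketch (dense $O(n^2)$ pairwise distances; fine for toy problems):

```python
import numpy as np

def diffusion_map(X, eps, k=2):
    """Return the top-k nontrivial diffusion-map coordinates of the rows of X."""
    # Pairwise squared Euclidean distances.
    sq = ((X[:, None, :] - X[None, :, :])**2).sum(axis=-1)
    # Gaussian kernel with bandwidth eps.
    A = np.exp(-sq / (2.0 * eps))
    # Row-normalize into a right-stochastic Markov matrix.
    M = A / A.sum(axis=1, keepdims=True)
    # Eigendecomposition, sorted by descending eigenvalue.
    w, v = np.linalg.eig(M)
    order = np.argsort(-w.real)
    v = v[:, order].real
    # Drop the trivial constant eigenvector and return the next k.
    return v[:, 1:k + 1]
```

For points sampled along a one-dimensional curve, the first returned coordinate should order the points along that curve.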

It turns out that a basic implementation of diffusion maps is not very complicated. We will pick up where we left off in the last example where we attempted to use PCA to unroll the sheet without success. Our first step towards diffusion mapping will be to compute the pairwise distances between the points in our data matrix $X$. There's a convenient function in scikit-learn that we can use:

```
from sklearn import metrics

d = metrics.euclidean_distances(X)
```

Our next task will be figuring out the optimal choice of $\epsilon$. Let's generate a plot of $\sum_{ij} A_{ij}$ vs. $\epsilon$. Here I will use another built-in function in scikit-learn which applies a Gaussian kernel to the pairwise distances (also called radial basis functions) and vary the value of $\epsilon$.

```
import numpy as np

# Values of epsilon in base 2 we want to scan.
eps = np.power(2., np.arange(-10., 14., 1))
# Pre-allocate array containing sum(Aij).
Aij = np.zeros(eps.shape)
# Loop through values of epsilon and evaluate matrix sum.
for i in range(len(eps)):
    A = metrics.pairwise.rbf_kernel(X, gamma=1./(2.*eps[i]))
    Aij[i] = A.sum()
```

We can see where the linear region of the plot above starts to turn over, but to be conservative we take $\epsilon = 4$. We will see that in general, from my limited experience, it is better to err on the side of a smaller value. Next, let's generate the Markov matrix $M$, compute the top 3 eigenvectors and plot the second and third (recall the first is trivial).

```
# From the plot above we see that 4 is a good choice.
eps = 4.
# Generate final matrix A, and row-normalized Markov matrix M.
A = metrics.pairwise.rbf_kernel(X, gamma=1./(2.*eps))
M = A/A.sum(axis=1, keepdims=True)
# Get the eigenvalues/vectors of M, sorted by descending eigenvalue
# (np.linalg.eig does not guarantee any ordering).
W, V = np.linalg.eig(M)
idx = np.argsort(-W.real)
W, V = W[idx].real, V[:, idx].real
# Normalize each row by the trivial (constant) first eigenvector.
V = V/V[:, [0]]
```

What?! I bet you expected to see a nicely unwrapped sheet, didn't you? Maybe we chose a poor value of $\epsilon$? Let's scan through some values of $\epsilon$ and see what we get.

Something very interesting is going on here. First of all, notice how for small values of $\epsilon$ the color follows the "arc" perfectly. This means that the arc parameterizes the distance along the sheet quite well. However, it seems that at small values we are effectively getting a one-dimensional arc embedded in two dimensions. Yet, as epsilon increases and the arc starts to spread out into a sheet, it also curls over and overlaps rather than completely unrolling. So what gives?! Well, because our Swiss roll is long and narrow (the unrolled length is several times the height), by the time $\epsilon$ is on the proper scale to capture the sheet height, we start including points that are too distant, such that our local approximation is no longer valid. In other words, for a long sheet there is such a disparity in length scales that diffusion mapping is extracting only the slowest mode (the length of the sheet). The other mode, the height of the sheet, cannot be resolved. I don't see this as a shortcoming of diffusion mapping, but more a matter of understanding and interpretation. Diffusion mapping did its job; it just doesn't match our initial expectation – that doesn't mean it's wrong.

If we generate a slightly thicker sheet (a larger height relative to the length) we get the following diffusion map:

A much nicer sheet indeed. Of course the choice of $\epsilon$ can be improved, but this proves the point, I think. This sums up the idea behind diffusion mapping. It's actually quite powerful as we have seen. And so long as we know what to expect, it can become quite a useful tool to know. I have put up an IPython notebook for the exercises in both this post and the previous which can be found here. Next time we will use diffusion maps to analyze trajectories from molecular simulation and attempt to reproduce results from a paper. I hope you found this useful! Until then…

1. Ferguson AL, Panagiotopoulos AZ, Kevrekidis IG, Debenedetti PG (2011) Nonlinear dimensionality reduction in molecular simulation: The diffusion map approach. *Chemical Physics Letters* 509(1–3):1–11. [Source]

2. Coifman RR, Lafon S (2006) Diffusion maps. *Applied and Computational Harmonic Analysis* 21(1):5–30. [Source]


The post Diffusion Maps, Part 1 appeared first on Hythem Sidky.

Diffusion maps is one of many nonlinear dimensionality reduction techniques which aim to find a meaningful low-dimensional representation of a high-dimensional data set. My interest in diffusion maps in particular is that in recent years it has been applied (1) to the analysis (2) of molecular simulation trajectories with very interesting results (3) – we will specifically work through such an example in a later post. The idea behind dimensionality reduction in general is to take a data set containing points in $D$-dimensional space and find the "best" representation, or embedding, of the data in $d$-dimensional space, where $d < D$. In other words, if each data point is a $D$-dimensional vector, then we are seeking a way to construct a new representation of that same data point using only a $d$-dimensional vector which captures the most relevant information. This is why dimensionality reduction has also been termed "feature extraction". It is also a type of unsupervised learning, where we are not attempting to learn a relationship between input and output variables, but rather find a way to re-represent the same inputs.

Now depending on how one chooses to define "best representation" in $d$ dimensions, we get the various different methods that exist. In fact, this is a non-insignificant detail: the assumptions behind this "best" representation can dictate whether a method will be effective for our data. One of the earliest methods that arguably sparked the growth of this entire field is known as isometric feature mapping, or *isomap* for short. I highly recommend giving the original paper (4) a read; it's short and very neat! An important aspect of diffusion maps, isomap and other *nonlinear* dimensionality reduction methods is that the resulting low-dimensional representation is a nonlinear function of the original coordinates. This is in stark contrast to PCA, which finds new coordinates that are linear combinations of the original coordinates. We will see, in a short while, how linear dimensionality reduction can limit our ability to extract useful dimensions.

Central to almost all nonlinear dimensionality reduction (also called manifold learning) methods is the so-called *manifold hypothesis*. The idea is that even though the data we have in hand can be very high dimensional (think of a molecular trajectory with $3N$ coordinates, or an image with $N \times M$ pixels), the set of combinations in that space containing meaningful data is much smaller, and there is in fact a *manifold* in low dimension along which this data lies. To help illustrate this I've included an image below.

If we are interested in digit recognition, and we have a black-and-white image consisting of, let's say, $N$ pixels, then there are $2^N$ possible images of that size. Surely, however, the number of images that contain meaningful information such as numeric digits (left exhibit) is far fewer than the possible images out there. In fact, it is most likely that the majority of the possible images are noisy garbage (right exhibit). Thus we can conclude that there is some lower dimensional manifold along which images consisting of digits lie, and that variation along this manifold does a good job at characterizing differences in the images we visualize. Finally, in order for this to be the case, the region of meaningful high dimensional space must be sparse. This is in essence the manifold hypothesis.

A second important assumption made by most manifold learning methods (we've upgraded our terminology – hooray!) is local Euclidean-ness. This means that even though there is some global nonlinear manifold along which the data lies, Euclidean distance is a good measure of distance along that manifold within the local vicinity of a point. We are essentially saying that if we zoom into that manifold really closely it kind of looks like a hyperplane, so distances along it can be measured using the Euclidean norm. Because of this assumption, if we have two data points that are sufficiently "close" (more on closeness later), then we can use this local positional information as a starting point for reconstructing a global nonlinear picture. Think of it another way: if two data points are very close in Euclidean distance, they are probably very close on the manifold, but the opposite is not necessarily true.

A toy benchmark commonly used to test out manifold learning algorithms is known as the "Swiss roll", which we'll be playing with moving forward. The Swiss roll is basically a two-dimensional sheet rolled up and embedded in three dimensions. A good manifold learning algorithm should be able to "unroll" the sheet in two dimensions. We can generate the sheet programmatically (5) by sampling points according to $\mathbf{x} = (t\cos t,\ h,\ t\sin t)$, where $t \in [\tfrac{3\pi}{2}, \tfrac{9\pi}{2}]$ and $h \in [0, h_{\max}]$. This yields an unfolded roll of length $L$ (the arc length swept out by $t$) and height $h_{\max}$. This is what it looks like in Python:

```
import numpy as np

n = 2000     # Number of points to generate.
noise = 0.1  # Amount of noise.
h = 30 * np.random.rand(n, 1)  # Height of the sheet.
t = (3*np.pi/2)*(1 + 2*np.random.rand(n, 1))  # Parameter along the roll.
X = np.hstack((t*np.cos(t), h, t*np.sin(t))) + noise*np.random.rand(n, 3)
```

Here we choose to generate two thousand points, add a bit of noise, and go for a height of 30. This puts our length-to-height ratio at roughly 3, a factor that will turn out to be important later on. Visualizing this as a 3D scatter plot we get the image below.

As we can see, the color indicates the path along the length of the sheet. Let's highlight a few key points. Each point on the Swiss roll is described using three values $(x, y, z)$, so our data is 3-dimensional. However, though the data is represented in three dimensions, we know that it in fact lies along a two-dimensional sheet, our manifold, which is nonlinear and is embedded in a third dimension. We can also see from the figure that the Euclidean distance between a point in dark blue and one in medium green is not a good measure of the true distance along the rolled up sheet. The distance along the true manifold, the Swiss roll in our case, is known as the *geodesic distance*. On the other hand, for points that are very close together, the Euclidean distance is approximately equal to the geodesic distance. Thus we've seen how the Swiss roll represents an ideal benchmark for dimensionality reduction methods, as it satisfies all the necessary assumptions and is easy to generate and visualize.
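We can actually quantify that gap on the noiseless spiral itself: for two points at the same height, the geodesic distance is the arc length $\int_{t_1}^{t_2}\sqrt{1+t^2}\,dt$, while the Euclidean distance is just the chord between them. A quick check (my own sketch):

```python
import numpy as np

def roll_point(t, h=0.0):
    """3D coordinates of a (noiseless) Swiss-roll point at parameter t, height h."""
    return np.array([t * np.cos(t), h, t * np.sin(t)])

def geodesic(t1, t2, steps=10_000):
    """Arc length along the spiral between parameters t1 and t2."""
    t = np.linspace(t1, t2, steps)
    f = np.sqrt(1.0 + t**2)  # Speed |d(spiral)/dt| along the curve.
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))  # Trapezoid rule.

t1, t2 = 3 * np.pi / 2, 7 * np.pi / 2  # One full turn apart on the spiral.
euclid = float(np.linalg.norm(roll_point(t2) - roll_point(t1)))
geo = geodesic(t1, t2)
# The straight-line chord (~6.3) badly underestimates the
# along-manifold distance (~50).
```

One full turn apart, the chord is shorter than the geodesic by almost an order of magnitude, which is exactly why a naive Euclidean embedding fails here.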

Before we get into diffusion mapping, let's look at what PCA gives us. Recall that PCA seeks to find a projection direction of the original data such that the variance in the data is maximized. This represents the first principal component. The next principal component is orthogonal to the first and captures the second largest amount of variance, and so on. In practice, the principal components can be computed as the descending eigenvalue/eigenvector pairs of the covariance matrix, or through singular value decomposition of the data matrix. Let's perform PCA on the Swiss roll and see what we get. We are interested in the first two principal components. The Python code below uses scikit-learn.

```
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

Y = PCA(n_components=2).fit_transform(X)
plt.scatter(Y[:, 0], Y[:, 1], c=t.ravel())
```

Well, that was easy. What does our plot look like?

Notice how the first and second principal components correspond approximately to two of the original coordinate axes. This is because PCA is only capable of performing linear projections of the data. In that sense, PCA cannot extract relationships that lie along nonlinear manifolds, and as a result it convolutes data that should be spread out in low-dimensional space.

This is where diffusion maps and other manifold learning methods come into play. Our focus is primarily on diffusion maps for the reasons stated previously. But before we can map diffusion we must first understand the principles behind it, now that we have a better understanding of dimensionality reduction in general. We shall pick this up in a follow-up post. Until next time…

1. Ferguson AL, Panagiotopoulos AZ, Debenedetti PG, Kevrekidis IG (2010) Systematic determination of order parameters for chain dynamics using diffusion maps. *Proceedings of the National Academy of Sciences* 107(31):13597–13602. [Source]

2. Ferguson AL, Panagiotopoulos AZ, Debenedetti PG, Kevrekidis IG (2011) Integrating diffusion maps with umbrella sampling: Application to alanine dipeptide. *The Journal of Chemical Physics* 134(13):135103. [Source]

3. Ferguson AL, Panagiotopoulos AZ, Kevrekidis IG, Debenedetti PG (2011) Nonlinear dimensionality reduction in molecular simulation: The diffusion map approach. *Chemical Physics Letters* 509(1–3):1–11. [Source]

4.

Tenenbaum JB (2000) A Global Geometric Framework for Nonlinear Dimensionality Reduction. *Science* 290(5500):2319–2323. [Source]

5.

Nadler B, Lafon S, Coifman R, Kevrekidis IG (2008) Diffusion Maps – a Probabilistic Interpretation for Spectral Embedding and Clustering Algorithms. *Lecture Notes in Computational Science and Enginee*:238–260. [Source]

The post Diffusion Maps, Part 1 appeared first on Hythem Sidky.


If you’re reading this, then welcome to my blog! This is where I share everything from my scattered thoughts to news on my latest publications and conference attendance. Periodically, I will post trip reports, including some of my photography, if I get a chance to sneak away and explore a city during work travel. I also plan on regularly writing tutorials in the areas of applied mathematics, molecular simulation, or whatever else I think myself to be marginally qualified to teach. Some will be pedagogical and probably aimed more at those just starting out in a particular subject. However, my intention is to focus a significant effort on reproducing results from the published literature, or attempting proof-of-concept implementations of proposed algorithms or techniques.

/begin rant

I have serious issues with both the clarity and reproducibility of *some* published work in the field of molecular simulation. This isn’t news to anyone in the field, or any other academic specialty really, but it is particularly frustrating because one would **think** that molecular simulations, being conducted on *computers*, lend themselves to being highly and easily reproducible. Sadly, this is not the case. Molecular simulations have gotten complex enough, especially in the analysis of the raw simulation data, that without either an extremely thorough description of the analysis or the analysis scripts themselves as supplementary information, it may be nearly impossible to obtain the same results claimed in a published work. As a researcher trying to follow up on, validate, or build upon that work, it can be a head-banging experience.

What’s equally disheartening is how papers in computational chemistry which describe new algorithms or sampling methods rarely contain a lucid enough description of the algorithm to facilitate easy implementation, and more often than not (in fact, I have yet to come across one that does) fail to provide a reference implementation. In applied mathematics it has become nearly impossible to publish an algorithm, or be taken seriously, without *cold hard code*. It’s just too easy to do. The same applies to new or sufficiently complex (or tabulated) forcefield parameters used in a simulation. For the love of everything good in this world, why would you **not** provide a 2 KB zip file as supplementary material to make our lives easier?! I have even come across a paper where someone provided a “representative” simulation input file which was a sanitized, non-functional input. Why not provide an *actual* input script?!

Lest someone accuse me of being lazy, and wanting to be spoon-fed other people’s hard work, I assure you this is not the case. I’ve seen too many (primarily graduate students) go through the same trouble, spending months or sometimes a year trying to reproduce a result. It can really be an unnecessary barrier to scientific progress. I’m not usually a cynic, but I have two reasons in mind as to why the situation is so bad despite it being 2017, with things like IPython and GitHub which make it stupid easy for people to maintain a reproducible and easily publishable workflow. The first is that the graduate student who finally figured it out after all the sweat and blood is not going to make it easier for the next one. I kind of see that, I guess. The second reason, and in my opinion the primary one, is that researchers are purposefully opaque in their wording and the material they release in order to maintain a competitive advantage in the field. There’s no excuse for this one, especially if you are a publicly funded researcher. I know of certain groups that have developed reputations for not being forthcoming with details, which makes their work, while it appears to be great, exceptionally difficult to reproduce. I am not alone in sharing these views, as recent viewpoints (1) have highlighted this issue. However, as always, there are those who hold an opposing view (2).

/end rant

I did not intend for this post to be primarily a rant, and perhaps I will discuss this issue further in a dedicated post – my sincerest apologies. In any case, watch this space if you are interested in what I have to say. I have enabled comments, unless things get out of control, as a means for you to share your thoughts with me and others out there.

Thanks for listening.

The post LNT 101 appeared first on Hythem Sidky.
