The Lowdown

If you’re reading this, then welcome to my blog! This is where I share everything from my scattered thoughts to news on my latest publications and conference attendance. Periodically, I will be posting trip reports, including some of my photography, if I get a chance to sneak away and explore a city during work travel. I also plan on regularly writing tutorials in the areas of applied mathematics, molecular simulation, or whatever else I think myself to be marginally qualified to teach. Some will be pedagogical and probably aimed more at those just starting out in a particular subject. However, my intention is to focus significant effort on reproducing results from the published literature, or attempting proof-of-concept implementations of a proposed algorithm or technique.

/begin rant

I have serious issues with both the clarity and reproducibility of some published work in the field of molecular simulation. This isn’t news to anyone in the field, or any other academic specialty really, but it is particularly frustrating because one would think that molecular simulations, being conducted on computers, lend themselves to being highly and easily reproducible. However, this is sadly not the case. Molecular simulations have gotten complex enough, especially in the analysis of the raw simulation data, that without providing either an extremely thorough description of the analysis or simply providing the analysis scripts themselves as supplementary information, it may be near impossible to obtain the same results claimed in a published work. As a researcher who is trying to follow up on, validate, or build upon that work, it can be a head-banging experience.

What’s equally disheartening is how papers in computational chemistry which describe new algorithms or sampling methods rarely contain a description lucid enough to facilitate easy implementation and, more often than not, do not provide a reference implementation (in fact, I have yet to come across one that does). In applied mathematics it has become nearly impossible to publish an algorithm or be taken seriously without cold hard code. Sharing code is just too easy to justify skipping. The same applies to new or sufficiently complex (or tabulated) force field parameters used in a simulation. For the love of everything good in this world, why would you not provide a 2 KB zip file as supplementary material to make our lives easier?! I once even came across a paper in which someone provided a “representative” simulation input file that was a sanitized, non-functional input. Why not provide an actual input script?!

Lest someone accuse me of being lazy, and wanting to be spoon-fed other people’s hard work, I assure you this is not the case. I’ve seen too many people (primarily graduate students) go through the same trouble, spending months or sometimes a year trying to reproduce a result. It can really be an unnecessary barrier to scientific progress. I’m not usually a cynic, but I have two reasons in mind as to why the situation is so bad despite it being 2017, with things like IPython and GitHub which make it stupid easy for people to maintain a reproducible and easily publishable workflow. The first is that the graduate student who finally figured it out after all the sweat and blood is not going to make it easier for the next one. I kind of see that, I guess. The second reason, and in my opinion the primary one, is that researchers are purposefully opaque in their wording and in the material they release in order to maintain a competitive advantage in the field. There’s no excuse for this one, especially if you are a publicly funded researcher. I know of certain groups that have developed reputations for not being forthcoming with details, which makes their work, while it appears to be great, exceptionally difficult to reproduce. I am not alone in holding these views, as recent viewpoints (1) have highlighted this issue. However, as always, there are those who hold an opposing view (2).

/end rant

I did not intend for this post to be primarily a rant (my sincerest apologies), and perhaps I will discuss this issue further in a dedicated post. In any case, watch this space if you are interested in what I have to say. I have enabled comments as a means for you to share your thoughts with me and others out there, and they will stay on unless things get out of control.

Thanks for listening.

1. Gezelter JD (2015) Open Source and Open Data Should Be Standard Practices. J Phys Chem Lett 6(7):1168–1169.
2. Krylov AI, et al. (2015) What Is the Price of Open-Source Software? J Phys Chem Lett 6(14):2751–2754.