Reproducible analysis in the MyConnectome project

We have released code and data with papers in the past, but this is the first paper I have ever published that attempts to include a fully reproducible snapshot of the statistical analyses. I learned a number of lessons in the process of doing this:

  1. The development of a reproducible workflow saved me from publishing a paper with demonstrably irreproducible results, due to the OS-specific software bug mentioned above. This in itself makes the entire process worthwhile from my standpoint.

  2. Converting a standard workflow to a fully reproducible workflow is difficult. It took many hours of work beyond the standard analyses in order to develop a working VM with all of the analyses automatically run; that doesn’t even count the time that went into developing the browser. Had I started the work within a virtual machine from the beginning, it would have been much easier, but still would require extra work beyond that needed for the basic analyses.

  3. Ensuring longevity of a working pipeline is even harder. The week before the paper was set to published I tried a fresh install of the VM to make sure it was still working. It wasn’t. The problem was simple (miniconda had changed the name of its installation directory), and highlighted a significant flaw in our strategy, which was that we had not specified software versions in our VM provisioning. I hope that we can add that in the future, but for now, we have to keep our eyes out for the disruptive effects of software updates.