Reproducible analysis in the MyConnectome project

We have released code and data with papers in the past, but this is the first paper I have ever published that attempts to include a fully reproducible snapshot of the statistical analyses. I learned a number of lessons in the process of doing this:

  1. The development of a reproducible workflow saved me from publishing a paper with demonstrably irreproducible results, due to the OS-specific software bug mentioned above. This in itself makes the entire process worthwhile from my standpoint.

  2. Converting a standard workflow to a fully reproducible workflow is difficult. It took many hours of work beyond the standard analyses to develop a working VM in which all of the analyses run automatically; that doesn’t even count the time that went into developing the browser. Had I started the work within a virtual machine from the beginning, it would have been much easier, but it still would have required extra work beyond that needed for the basic analyses.

  3. Ensuring longevity of a working pipeline is even harder. The week before the paper was set to be published I tried a fresh install of the VM to make sure it was still working. It wasn’t. The problem was simple (miniconda had changed the name of its installation directory), but it highlighted a significant flaw in our strategy: we had not specified software versions in our VM provisioning (a sketch of the kind of check I have in mind follows below). I hope that we can add that in the future, but for now, we have to keep our eyes out for the disruptive effects of software updates.
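
For example, here is a minimal sketch of the kind of version pinning I mean. This is not the actual MyConnectome provisioning code, and the package names and versions are placeholders; the idea is simply to declare exact versions up front and have the pipeline fail fast when a freshly provisioned environment drifts from them.

```python
# Minimal sketch of pinning analysis dependencies and failing fast on drift.
# The packages and versions below are hypothetical placeholders.
import sys
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "numpy": "1.9.2",
    "scipy": "0.15.1",
    "pandas": "0.16.0",
}

def check_environment(pins=PINNED):
    """Compare installed package versions against the pinned versions."""
    problems = []
    for pkg, wanted in pins.items():
        try:
            found = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg}: not installed (wanted {wanted})")
            continue
        if found != wanted:
            problems.append(f"{pkg}: found {found}, wanted {wanted}")
    if problems:
        # Abort before any analysis runs on a mismatched environment.
        sys.exit("Environment does not match pinned versions:\n  " + "\n  ".join(problems))

if __name__ == "__main__":
    check_environment()
    print("Environment matches pinned versions.")
```

Running a check like this at the top of the provisioning script would have caught the miniconda change immediately, rather than a week before publication.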

Great post from Michael Frank on improving reproducibility in science

Hard to argue with any of these suggestions.

2. Everything open by default. There is a huge psychological effect to doing all your work knowing that everyone will see all your data, your experimental stimuli, and your analyses. When you're tempted to write sloppy, uncommented code, you think twice. Unprincipled exclusions look even more unprincipled when you have to justify each line of your analysis. And there are incredible benefits of releasing raw stimuli and data – reanalysis, reuse, and error checking. It can make you feel very exposed to have all your experimental data subject to reanalysis by reviewers or random trolls on the internet. But if there is an obvious, justifiable reanalysis that A) you didn't catch and B) provides evidence against your interpretation, you should be very grateful if someone finds it (and even more so if it's before publication).

The Bayesian Reproducibility Project

Alexander Etz on why we need a better metric for "success" in reproducibility.

Based on these two metrics, the headlines are accurate: Over half of the replications “failed”. But these two reproducibility metrics are either invalid (comparing significance levels across experiments) or very vague (confidence interval agreement). They also only offer binary answers: A replication either “succeeds” or “fails”, and this binary thinking leads to absurd conclusions in some cases like those mentioned above. Is replicability really so black and white? I will explain below how I think we should measure replicability in a Bayesian way, with a continuous measure that can find reasonable answers with replication effects near zero with wide CIs, effects near the original with tight CIs, effects near zero with tight CIs, replication effects that go in the opposite direction, and anything in between.
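
For concreteness, here is a minimal sketch of one such continuous measure, the replication Bayes factor of Verhagen & Wagenmakers (2014), which asks how much better the replication data are predicted by the original study's estimated effect than by the null. I'm not claiming this is exactly the measure Etz develops in his post, and the normal approximation to the original study's posterior is my own shortcut, but it illustrates how replication strength can be graded continuously rather than declared a binary "success" or "failure".

```python
# Sketch of a replication Bayes factor for a one-sample t-test,
# using a crude normal approximation to the original study's posterior.
import numpy as np
from scipy import stats

def replication_bf(t_orig, n_orig, t_rep, n_rep, grid=2000):
    """Approximate BF_r0 = p(t_rep | original effect) / p(t_rep | null).

    Values well above 1 favour the original effect; values well below 1
    favour the null; values near 1 are ambiguous.
    """
    d_orig = t_orig / np.sqrt(n_orig)        # observed standardized effect size
    post_sd = 1.0 / np.sqrt(n_orig)          # rough normal approximation to posterior SD
    deltas = np.linspace(d_orig - 6 * post_sd, d_orig + 6 * post_sd, grid)
    width = deltas[1] - deltas[0]

    # Approximate posterior over the effect size from the original study
    post = stats.norm.pdf(deltas, loc=d_orig, scale=post_sd)
    post /= post.sum() * width               # renormalize on the grid

    # Likelihood of the replication t-statistic under each candidate effect size
    df_rep = n_rep - 1
    like = stats.nct.pdf(t_rep, df_rep, nc=deltas * np.sqrt(n_rep))

    m_r = np.sum(like * post) * width        # marginal likelihood under the original effect
    m_0 = stats.t.pdf(t_rep, df_rep)         # likelihood under the null (delta = 0)
    return m_r / m_0

# A strong original effect followed by a near-zero replication:
# the result is far below 1, quantifying *how strongly* the replication
# data favour the null rather than just calling it a "failure".
print(replication_bf(t_orig=3.2, n_orig=30, t_rep=0.4, n_rep=60))
```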