Another scientist should be able to reproduce your entire research pipeline, from data collection to final figures, without having to email you with questions. It sounds intimidating, but it doesn't have to be, and in practice it's usually not that much extra work. You already know all the information they would need; it's largely just a matter of being mindful of how you do things and keeping a record. More selfishly, working in a reproducible way will make your own life easier, especially when you have to come back to a project months or years later.
Please don’t tell me, then, that it’s OK because everyone who needs access to the literature has it. I can’t get everything I wanted and, spending weeks in a hospital, I could hardly “go to the library”. I’ve been lucky to find a few pieces spread here and there.
In computer science, math, and physics, reading preprints is already required to stay abreast of the literature. Biology will follow this trend. As brain scientists working with models from computer science, we read preprints and, if we judge them to be of high quality and relevance, we cite them.
We have created a dataset of more than ten thousand 3D scans of real objects. To create the dataset, we recruited 70 operators, equipped them with consumer-grade mobile 3D scanning setups, and paid them to scan objects in their environments. The operators scanned objects of their choosing, outside the laboratory and without direct supervision by computer vision professionals. The result is a large and diverse collection of object scans: from shoes, mugs, and toys to grand pianos, construction vehicles, and large outdoor sculptures. We worked with an attorney to ensure that data acquisition did not violate privacy constraints. The acquired data was irrevocably placed in the public domain and is available freely.
The R Markdown template in this package is based on the Center for Open Science Preregistration Challenge and is thus particularly suited to drafting preregistration documents for studies that enter the challenge.
Good reminder (and a good analogy) from Dorothy Bishop on reporting all variables we test:
Quite simply, p-values are only interpretable if you have the full context: if you pull out the 'significant' variables and pretend you did not test the others, you will be fooling yourself - and other people - by mistaking chance fluctuations for genuine effects.
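The point above can be made concrete with a small simulation. This is a hedged sketch, not anything from Bishop's post: it assumes a study measuring 20 independent variables that are all pure noise, and shows how often at least one of them comes out 'significant' at p < .05 by chance alone (theory predicts roughly 1 − 0.95^20 ≈ 64%).

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(42)
n_vars, n_obs, n_sims = 20, 30, 2000

false_positive_studies = 0
for _ in range(n_sims):
    # Every variable is pure noise: the true effect is zero everywhere.
    data = rng.standard_normal((n_vars, n_obs))
    # One-sample t statistic for each variable against a mean of zero.
    t = data.mean(axis=1) / (data.std(axis=1, ddof=1) / sqrt(n_obs))
    # Two-sided p-values via a normal approximation to the t distribution.
    p = np.array([2 * (1 - 0.5 * (1 + erf(abs(x) / sqrt(2)))) for x in t])
    if (p < 0.05).any():
        false_positive_studies += 1

print(f"{false_positive_studies / n_sims:.1%} of null 'studies' report "
      f"at least one 'significant' variable")
```

Reporting only the lucky variable from such a study, while hiding the other 19 tests, is exactly the self-deception the quote warns about.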
We owe the effectiveness of the scientific enterprise in large part to the social contract under which scientists publish their findings in such a way that they may be confirmed or refuted and receive credit for their work in return. Because of the limitations of the printed page, data have been largely left out of this arrangement. We have grown accustomed to reading papers in which tables, figures, and statistics summarize the underlying data, but the data themselves are unavailable. There are exceptions, such as DNA sequences, for which there exist specialized public repositories that authors are required to use. But the vast majority of data types do not have such repositories.
With you as cosignatory I would like to write to the Cognition Editorial Board, asking them to request the 5 components of Fair Open Access publishing listed below. No demands or ultimatums will be issued at this point - only a request. If you agree to this I'll send you exactly one email (no more) that contains a final draft letter and will ask you to give a thumbs up or thumbs down to your participation. In the meantime, I'm encouraging everyone to continue supporting and contributing to Cognition: It's the creation of our collective research dollars, time, and efforts.
Publishing your code forces you to "clean it up", which can help identify errors.
As you can guess from the post title, in the process of cleaning up the code and files for uploading to the OSF, I found a coding bug. (There can't be many feelings more horrible for a scientist than finding a bug in the code for a paper that's already been accepted!) The bug was that when calculating the accuracy across the cross-validation folds, one of the fold's accuracies was omitted. Thankfully, this was a 15-fold cross-validation, so fixing the code so that the mean is calculated over all 15 folds instead of just 14 made only a minuscule difference in the final results, nothing that changed any interpretations.
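The kind of bug described above is easy to reproduce in miniature. The sketch below is hypothetical (the post does not show the actual code or data): it generates fake per-fold accuracies for a 15-fold cross-validation, then shows how a loop that silently stops one fold short changes the reported mean.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-fold accuracies from a 15-fold cross-validation.
fold_accuracies = rng.uniform(0.55, 0.75, size=15)

# Buggy version: the range stops at 14, silently dropping the last fold.
buggy_mean = np.mean([fold_accuracies[i] for i in range(14)])

# Fixed version: average over all 15 folds.
fixed_mean = fold_accuracies.mean()

print(f"mean over 14 folds: {buggy_mean:.4f}")
print(f"mean over 15 folds: {fixed_mean:.4f}")
```

With 15 folds the two means differ only slightly, which matches the post's experience; with fewer folds, or an unlucky omitted fold, the same bug could have changed the conclusions.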
Short but informative. Preprints seem like a good idea, generally, for scientific discussion.
What are the benefits to you personally of publishing your work as a PeerJ PrePrint prior to any formal peer review process?
It is very useful to get feedback from experts in the field before finalising a manuscript; ultimately, it should save time, because the paper is more likely to have a smooth passage through a formal review process if you have anticipated and responded to major criticisms, and also been pointed to other relevant literature. Having said that, I don’t yet know if our paper will be accepted for publication! However, even if it is not, it has been useful to have the debate about the p-curve method out in the open, and our pre-print allowed us to put our views in the public domain in a permanent, citeable format.
After that everything was simple. I logged in to my zenodo.org account and uploaded the author's copy of the manuscript. As a result, anyone searching for the article on Google Scholar will find the publisher's version requesting 34.95 EUR for pdf access, and right next to it a link to exactly the same article freely available via Zenodo. That's it! Nice and clean!
2. Everything open by default. There is a huge psychological effect to doing all your work knowing that everyone will see all your data, your experimental stimuli, and your analyses. When you're tempted to write sloppy, uncommented code, you think twice. Unprincipled exclusions look even more unprincipled when you have to justify each line of your analysis. And there are incredible benefits of releasing raw stimuli and data – reanalysis, reuse, and error checking. It can make you feel very exposed to have all your experimental data subject to reanalysis by reviewers or random trolls on the internet. But if there is an obvious, justifiable reanalysis that A) you didn't catch and B) provides evidence against your interpretation, you should be very grateful if someone finds it (and even more so if it's before publication).