Content of review 1, reviewed on January , 2015

This is a practical short article that notes the importance of reproducibility in neuroimaging and the current limitations of the field, before giving some concrete recommendations for how to improve the situation. As someone who is trying to become an 'open data' scientist, I found it a very useful resource, even though my own data are not from the field of fMRI.
 
I was asked to comment on how this piece should be classified, whether as a commentary or as a review. I don't think it readily fits either category, but it is probably closer to a review than a commentary, since it goes beyond critique to give an overview of some methods. The headings are not those in the author guidelines, but they do seem appropriate.

My comments are mainly minor suggestions to improve the English (as given below), plus a handful of suggestions that would come under 'minor' or 'discretionary revisions' that the authors can be trusted to make. There are no 'major compulsory' revisions. 

1. Paragraph 1 of the Introduction section could possibly be cut. 
2. It might be worth also stating that there are benefits to the researchers themselves in making data available, because it creates a well-documented, permanent record of an experiment. This point struck me forcibly when I started exploring possibilities for archiving my own data – I had a lot of previous data that was now inaccessible because it was stored in an outdated format (floppy disks!) or because I had forgotten what the variables were. This point is alluded to on p 4, line 19, but I think it could be emphasised in the introduction.
3. P 2 para 2. "Reproduction is the only way." I'm not sure that is true. I see reproduction as an important way, but not the only one (and bear in mind that there are sciences, such as astronomy, where replication is seldom possible). Indeed, the authors consider preregistration as a complementary approach at the end of the paper.
4. The structure of the article is a bit odd. The five easy steps make for an easy-to-follow logic, but the "Tips for more reliable and replicable code" section then seems like a bit of an afterthought. I wonder whether it would work better if the latter section were integrated with the prior material, either by adding a further category or by incorporating the material into the five steps. I suggest numbering the five steps as well.
The wording of the 'five steps' oscillates between description and command; it would be better to stick to one style throughout.
5. The order of the five steps would work better for me if it went: 1) sharing protocols; 2) sharing analysis scripts; 3) organising/sharing data; 4) making derived data available; and 5) publishing.
This order would also group together the two steps that involve sharing code – it may then be possible to incorporate the tips for reliable/replicable code here.
6. Under sharing protocols, you might want to include sharing of stimuli and advise where this could be done.
7. I am not sure whether it is also worth touching on a major reason people put forward for not sharing, which relates to point ii): often there is a concern that others will use one's hard-won data to do other analyses and grab easy glory. That gets into quite complicated areas about whether and how to restrict secondary usage of data.
A related issue arises in p 3, para 3, which describes the need to provide 'subject information'. Another major reason people do not deposit data is confidentiality, with researchers often citing restrictions imposed by Ethics Committees (IRBs). It may be worth stating that this issue should be anticipated when seeking ethics approval, so that permission is given for anonymised data to be shared.
8. Reflecting on this, I wonder whether you might add an initial, more generic point to the '5 easy steps', namely 'plan ahead'. The point about things being much easier if they are formatted appropriately for an archive is key, and it should influence earlier stages, including how programs are written to generate raw data and the results of analyses (a minimal sketch of what I mean follows this list).
9. While I think it is worth mentioning preregistration, I did not find this account (on p 7) very clear and wondered whether it should be omitted – perhaps just flag up in the introduction that preregistration is a complementary approach to improving reproducibility, and provide a reference. Preregistration is quite a highly charged topic in this field, with the debate between Kanai and Wagenmakers on Neuroskeptic's blog attracting a great deal of interest: http://blogs.discovermagazine.com/neuroskeptic/2014/11/19/reality-check-neuroscience/#.VGzzUWLuBZQ.twitter. Kanai pushes the argument that pre-registration can be a bad thing if it reduces flexibility and hence leads to non-replication. That post raises weighty issues, but my advice would be to leave this for another paper, as it merits a more detailed evaluation.
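
To illustrate the 'plan ahead' suggestion in point 8: here is a minimal sketch, in Python, of what I have in mind – the script that produces derived results also writes them in a plain-text, archive-ready form, together with a small data dictionary describing each variable. The file names and variables below are hypothetical, purely for illustration, and are not taken from the manuscript.

import csv
import json
from datetime import date

# Derived results produced by some (hypothetical) analysis step.
results = [
    {"subject_id": "sub-01", "mean_rt_ms": 512.3, "accuracy": 0.94},
    {"subject_id": "sub-02", "mean_rt_ms": 487.9, "accuracy": 0.91},
]

# Data dictionary: one entry per variable, so that a future reader (or the
# original researcher, years later) knows what each column means.
data_dictionary = {
    "subject_id": {"description": "anonymised participant identifier"},
    "mean_rt_ms": {"description": "mean reaction time", "units": "milliseconds"},
    "accuracy": {"description": "proportion of correct responses"},
}

# Use plain, open formats (CSV + JSON) rather than a proprietary binary
# format that may be unreadable in ten years' time.
with open("derived_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(results[0]))
    writer.writeheader()
    writer.writerows(results)

with open("derived_results_dictionary.json", "w") as f:
    json.dump({"generated_on": str(date.today()),
               "variables": data_dictionary}, f, indent=2)

This is only an illustration of the principle, not a prescription: outputs written in documented, open formats from the start are far easier to deposit in an archive later than ad hoc files whose variable names have long since been forgotten.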

Line-by-line comments (line numbers are from the Word version with automatic line numbering)
The initial quote from Galton is very nice! 
P 1, line 21 every -> all 
P 2, line 11, delete 'of the' 
P 2, line 15 "Replications are not boring" – seems too subjective a statement 
P 2 line 16; mixed case for pronoun. Suggest one -> we 
P 2, para 2, last sentence; I think could be omitted 
P 2, line 23 majority -> part 
P 2 line 26, use -> used 
P 2, line 28, alone -> be sufficient to 
P 2, line 31, better -> improve
P 2, line 35, well -> readily 
P 2, line 37, extend -> extent 
P 3, line 7 "such details can determine whether an effect is observed or not." 
P 3, line 8, re-use -> reproduce 
P 3, line 11-12; when describing Repositories the focus is on sharing data, whereas the protocols section is more to do with sharing procedures. 
P 3, line 15, 'their comparison' -> 'a comparison' 
P 3, line 16-17 – not clear what is meant here by 'beyond the significant results'. Do you mean you'd want to check that means, etc., are in the same range? That does seem to be a kind of check that is important but which people do surprisingly rarely.
P 3, line 18; 'aggregate them to' -> 'aggregate them with' (sounds more natural to my English ear). 
P 3, line 18. "Most funders now request that data are made available, and researchers must be prepared to do this and to identify where the data will be archived." 
P 3, line 26, 'by request' -> 'on request'. 
P 3, I'm curious as to why additional datasets have not been added to the fMRI Data Center.
P 3, line 29, delete 'most' 
P 3, line 31, I think sentence starting 'In addition' is redundant and could be deleted 
P 4, line 22. Reword to "Sandve et al (2013) have a few simple recommendations." 
P 4, line 23, "keep track of every step" 
P 4, line 24-25 "Most neuroimaging software has so-called batch mode…. or is made from scripts."
P 4, line 30; "a random number generator" 
P 4, line 38-39. "reuse scripts on new data". Full stop after 'protocol', delete 'or else – and'. New sentence starting 'Similar'.
P 4, line 40 analyses -> analysis 
P 5, line 7, "would allow judging if" -> "makes it possible to judge whether" 
P 5, line 8 not clear what is meant by "and the actual field of view" 
P 5, line 10, 'their limitation' -> limitations 
P 5, line 12, sentence starting "Yet statistical results…" is hard to understand. 
P 5, line 17, "allows judging of" -> "allows one to judge"
P 5, line 18: this sentence is rather unclear; is one judging the likelihood of (the significance and sparsity), or (the likelihood of the significance) and (the likelihood of the sparsity)?
P 5, line 29, "such as" -> "so that" 
P 5, line 32, "allows one to judge the robustness" 
P 5, para 3. This paragraph changes into the imperative, which sounds odd, as previous paragraphs were descriptive 
P 6, line 2, "each analysis step" 
P 6, line 6, "Barriers to code-sharing" 
P 6, line 7, "Researchers are often concerned that their code is too poor" 
P 6, line 11, "make a workflow" (?) 
P 6, line 13, should Python be capitalised? 
P 6, line 16 'better'-> 'improve' 
P 6, line 17, comma after 'scripts' 
P 6, line 25. 'incentives' 
P 6, line 31, delete 'today' and delete comma 
P 6, line 33, 'need however to train others' 
P 6, line 33, "a much more systematic manner than is currently done" 
P 6, line 35, 'are lacking' -> 'lack' 
P 6, line 36, date only in brackets for Piwowar et al 
P 7, line 3 , "associated with" 
P 7, line 3, "is likely to have as many benefits as sharing data or publications" 
P 7, line 12, "its chance to be replicated" – unclear. "The likelihood that it will be replicated" (?) 
P 7, line 15, "Contrary to what has been claimed" – unclear who has claimed this. 
P 7, line 19, "Such practice doesn't have to be always…" again, rather unclear. 



Level of interest: An article of importance in its field
Quality of written English: Needs some language corrections before being published
Declaration of competing interests: I declare that I have no competing interests

The reviewed version of the manuscript can be seen here:
http://www.gigasciencejournal.com/imedia/1692915561155108_manuscript.pdf
All revised versions are also available:
Draft - http://www.gigasciencejournal.com/imedia/1692915561155108_manuscript.pdf
First revision - http://www.gigasciencejournal.com/imedia/1248589054163796_manuscript.pdf

Source

    © 2015 the Reviewer (CC BY 4.0 - source).

Content of review 2, reviewed on January 03, 2015

Author response:

http://www.gigasciencejournal.com/imedia/4514699461599493_comment.pdf

Source

    © 2015 the Reviewer (CC BY 4.0 - source).

References

    Pernet, C. R., Poline, J.-B. 2015. Improving functional magnetic resonance imaging reproducibility. GigaScience.