Content of review 1, reviewed on February 03, 2012

  1. Birney and colleagues here present an opinion piece that sets out both the problem of, and a possible solution for, storing ultra-high-throughput sequencing data in global repositories. They review the problem, identify concepts from other fields that may help resolve it, and provide a 'straw-man' proposal for implementation.

  2. This MS is both interesting and well written. If it is meant to stimulate debate, it certainly will: I found myself formulating paragraphs of counter-arguments and what-ifs to explore the ideas presented. These will (perhaps) be part of the debate the opinion piece engenders, and do not need to be addressed following review. The MS is largely ready to go as is, but I would request two changes:

  3. [minor, likely essential] One is a likely omission: the issue of meta-analysis, where data from multiple studies are reanalyzed to derive new (emergent) properties that are only accessible when large numbers of studies are combined. These meta-analyses are surely one of the reasons global databases exist: reusing old data in new ways that were not possible, or not even conceivable, when the data were generated. How is the re-doability cost of an experiment calculated when the envisaged experiment is not 'just' rerunning a single transcriptome experiment on a relatively easily repeated biological treatment (which in their schema would be saved in a lossy manner), but summing data across 100 such experiments? The authors know that such meta-analyses need bulk reanalysis with a single version of the processing software, so that all inferences derive from the same ground rules. How will the lossily stored data meet these needs?

  4. [discretionary] The other is perhaps a picky issue: the authors introduce a specific experiment (English Channel station ...) in the text, presumably to represent a class of experiments that cannot be repeated because of the arrow of time. Choosing this example assumes specific knowledge of this individual station and its importance, and thus breaks the 'general' or 'universal' flow of the MS. I'd suggest they rephrase this short section to make a more generally applicable point about time-stamped environmental samples.

Level of interest: An article of importance in its field

Quality of written English: Acceptable

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I know the authors through meetings and conferences. I do not collaborate with the authors. I have no competing interests.

Source

    © 2012 the Reviewer (CC-BY 4.0).

References

    Cochrane, G., Cook, C. E., Birney, E. 2012. The future of DNA sequence archiving. GigaScience.