Content of review 1, reviewed on December 10, 2012

  1. Resubmission as a commentary.

Constructing infrastructure to handle NGS data is difficult, and there is a lack of publicly available information on best practice and the state of the art. I believe the information presented in the article would be useful for other organisations that are building IT systems to deal with NGS data.

However, the authors focus on a single IT infrastructure, UPPNEX. I believe that a single infrastructure is too narrow a focus for a review article.

For example, several different parallel storage infrastructures, queuing systems and data-management technologies are in use in NGS and HPC. A review should cover more than one implementation.

If the authors wish to focus on a single infrastructure, I believe that it would be suitable to resubmit the manuscript as a commentary rather than a review.

  1. scp data flow.

The first step in the UPPNEX data flow is described as a data transfer via scp. scp can have performance issues on wide-area or high-bandwidth networks (e.g. http://www.psc.edu/index.php/hpn-ssh).

I would like to see a discussion of the performance achieved with scp, and of whether this is seen as problematic.
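To illustrate the concern, the following is a minimal sketch of the usual back-of-envelope reasoning, not a measurement of UPPNEX. It assumes that a transfer tool keeps at most a fixed window of data in flight per round trip, so that throughput is capped at window / RTT regardless of link speed. The function names, the 64 KiB window and the 20 ms round-trip time are hypothetical values chosen for illustration only.

```python
def window_limited_throughput_mbit(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput (Mbit/s) when at most `window_bytes`
    can be in flight during one round trip of `rtt_ms` milliseconds."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000.0 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1e6


def bandwidth_delay_product_bytes(link_mbit: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a link of `link_mbit` Mbit/s
    fully utilised at a round-trip time of `rtt_ms` milliseconds."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bytes_per_second = link_mbit * 1e6 / 8
    return bytes_per_second * rtt_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical figures: a 64 KiB window over a 20 ms wide-area path.
    window = 64 * 1024
    rtt = 20.0
    print(f"ceiling: {window_limited_throughput_mbit(window, rtt):.1f} Mbit/s")
    print(f"window needed for 1 Gbit/s: "
          f"{bandwidth_delay_product_bytes(1000, rtt) / 1e6:.1f} MB")
```

Under these assumed figures the ceiling is far below the capacity of a gigabit link, which is why reporting the transfer rates actually observed would be informative.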

  1. Queuing system.

The authors point out that in their original queuing system implementation, users were forced to reserve a whole node when requiring more than 3 GB of memory. They say that this problem was eliminated, but fail to provide details (e.g. was it a simple bug fix, or a change to a more sophisticated scheduling policy?).

  1. Storage bandwidth

The authors detail their storage capacity. However, no mention is made of storage bandwidth, whether it affects job performance, or its implications for future facility growth.

Discretionary Revisions

  1. iRODS metadata.

The article describes the negotiation that occurs between administrators and users to reach agreement on volumes of data backup and to vote on supported applications. Was similar negotiation required to reach consensus on what metadata should be stored in iRODS for each file?

Level of interest: An article of importance in its field

Quality of written English: Acceptable

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests.

Source

    © 2012 the Reviewer (CC-BY 4.0 - source).

Content of review 2, reviewed on March 30, 2013

I am satisfied that the authors have addressed the issues raised in my original report.

There are some minor typographical errors that need correcting.

Fig. 1 caption: the last sentence looks like an editing comment and should be removed.

Page 8, paragraph 7 ("Another advantage of sharing the cluster with other user groups at UPPMAX"): the "(data not shown)" comment is extraneous and can be removed.

Level of interest: An article of importance in its field

Quality of written English: Acceptable

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests.

Source

    © 2013 the Reviewer (CC-BY 4.0 - source).

References

    Lampa, S., Dahlö, M., Olason, P. I., Hagberg, J., Spjuth, O. 2013. Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data. GigaScience.