Content of review 1, reviewed on January 28, 2015

I thank the authors for making the necessary corrections and addressing the major and minor compulsory revisions noted previously.

There are a few additional comments on this draft.

Major Compulsory Revisions:

  1. Section 3.2 Concordance of Hadoop and HPC results: Please provide the numbers from the triplicate runs in the Supplementary Information.

  2. Section 3.2 Concordance of Hadoop and HPC results: If you are using identical tools and steps on a different infrastructure, it is not unreasonable to expect identical results. Please explain WHY you see the differences.

  3. Table 2 is very confusing. Either use two separate tables for HPC and Hadoop or, if you keep them in the same table, place HI, H11 and HPC in adjacent columns for all of S1-9. It is very inconvenient to compare and contrast numbers in the current table.

  4. Table 3 may not be comparing apples to apples. I'd imagine that you will want to list the cores and compare the metrics across the nodes. I understand that communication overhead comes into play, but then you should perhaps have another table comparing the number of nodes. As currently shown, it is not clear what is being accomplished.

  5. A workflow diagram for the exact HPC and Hadoop steps would be very helpful.

Minor Essential Revisions:

Section 2.2 Data Preparation: Please state clearly that S1-9 were generated from Dataset II.

Section 2.4 Change: In attempt to make a ... To: In an attempt to make a fair comparison between the two vastly different platforms, ....

Section 3.3 HPC Results Change: Our Hadoop cluster ... To: Our investigation on scaling was based on our existing limited capacity, but we expect the behavior not to deviate significantly from the scaling shown in figure 4[ref][ref].

Discretionary Revisions:

  1. Section 4 Discussions: Perhaps you could also mention that this analysis was performed on a system under a load of one sample. Since there is considerable discussion of communication, it would be interesting to see how the numbers shift when the entire system is simultaneously processing a larger number of samples, say with 100 samples queued up. Multi-sample processing is often the more realistic scenario, and it may also bring out the strengths of Hadoop.

Level of interest: An article of importance in its field

Quality of written English: Needs some language corrections before being published

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests.

Authors' response to reviewers: (http://www.gigasciencejournal.com/imedia/2898289071650429_comment.pdf)

Source

    © 2015 the Reviewer (CC BY 4.0 - source).

Content of review 2, reviewed on March 11, 2015

I appreciate the efforts the authors have invested in addressing my review comments. I am satisfied with the changes incorporated and rationale provided. I have nothing further to add. Thanks.

Level of interest: An article of importance in its field

Quality of written English: Acceptable

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests.

Source

    © 2015 the Reviewer (CC BY 4.0 - source).

References

    Siretskiy, A., Sundqvist, T., Voznesenskiy, M., Spjuth, O. 2015. A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data. GigaScience.