Content of review 1, reviewed on November 03, 2014

Dr. Chang and colleagues present an application note in which they describe the key features of PLINK 1.9. PLINK 1.9 is a major improvement over the previous PLINK 1.0 and 1.07. I found the article well written and important for the community of genetic data analysts.

Minor essential revisions

1) The Authors should better define the stage of PLINK 2.0 implementation. As far as I understand, PLINK 2.0 is still a project, not yet a result. I do not find it appropriate to mention it in the "findings" section of the abstract; it should go into the conclusions, presented as further work. In the last section of the manuscript, entitled "availability and requirements", the reference is to PLINK 2.0 and not to PLINK 1.9. Given that the paper mainly refers to PLINK 1.9, why did the Authors point towards PLINK 2.0 only? I find this quite confusing.

2) The example given in the "Bitwise parallelism" section is quite unclear. What "increment IBS0, IBS1, and IBS2" means is not explained. Also, IBS0, IBS1, and IBS2 are not defined, and neither is a missing call. The ranges over which i, j, and k vary should be defined, as well as the possible values of K. Finally, what a 960-marker block is will be unclear to people not familiar with earlier versions of PLINK.

3) In the "bit population count" section, page 3, does the correlation example refer to a single SNP pair, or is it intended more broadly as a way to assess the LD structure in the region? My question arises from the sentence "these values can be precomputed since they do not vary between marker pairs", reported in parentheses. Why is there a need to precompute r values? The Authors should better describe the application framework.

4) In the related section, page 6, please put HWE and Fisher's exact test into context. While the HWE test recalls a specific application to assess genotype selection, Fisher's exact test is a much broader concept that can be applied to any context of categorical data analysis. To which context are the Authors referring here? Minor: note that it is not the "Hardy-Weinberg test", but the "Hardy-Weinberg Equilibrium test".

5) The section "performance comparisons" is very confusing. The Authors should first introduce their approach of comparing the performance of different versions of the software (or of different software) across a set of machines and across different datasets. List the machines separately from the datasets. This should be presented as a kind of "methods". Then, results are correctly listed with appropriate paragraphs.

6) Tables 1-6. Time units must be reported in every table. In addition, the tables would benefit from a bit more explanation in the title of what is being tested; please consider a larger readership than those already using PLINK.

7) The manuscript contains a large amount of jargon specific to the statistical genetics community, which I am not sure is appropriate for a large readership. For example, in the abstract and in the text, what "probabilistic calls" refers to might be unclear.

8) First lines of "other noteworthy algorithms": please spend half a line giving a bit more background, e.g., a weighted distance matrix "between two individuals", or something similar.

9) Many abbreviations and terms are never introduced. Examples: IBD, LD, w.r.t., cdf, haploblock.

10) The Authors should avoid overemphasizing some of the contents. I suggest avoiding terms such as "the most notable", "exceptionally well", and "embarrassingly parallel". Stick to the scientific evidence.

Discretionary revisions

1) Many comments are reported in parentheses throughout the manuscript. I would suggest keeping the comments but removing the parentheses, to make the reading more fluid.

Level of interest

An article whose findings are important to those with closely related research interests

Quality of written English

Acceptable

Statistical review

No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests

I declare that I have no competing interests.

Source

    © 2014 the Reviewer (CC BY 4.0).

References

    Chang, C. C., Chow, C. C., Tellier, L. C. A. M., Vattikuti, S., Purcell, S. M., Lee, J. J. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience.