Content of review 1, reviewed on April 11, 2015

The article describes a trial of BRCA testing in US women of Hispanic descent who have had breast cancer. The initiative responds to the undertesting of this population relative to Caucasians, and tries to overcome both the difficulty of reaching it and the usually high cost of the test. The authors claim that a newly developed sequencing enzyme allows Next Generation Sequencing (NGS) with fewer false positives than the previous one. They also compile some demographics, the presence of a family history of breast cancer, and the main pathological characteristics of the patients’ breast tumors.
Some supplementary files have been added in this second submission, but others have disappeared. The paper reads: “Supporting data Title: BAM files of standard and Hi-Q enzyme libraries Abstract: The file contains the .BAM and corresponding .BAI index file for each of the DNA samples sequenced using both standard enzyme and the Hi-Q version.”, but neither .bam, nor .bam.bai files can be found now. Also .abi files are promised but not supplied.

--Major Compulsory Revisions
In this submission coverage values are still not clearly described: in the first paragraph of Data Description, and again in the Results section DNA sequencing of BRCA1 and BRCA2, the authors say they obtain “an average of 313-416X coverage”. I guess they mean 316-416X, since 316 is the average amplicon coverage in the Std set if 8 poorly performing samples out of 92 are ignored. The exclusion is not mentioned until the next sentence, and even then the high number of excluded samples is not explained. Moreover, giving an interval of average coverage before explaining any grouping misleads the reader into thinking that the worst performing sample had an average amplicon coverage of 313x and the best 416x, whereas those values are in fact the averages of averages within 2 of the 3 analyzed sets. A clearer explanation should be given.
In response to the reviewers’ comments, a paragraph has been added to the discussion giving some explanation of the poorly performing samples. It includes the sentences: “The initial run of 92 samples had 3 that underperformed and needed to be repeated (3%), however these samples performed well on a second run. Our second run of 46 samples allowed the average coverage to increase from 313-361X to 466X.” Again there is an important inconsistency: the average coverage was calculated (according to the suppl file amplicon_coverage Comm panel OldEnz build1 Cal) after discarding the 8 samples under 20,000 reads, but the authors mention only the 3 repeated samples (the ones yielding fewer than 30 total reads each). This gives a misleadingly favorable impression of the proportion of failed samples. A faithful explanation should be given, or the coverage calculations should include 89 out of 92 samples instead of the best 84.
Moreover, the Methods section still contains the sentence “Each run produced over 10 Gb of sequence data and each sample had an average depth of coverage surpassing 500X.” Please replace it with the actual coverage data.
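To illustrate why the exclusion criterion matters for the reported figure, here is a minimal sketch that recomputes an average of per-sample coverage under the two read-count cutoffs discussed above (30 reads and 20,000 reads). The per-sample values and the function name `mean_coverage` are invented for illustration; they are not data from the manuscript.

```python
from statistics import mean

def mean_coverage(per_sample_coverage, per_sample_reads, min_reads=0):
    """Average the per-sample mean coverage over samples with at least
    `min_reads` total reads. Returns (average, number of samples kept)."""
    kept = [
        cov
        for cov, reads in zip(per_sample_coverage, per_sample_reads)
        if reads >= min_reads
    ]
    return mean(kept), len(kept)

# Hypothetical per-sample values (NOT taken from the manuscript).
coverage = [400.0, 350.0, 300.0, 50.0, 0.1]
reads = [60000, 52000, 45000, 8000, 20]

# No exclusion at all.
avg_all, n_all = mean_coverage(coverage, reads)
# Excluding only the failed samples (fewer than 30 reads).
avg_no_failed, n_no_failed = mean_coverage(coverage, reads, min_reads=30)
# Excluding every sample under 20,000 reads.
avg_strict, n_strict = mean_coverage(coverage, reads, min_reads=20000)

print(n_all, round(avg_all, 2))
print(n_no_failed, round(avg_no_failed, 2))
print(n_strict, round(avg_strict, 2))
```

In this toy example the stricter cutoff raises the average considerably, which is why the number of excluded samples and the cutoff used should be reported alongside any average.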
-Background, last paragraph: The authors state “A clinical diagnostic lab validation of the Ion Torrent platform demonstrated an absence of false negatives and a 10% false positive rate [33].” I cannot deny that they demonstrated the absence of false negatives in their series, but I must say that they (in paper [33]) did not test for any 1-bp deletion or insertion in any homopolymer longer than 3, so they did not really test any difficult real-world mutation. The authors then say “…we sought to validate this approach on a cohort of Hispanic/Latino breast cancer patients”. As the present work does not use samples with known mutations, they cannot claim that it validates the approach.
In this second submission the NGS data analysis methods are still scarcely explained, and much of the data said to be provided is missing:
-Data description:
In the second paragraph the use of a modified GATK variant caller is mentioned, but neither the code nor a URL for accessing the program is provided.
This second paragraph also reads “Parameter files for TSVC are given, as well the raw and annotated variant files.” The methods section again reads “…All other parameters such as minimum coverage, minimum alternative allele frequency, and strand bias were the same between the two settings (parameter files attached).” But the only parameter file I’ve found in the suppl. files is tmap_mapall_stage1_map4.txt and I would say those are the parameters for the mapping software, not the variant caller.
It also says “Variants were manually examined in the Integrated Genome Viewer (IGV) and screen shots are provided”. There are no screenshots in the suppl. files; only Figures 2 and 3 display 4 selected screenshots, showing 3 variants and a homopolymeric region.
In Data description it is stated that 2 VCs have been used, but in the section DNA sequencing of BRCA1 and BRCA2 it is written “Variants were predicted using the Torrent Variant Caller”. Where has the GATK VC been used? Is the VC list a composite of the lists produced by both VCs? If true, how have they been combined?
-Analyses:
The authors state in one of the answers to reviewer 1 “A revised Table 4 gives the RARE indels and SNVs from both protocols and the sequence context of all homopolymers” and the title of Table 4 in the manuscript is “Table 4. Hi-Q vs. Standard Enzyme Variant Call Comparison”. “RARE indels” could suggest that many other artifacts were called by one of the VCs but were discarded, for instance for being present in all samples. If this is so, it should be explained, because real variants could be lost when they fall at a position that also produces FPs. It should be made clear whether the table lists all indels called or just a selection. If it is only a partial list, the rationale for the selection should be stated in the manuscript.
The last paragraph reads “The Hi-Q enzyme eliminated nearly all of the 1bp deletion alleles (Table 4)”. This is a strange assertion, as no deletion artifact from the standard enzyme is shown in Table 4.
-Discussion:
The last but one paragraph reads “We documented that the Hi-Q enzyme can achieve a significantly higher accuracy in sequencing through mono-nucleotide repeats. When combined with methods or specific assays for the most prevalent large deletions, a high percentage of germline mutations can be identified. A recent study with the same BRCA1/2 panel was tested in a diagnostics lab with high accuracy [41].”
I do not think that they can claim to have documented higher or lower accuracy in the enzyme comparison, as they have worked with samples not screened by any alternative method. They can get some idea of the specificity of variant calling with each enzyme when they Sanger sequence the positives, but they have no idea of the number of false negatives. Since the calculation of accuracy includes the number of FNs, such an assertion cannot be made.
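The argument can be made concrete with a small sketch. Confirming only the called positives by Sanger sequencing yields true-positive and false-positive counts, from which the fraction of confirmed calls can be computed; accuracy also needs true negatives and false negatives, which are unknown without an independent screen. The counts and function names below are hypothetical, chosen only for illustration.

```python
from typing import Optional

def confirmed_fraction(tp: int, fp: int) -> float:
    """Fraction of called variants confirmed by an orthogonal method
    (computable from Sanger confirmation of the positives alone)."""
    return tp / (tp + fp)

def accuracy(tp: int, fp: int, tn: Optional[int], fn: Optional[int]) -> Optional[float]:
    """Accuracy = (TP + TN) / (TP + FP + TN + FN).
    Returns None when TN or FN is unknown, because the value is then undefined."""
    if tn is None or fn is None:
        return None
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical counts: 18 calls confirmed by Sanger, 2 not confirmed.
print(confirmed_fraction(18, 2))          # computable from the positives
print(accuracy(18, 2, tn=None, fn=None))  # not computable: FN and TN unknown
print(accuracy(18, 2, tn=78, fn=2))       # computable only with a truth set
```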
The [41] citation (Trujillano et al) has, in my view, not much more value than [33] (Costa et al), as Trujillano’s article tests only one difficult mutation in an HP (although it is an insertion), an A8>A9, which they find; no other 1bp deletion or insertion is located in an HP larger than 4bp.
My point is that, to my knowledge, no article published so far demonstrates good accuracy of Ion Torrent in calling 1-bp insertions or deletions in HPs of 6 or longer, and mutations of this kind (most of them pathogenic) are unfortunately quite common in BRCA genes. This paper does not demonstrate it either. For this reason I would not recommend diagnostic BRCA testing with Ion Torrent (nor with Roche pyrosequencing, although it seems to perform a little better) unless the analysis of long HP regions is complemented by some other technique, such as amplicon length analysis.
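As an illustration of how the long HP regions that would need a complementary technique could be listed, the following sketch scans a sequence for homopolymer runs of 6 bases or longer. The function name `homopolymer_runs` and the example sequence are invented for this purpose; the sequence is not from BRCA1 or BRCA2.

```python
from itertools import groupby

def homopolymer_runs(seq, min_len=6):
    """Return (0-based start, base, length) for every run of a single
    repeated base that is at least `min_len` long."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs

# Invented example sequence containing an A8 and a T6 run.
example = "ACG" + "A" * 8 + "GC" + "T" * 6 + "C"
runs = homopolymer_runs(example, min_len=6)
print(runs)
```

Amplicons overlapping any reported run would be candidates for amplicon length analysis or another orthogonal check.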
Table 3 presents the variant classification for all the non-synonymous variants found. The authors mention in the footnote “All rare non-synonymous and coding region insertions and deletions were classified from a combined analysis of data from ClinVar, the Breast Cancer Information Core (BIC), and conservation.” Since variant classification is a complex issue, more details on the rules used for the classification should be given.

--Minor Essential Revisions
The fifth paragraph of the discussion includes “(Boland)”. Is this a citation of reference 36 in the bibliography?
Table 3 footnote: “aC1787S/G1788D occur in cis in the same subject;” is written twice.
Figures 2 and 3 are shown to compare the alignments from the 2 enzymes, but the image quality does not allow the sample names to be read, so without more explicit figure legends it is impossible to know which pane corresponds to which enzyme.

--Discretionary Revisions
Methods:
The authors answered reviewer 1: “All variants called by the variant calling pipeline that were less than 5% in 1000genomes were examined manually in IGV. Variants were discarded that had quality scores under 40 or that were obvious artefacts (variants only at the end of reads or on one strand or variants present to some extent in all samples).” I’d appreciate the inclusion of this or a similar explanation of the manual examination in the manuscript.
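The rule quoted above could be written down explicitly, which would also make it easier to document in the manuscript. The sketch below is one possible encoding under my reading of the authors' answer; the `Call` record, its field names, and the example calls are hypothetical, and the artifact flags stand for judgments made by eye in IGV rather than automated tests.

```python
from dataclasses import dataclass

@dataclass
class Call:
    name: str
    af_1000g: float               # allele frequency in 1000genomes (0.0 if absent)
    qual: float                   # variant quality score
    read_end_only: bool = False   # seen only at the ends of reads
    single_strand: bool = False   # seen on one strand only
    in_all_samples: bool = False  # present to some extent in all samples

def needs_manual_review(call: Call) -> bool:
    """Variants below 5% in 1000genomes are examined manually."""
    return call.af_1000g < 0.05

def is_discarded(call: Call) -> bool:
    """Discard calls with quality under 40 or flagged as obvious artifacts."""
    return (
        call.qual < 40
        or call.read_end_only
        or call.single_strand
        or call.in_all_samples
    )

# Hypothetical calls, for illustration only.
calls = [
    Call("common_snp", af_1000g=0.30, qual=90),
    Call("rare_good", af_1000g=0.001, qual=80),
    Call("rare_low_quality", af_1000g=0.001, qual=25),
    Call("rare_one_strand", af_1000g=0.0, qual=70, single_strand=True),
]

reviewed = [c.name for c in calls if needs_manual_review(c)]
kept = [c.name for c in calls if needs_manual_review(c) and not is_discarded(c)]
print(reviewed)
print(kept)
```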
It is noteworthy that Table 4 shows no artifact called in the data from both enzymes, and all the real variants listed were called with both. So, leaving aside the possibility that there could be some FNs for both enzymes, it seems that the most sensitive and specific approach could be to sequence with both enzymes and take the intersection of the two VC lists as the real one. This would be more expensive, but seems an option for overcoming the limitations of Ion Torrent chemistry. Perhaps the authors could comment on that in the article.
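The intersection approach suggested here amounts to a set operation on the two call lists. The following sketch assumes each call is keyed by (chromosome, position, reference allele, alternate allele); the function name `compare_call_sets` and all coordinates are placeholders, not variants from the manuscript.

```python
def compare_call_sets(std_calls, hiq_calls):
    """Split two sets of variant keys into calls shared by both enzymes
    and calls unique to each one."""
    shared = std_calls & hiq_calls
    std_only = std_calls - hiq_calls
    hiq_only = hiq_calls - std_calls
    return shared, std_only, hiq_only

# Placeholder variant keys: (chromosome, position, ref, alt).
std_calls = {
    ("chr17", 100, "A", "G"),
    ("chr13", 200, "C", "T"),
    ("chr13", 300, "TA", "T"),   # called with the standard enzyme only
}
hiq_calls = {
    ("chr17", 100, "A", "G"),
    ("chr13", 200, "C", "T"),
    ("chr17", 400, "G", "GA"),   # called with the Hi-Q enzyme only
}

shared, std_only, hiq_only = compare_call_sets(std_calls, hiq_calls)
print(sorted(shared))
print(sorted(std_only))
print(sorted(hiq_only))
```

Calls unique to one enzyme would be the candidates for Sanger confirmation, while the shared set would be taken as the working variant list.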

Level of interest An article whose findings are important to those with closely related research interests
Quality of written English Acceptable
Statistical review No, the manuscript does not need to be seen by a statistician.
Declaration of competing interests I declare that I have no competing interests.

Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/8335038381756767_comment.pdf)

 


The reviewed version of the manuscript can be seen here:
http://www.gigasciencejournal.com/imedia/4201515731624203_manuscript.pdf
All revised versions are also available:
Draft - http://www.gigasciencejournal.com/imedia/4201515731624203_manuscript.pdf

Source

    © 2015 the Reviewer (CC BY 4.0 - source).

Content of review 2, reviewed on June 27, 2015

Major Compulsory Revisions
The paper reads: “Supporting data Title: BAM files of standard and Hi-Q enzyme libraries Abstract: The file contains the .BAM and corresponding .BAI index file for each of the DNA samples sequenced using both standard enzyme and the Hi-Q version.”, but neither .bam, nor .bam.bai files can be found now. Also .abi files are promised but not supplied. I hope those files will be supplied in the end, because they could not be found in either this or the previous submission.
Coverage issues are now more clearly explained. The other major compulsory revisions have also been addressed satisfactorily. New table 5 and supplementary table constitute a fine revision of the literature regarding validations of BRCA testing with Ion Torrent.
Variant calling with two pipelines is now more clearly described. The authors explain to the reviewer: “For SNPs the two VCs give virtually identical results, each calling one intronic SNP the other missed (both present on manual inspection). GATK is known to be ineffective at calling indels on the Ion Torrent platform.” This clarification could be of great help to the readers too; I suggest including it in the text.
Minor Essential Revisions
The minor essential revisions have also been met satisfactorily.
Discretionary Revisions
The discretionary revisions have been addressed too.
I’ve found just one typo:
-Table 2 footnote reads: “*Significantly different, P>0.05”. I guess it should be P<0.05.
Level of interest An article of importance in its field
Quality of written English Acceptable
Statistical review No, the manuscript does not need to be seen by a statistician.
Declaration of competing interests I declare that I have no competing interests.

Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/1966487374180331_comment.pdf)


The reviewed version of the manuscript can be seen here:
http://www.gigasciencejournal.com/imedia/3112454651756807_manuscript.pdf
All revised versions are also available:
First revision - http://www.gigasciencejournal.com/imedia/3112454651756807_manuscript.pdf


References

    Michael, D., Joseph, B., Meredith, Y., M., I. K., Lisa, G., Maria, R., Mylen, P., Jason, M., David, R., Kristine, J., Jung, L. H., Rebecca, E., Julie, S., Sara, B., Xijun, Z., Vivian, R., Celia, H., Claudia, B., Edna, R., Candy, A., A., F. J., D., N. D., Zeina, N. 2015. Addressing health disparities in Hispanic breast cancer: accurate and inexpensive sequencing of BRCA1 and BRCA2. GigaScience.