Content of review 1, reviewed on June 19, 2014

The manuscript "Rapid Detection of Structural Variation in a Human Genome Using Genome Mapping Technology" by Cao et al. describes data obtained by single-molecule genome mapping using cutting-edge technology recently introduced by BioNano Genomics Inc. Genome mapping appears to be a novel and highly sensitive approach for detecting structural variation (SV). While sequencing- and array-based SV detection involves multiple tedious experiments and in many cases requires prior knowledge of the presence, location and nature of SVs, the genome mapping approach may provide a comprehensive, genome-wide characterization of SVs in a single experiment. Moreover, some of the detected SVs could not be retrieved from existing databases or by targeted paired-end or fosmid library sequencing, indicating that the technique may be more powerful for SV detection than other existing methods. Finally, since the information is assembled de novo from extremely long “reads”, it can highlight repetitive elements and viral integration sites even without a reference.

Although the information in this paper is very interesting and potentially groundbreaking, the introduction, results and data analysis, methods, discussion and references should all be improved before the manuscript is suitable for publication. One representative example of how the current work is not thorough enough is the authors' claim to present the first genomic map of a complete Asian genome. An Asian genome sequence was already published in 2008 (Wang et al. Nature, 456); however, this work is not mentioned in the manuscript, and the genome mapping data were not compared to the known Asian sequence.

The following is a list of our main concerns regarding the manuscript:

Major Compulsory Revisions:
  1. In its current format, the paper does not provide the information that is suggested in its title. The manuscript and/or the title must be changed so that they match. The title suggests that 'rapid detection' was used. However, there is no information regarding the time it took to obtain the data described in the manuscript. How long did the following steps take: sample preparation, data acquisition, data analysis and assembly? A discussion of the time and complexity required to obtain the same data using other methods is needed in order to emphasize why this is 'rapid detection'.

  2. The resolution of the method is not discussed. A short discussion of the resolution of the method is required. More specifically, a structural variant was defined as a region of 1 kb or larger. What are the limitations of the current method in detecting structural variants, and what is the smallest SV that can be detected?

  3. In 2008 the genome sequence of an Asian individual was published: "The diploid genome sequence of an Asian individual", Wang et al. Nature, 456, 60-65 (2008). This paper is highly relevant to the current manuscript. Beyond adding it to the reference list, we believe that the data should also be compared directly with this published Asian genome sequence. Wang et al. discuss the SVs of the Asian genome; this information should be acknowledged in the current manuscript and compared with the genome mapping results.

  4. Structural variation analysis – The data presented are based on the mapping of a single individual (YH). The text mentions that multiple genomes have been mapped (in order to study the frequency of a 2.5 kb repeat), and it would be helpful to discuss the degree of consistency in the SV size distribution between experiments. Currently, it is not clear to what degree the SVs reported are a property of the specific genome or a result of the detection method.

  5. "…Some of the genes like ELMO1, HECW1…are reported associating to diseases…" – References are missing and should be added in the text and/or Supplementary Table 2.

  6. More discussion of the enrichment of insertions relative to hg19 is needed. Is it suspected to be a bias of the method, or is it a specific characteristic of the YH genome?

  7. In light of the claim that SVs appear in functional gene regions, a discussion of the distribution of detected SVs between coding and non-coding regions is necessary.

  8. Highly repetitive regions – Some more information is required regarding the 2.5 kb repeats. Is their sequence known? Are all the copies (e.g. 691 in the male sample) at the same locus, or are they on different chromosomes? Is there any explanation for the high copy number of the repeats in male samples (are they mostly on the Y chromosome)?

  9. It would be very convincing to add a figure that shows a long tandem repeat with flanking regions that allow mapping to the reference.

  10. Complex region analysis – Only abbreviations of the genomic regions are mentioned. Please write full names and add references. We recommend adding a table with all the relevant information.

  11. EBV integration detection – This is not the first time that the integrated EBV genome has been visualized as intact long molecules; please see Reisinger et al. "Visualization of episomal and integrated Epstein-Barr virus DNA by fiber fluorescence in situ hybridization" Int. J. Cancer, 118 (2006).

  12. '2,174 molecules showed EBV integration events at 2,118 locations along the genome' – These data are very puzzling. If 70X coverage depth was used, why was each integration region represented only once? The argument of variability between cells from a clonal population is not sufficient to explain these numbers.

  13. Supplementary Figure 9 shows thousands of integration events. How does this relate to the integration events reported in the main section? The figure needs further explanation.

  14. Methods – The Methods section is lacking many details, especially in light of the methodological novelty. How many cells were used, and how were they washed and embedded in plugs? Which lysis buffer was used? Only YH cells are mentioned in the Methods section, although results from CEPH-NA12878 are described as well. What concentrations were used in the labeling procedure? These are only some examples of the information missing from the Methods section. Full descriptions of the methods (or citations of appropriate references) are required for publication.

  15. Data – A description of the deposited data is missing.
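
To make the concern in point 12 concrete, a rough back-of-the-envelope calculation can be sketched as follows. It uses only the figures quoted from the manuscript (2,174 molecules, 2,118 locations, 70X coverage). The assumption that a clonal integration site would be spanned by roughly as many molecules as the coverage depth (about half that if heterozygous) is ours, and the variable names are purely illustrative.

```python
# Figures quoted in the manuscript (see point 12 above).
molecules_with_integration = 2174
distinct_locations = 2118
coverage_depth = 70  # approximate genome-wide coverage, as stated

# Observed average number of supporting molecules per reported location.
observed_per_location = molecules_with_integration / distinct_locations

# Assumption (ours, not the authors'): an integration site shared by all
# cells of a clonal population would be spanned by roughly `coverage_depth`
# molecules, or about half that number if the site is heterozygous.
expected_if_homozygous = coverage_depth
expected_if_heterozygous = coverage_depth / 2

print(f"observed molecules per location:  {observed_per_location:.2f}")
print(f"expected if clonal, homozygous:   ~{expected_if_homozygous}")
print(f"expected if clonal, heterozygous: ~{expected_if_heterozygous:.0f}")
```

Under these assumptions each reported location is supported by about one molecule on average, far below what a clonal integration event would be expected to yield.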

Minor Essential Revisions:
  1. We strongly recommend professional style and grammar editing of the text (including figure captions).

  2. Many figures in the supplementary section are of low quality in terms of graphics and text. We recommend redoing these figures.

Level of interest: An article of outstanding merit and interest in its field

Quality of written English: Needs some language corrections before being published

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests

Source

    © 2014 the Reviewer (CC BY 3.0 - source).

Content of review 2, reviewed on September 24, 2014

It would be great to have explanations for the data in the supplementary Excel sheet (what is the "confidence", for example?). Otherwise good to go.

Declaration of competing interests: None

Source

    © 2014 the Reviewer (CC BY 3.0 - source).

References

    Cao, H., Hastie, A. R., Cao, D., Lam, E. T., Sun, Y., Huang, H., Liu, X., Lin, L., Andrews, W., Chan, S., Huang, S., Tong, X., Requa, M., Anantharaman, T., Krogh, A., Yang, H., Cao, H., Xu, X. 2014. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. GigaScience.