Content of review 1, reviewed on June 10, 2016

This manuscript describes Illumina sequencing and assembly of the channel catfish draft genome using SOAPdenovo2 software. Gene annotation included TblastN searches using protein sequences from six fish species, de novo gene prediction using Augustus and Genscan, and RNA-seq using reads from skin and muscle tissue. Repetitive elements were also identified in the genome assembly. Most of the methodology is documented well enough for readers to replicate the experiments, but the gap-filling step is not described in detail. The Conclusions state that this is the first high-quality channel catfish genome.

In fact, the first channel catfish genome assembly, named "Coco", was recently published online in Nature Communications. Full disclosure: I am a member of that USDA/Auburn research team and a co-author of the manuscript. The Coco assembly used genomic DNA from a doubled haploid individual and both Illumina and PacBio sequencing, and it appears to be more contiguous than the BGI assembly. The N50 statistics show that the upper 50% of the Coco assembly is made up of fewer (2,839 vs 66,332) and longer (77.2 kb vs 48.5 kb) contigs than the BGI assembly. The N50 scaffold lengths are similar (7.7 vs 7.2 Mb), but 50% of the bases in the Coco assembly are contained in only 31 scaffolds and 98% of the assembled bases are contained in 594 scaffolds. Furthermore, 97% of the Coco assembly was aligned to the 29 channel catfish chromosomes.
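For readers less familiar with the contiguity metrics compared above, the following is a minimal sketch of how N50 (the length of the sequence at which the running total reaches half of the assembly) and the corresponding sequence count, often called L50, can be computed from a list of contig or scaffold lengths. The function name `n50_stats` and the toy lengths are illustrative assumptions; they are not taken from either assembly.

```python
def n50_stats(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50 is the length of the sequence at which the cumulative sum of
    lengths, taken from longest to shortest, first reaches half of the
    total assembly length. L50 is the number of sequences needed to
    reach that point.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example with made-up lengths (total 300, half 150).
toy_lengths = [100, 80, 60, 40, 20]
print(n50_stats(toy_lengths))  # (80, 2)
```

In these terms, the pairs quoted above correspond to the count (2,839 vs 66,332) and the length (77.2 kb vs 48.5 kb) for contigs; a lower count together with a higher length indicates a more contiguous assembly.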

It would be useful to have more assembly statistics for a more comprehensive description of the BGI assembly and for direct comparison with the Coco assembly (see their Table 1).

The authors have estimated the channel catfish genome size at 839 Mb. The USDA/Auburn k-mer-based estimate is 1 Gb (Supp. Figure 1), which is closer to published estimates of haploid genome content based on flow cytometry (Tiersch et al. 1990; Tiersch and Goudie 1993). The authors should explain why their genome size estimate is so much lower than the flow cytometry data. Does this affect the parameters used in SOAPdenovo, and could it cause genomic regions that arise from local duplication to be collapsed?
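To make the genome size question concrete, here is a minimal sketch of one common way a k-mer-based estimate is derived: the total number of k-mer occurrences divided by the depth of the main peak in the k-mer frequency histogram. This is an assumption about the general approach, not a description of the method used by either group. The function name `estimate_genome_size`, the `min_depth` cutoff for discarding likely error k-mers, and the toy histogram are all hypothetical.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    K-mers below min_depth are treated as sequencing errors and ignored
    (the cutoff is an illustrative assumption, not a recommended value).
    Returns (estimated_size, peak_depth).
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers, taken as the main coverage peak.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences among the retained k-mers.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram with made-up counts; the main peak is at depth 20.
toy_histogram = {1: 5000, 2: 1000, 10: 50, 19: 200, 20: 400, 21: 200, 40: 20}
size, peak = estimate_genome_size(toy_histogram)
print(size, peak)  # 865.0 20
```

Because the estimate is inversely proportional to the chosen peak depth, the choice of peak and of the error cutoff can shift the result noticeably, which is one reason the parameters behind the 839 Mb figure are worth reporting.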

The abstract states a predicted 275.3 Mb of repetitive sequence but the manuscript does not describe how this number was determined. The authors should clarify whether the total assembly length of 845.4 Mb includes repetitive content.

The BGI assembly predicted 21,556 coding genes whereas the Coco assembly predicted 26,661 coding genes. It is unclear whether the difference reflects assembly integrity or a more limited RNA-seq dataset in the BGI annotation (skin and muscle tissue only). Channel catfish EST, cDNA, and RNA-seq datasets from a wider variety of tissues and cell types have been available in GenBank for several years and could be used to determine whether the missing 20% of genes actually exist in the BGI assembly.
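The "missing 20%" figure follows directly from the two gene counts quoted above. A short sketch of the arithmetic, with the hypothetical helper name `missing_gene_fraction`:

```python
def missing_gene_fraction(reference_count, query_count):
    """Return (missing, fraction) of genes absent relative to the reference count."""
    missing = reference_count - query_count
    return missing, missing / reference_count


# Gene counts as quoted in this review: Coco (reference) vs BGI (query).
missing, fraction = missing_gene_fraction(26661, 21556)
print(f"{missing} genes, {fraction:.1%} of the Coco count")
# 5105 genes, 19.1% of the Coco count
```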

In summary, the assembly produced in the current research is not the first high-quality channel catfish genome assembly, and most metrics indicate that it is not as complete as the Coco assembly published in Nature Communications. The Coco assembly will soon be available in GenBank, which will permit the authors to perform a head-to-head comparison of the two assemblies and identify unique features of their assembly that justify publication.

Level of interest

Please indicate how interesting you found the manuscript:
An article whose findings are important to those with closely related research interests

Quality of written English

Please indicate the quality of language in the manuscript:
Acceptable

Declaration of competing interests

Please complete a declaration of competing interests, considering the following questions:

1. Have you in the past five years received reimbursements, fees, funding, or salary from an
organisation that may in any way gain or lose financially from the publication of this
manuscript, either now or in the future?
2. Do you hold any stocks or shares in an organisation that may in any way gain or lose
financially from the publication of this manuscript, either now or in the future?
3. Do you hold or are you currently applying for any patents relating to the content of the
manuscript?
4. Have you received reimbursements, fees, funding, or salary from an organization that
holds or has applied for patents relating to the content of the manuscript?
5. Do you have any other financial competing interests?
6. Do you have any non-financial competing interests in relation to this paper?
If you can answer no to all of the above, write 'I declare that I have no competing interests'
below. If your reply is yes to any, please give details below.

I have no financial competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included
on my report to the authors and, if the manuscript is accepted for publication, my named report
including any attachments I upload will be posted on the website along with the authors'
responses. I agree for my report to be made available under an Open Access Creative Commons
CC-BY license. I understand that any comments
which I do not wish to be included in my named report can be included as confidential comments
to the editors, which will not be published.

I agree to the open peer review policy of the journal.

Authors' response to reviews: (https://static-content.springer.com/openpeerreview/art%3A10.1186%2Fs13742-016-0142-5/13742_2016_142_AuthorComment_V1.pdf)


Source

    © 2016 the Reviewer (CC BY 4.0).

References

    Xiaohui, C., Liqiang, Z., Chao, B., Pao, X., Ying, Q., Xinxin, Y., Shiyong, Z., Yu, H., Jia, L., Minghua, W., Qin, Q., Xiaohua, Z., Chao, P., Alex, W., Zhifei, Z., Min, W., Ruobo, G., Junmin, X., Qiong, S., Wenji, B. 2016. High-quality genome assembly of channel catfish, Ictalurus punctatus. GigaScience.