Journal

GigaScience

About

GigaScience aims to revolutionize data dissemination, organization, understanding, and use. An online open-access open-data journal, we publish 'big-data' studies from the entire spectrum of life and biomedical sciences. To achieve our goals, the journal has a novel publication format: one that links standard manuscript publication with an extensive database (GigaDB) that hosts all associated data, and that provides data analysis tools through our GigaGalaxy server. Further promoting transparency in the review process, we have open review as standard for all our peer-reviewed papers.

Our scope covers not just 'omic' type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.

Review policy on Publons
  • Allows reviews to be published
  • Allows reviewers to display the title of the article they reviewed
Reviews

179

Reviews

  • The authors have adequately and appropriately addressed my concerns; however, I would like to see a citation for the statement on page 3, lines 57-59: "2D reads from R9 chemistry does not have homopolymer underrepresentation problem, [sic] which is interpreted by a new basecalling algorithm using recurrent neural next work."

  • Reviewer's report:
    I'm satisfied with the authors' response to my comments.

    Are the methods appropriate to the aims of the study, are they well described, and are
    necessary controls included? If not, please specify what is required in your comments to the
    authors.
    Yes

    Are the conclusions adequately supported by the data shown? If not, please explain in your
    comments to the authors.
    Yes

    Does the manuscript adhere to the journal’s guidelines on minimum standards of
    reporting? If not, please specify what is required in your comments to the authors.
    Yes

    Are you able to assess all statistics in the manuscript, including the appropriateness of
    statistical tests used? (If an additional statistical review is recommended, please specify what
    aspects require further assessment in your comments to the editors.)
    There are no statistics in the manuscript.

    Quality of written English
    Please indicate the quality of language in the manuscript:
    Acceptable

    Declaration of competing interests
    Please complete a declaration of competing interests, considering the following questions:

    1. Have you in the past five years received reimbursements, fees, funding, or salary from an
    organization that may in any way gain or lose financially from the publication of this
    manuscript, either now or in the future?
    2. Do you hold any stocks or shares in an organization that may in any way gain or lose
    financially from the publication of this manuscript, either now or in the future?
    3. Do you hold or are you currently applying for any patents relating to the content of the
    manuscript?
    4. Have you received reimbursements, fees, funding, or salary from an organization that holds
    or has applied for patents relating to the content of the manuscript?
    5. Do you have any other financial competing interests?
    6. Do you have any non-financial competing interests in relation to this manuscript?
    If you can answer no to all of the above, write ‘I declare that I have no competing interests’ below.
    If your reply is yes to any, please give details below.

    I declare that I have no competing interests.

    I agree to the open peer review policy of the journal. I understand that my name will be included
    on my report to the authors and, if the manuscript is accepted for publication, my named report
    including any attachments I upload will be posted on the website along with the authors'
    responses. I agree for my report to be made available under an Open Access Creative Commons
    CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments
    which I do not wish to be included in my named report can be included as confidential comments
    to the editors, which will not be published.

    I agree to the open peer review policy of the journal.


  • Richard Buggs reviewed the original version of this manuscript; this is their report from 25 July, 2016:

    This paper presents a Ginkgo genome assembly from 190x coverage with Illumina reads. It is assembled into 10.61 Gb of sequence, with a scaffold N50 of 1.36 Mb. No assembly is perfect, of course, but this seems to be a respectable, useful genome assembly and worthy of publication. However, the transcriptomic experiment and the sections on sex determination are not worthy of publication.

    It is good that all PE libraries and one MP library were made from a single haploid endosperm. However, some heterozygosity is introduced by the other MP libraries that were made from other seeds from the same plant. Was this heterozygosity taken into account when assembling the genome? Could some of the apparently tandemly duplicated genes in fact be under-assembly of heterozygous alleles?

    I am concerned about the claims in this paper about repeats in the Ginkgo genome (which form a significant part of this paper). Repeats are notoriously hard to assemble, and one cannot realistically expect them all to be correctly assembled in a de novo genome assembly. Therefore, caution is needed when making claims like "76.5% of the genome is made up of repeats". It may be that 76.5% of the assembly is made up of repeats, but the true repeat content of the actual genome cannot be measured so easily. It would be nice to see a measure of repeat content that is based on frequency analysis of the raw reads, in addition to one based on the genome assembly, for comparative purposes.

    The authors devote a large part of the MS to differential expression between the sexes, based on two pools of RNA from (1) female organs from three trees and (2) male organs from four trees. Because the RNA from trees has been pooled into one female sample and one male sample, the study has n=2 and does not have statistical power to test for differences in gene expression between male and female. The conclusions that can be drawn from this study are therefore very few. Supplementary table 11 seems to me to be particularly weak, looking at differential expression of defence genes among male and female. Given the statistical concerns, and the fact that RNA was only collected from floral tissues, and defence responses must occur in all tissues, this seems a particularly poorly supported set of conclusions. I suggest that these analyses, and associated text, are removed from the MS.

    The interaction network in Figure 3a is also questionable. It is not based on co-expression data from Ginkgo, but upon the gene interaction network of Arabidopsis, simply overlaid with BLAST hits from Ginkgo. It seems to assume that BLAST hits in the two species have similar functions, and share the same interaction networks, both of which assumptions are controversial. I suggest that this is removed from the MS.

    The section on "candidate sex genes" in the discussion and in the results is not worthy of publication. It is mainly based on the n=2 transcriptomic study, and questionable and rather vague comparisons with angiosperms whose sex determination systems have been characterised. I have no idea why sex chromosomes are in the discussion as I can see no evidence from the data presented here that Ginkgo have sex chromosomes. Much of the "Results" section on sex determining genes is actually discussion, and makes lots of reference to the literature for various genes, outlining what they do in other species. This all needs to be removed from the results section, and probably from the manuscript, given that the transcriptomic experiment drawing our attention to these genes in Ginkgo is itself so weak.

    The 10 Gb genome size of Ginkgo is smaller than the genome size of many conifers, so I would recommend dropping the use of "giga-" and "enormous" in the paper in reference to the Ginkgo genome.

    Abstract: "unusual resistance/tolerance" to what? This needs to be specified.

    What do the authors mean by "high-resolution genome" in the abstract? All genome sequences are at the same level of resolution: the base-pair level. Genomes differ in their quality according to their contiguity and accuracy - they do not differ in their resolution.

    The paper often uses comparative terms without specifying what the comparison is with: "the longest introns" of what? All life? All plants? "Most insertions" compared to what?

    There are several language errors throughout the MS that need correcting, including the use of "at last".

    Are the methods appropriate to the aims of the study, are they well described, and are necessary controls included? If not, please specify what is required in your comments to the authors. No.

    Are the conclusions adequately supported by the data shown? If not, please explain in your comments to the authors. No.

    Does the manuscript adhere to the journal’s guidelines on minimum standards of reporting? If not, please specify what is required in your comments to the authors. Yes.

    Are you able to assess all statistics in the manuscript, including the appropriateness of statistical tests used? Yes, and I have assessed the statistics in my report.

    Quality of written English Please indicate the quality of language in the manuscript: Not suitable for publication unless extensively edited

    Declaration of competing interests

    I declare that I have no competing interests.

    I agree to the open peer review policy of the journal.

    REVIEW 1, REVIEWED ON SEPTEMBER 30, 2016

    I am content with the authors' responses to my reviews.

    Are the methods appropriate to the aims of the study, are they well described, and are necessary controls included? If not, please specify what is required in your comments to the authors. Yes.

    Are the conclusions adequately supported by the data shown? If not, please explain in your comments to the authors. Yes.

    Does the manuscript adhere to the journal’s guidelines on minimum standards of reporting? If not, please specify what is required in your comments to the authors. Yes.

    Are you able to assess all statistics in the manuscript, including the appropriateness of statistical tests used? (If an additional statistical review is recommended, please specify what aspects require further assessment in your comments to the editors.)

    No, I do not feel adequately qualified to assess the statistics.

    Quality of written English
    Please indicate the quality of language in the manuscript:
    Needs some language corrections before being published.

    Declaration of competing interests

    I declare that I have no competing interests.

    I agree to the open peer review policy of the journal.

    Authors' response to reviews: (https://static-content.springer.com/openpeerreview/art%3A10.1186%2Fs13742-016-0154-1/13742_2016_154_AuthorComment_V1.pdf)


  • Reviewer #1 reported problems using the Clusterflock tool because of the complexity of installing the software and its dependencies. In response, the authors of Clusterflock have provided a Docker container which ships all of the code and associated software libraries in a standalone package ready for use.

    I have tested the clusterflock-0.1 Docker container and can report that I have successfully executed the clusterflock.pl and clusterflock_simulations.pl scripts to completion using the instructions available from https://github.com/narechan/clusterflock/blob/master/MANUAL. This involved:

    1. Deploying an Ubuntu-14.04 EC2 virtual server as a t2.medium instance on the AWS cloud and installing the Docker software on it.

    2. Downloading the narechan/clusterflock-0.1 Docker image from DockerHub onto the virtual server.

    3. Running the clusterflock-0.1 Docker container with this command on the host server, after which the Clusterflock scripts can be executed:
    $ docker run -v /mount/path/on/host:/home/test -it narechan/clusterflock-0.1

    The following two commands can then be executed inside the clusterflock-0.1 Docker container:

    $ clusterflock.pl -i test_data/4/fastas/ -c config.boids.simulations -l test_data/4/4.lds -s all -b 1 -d -x -o /home/test/4_out

    $ clusterflock_simulations.pl -c config.boids.simulations -r 10 -p 10 -o /home/test/4_sim/ -i test_data/4/fastas/ -l test_data/4/4.lds -j /home/clusterflock/dependencies/elki-bundle-0.6.5~20141030.jar -k 4 -f 500 > /home/test/4_sim.avg_jaccard

    Both of the above commands generated outputs as described in https://github.com/narechan/clusterflock/blob/master/MANUAL.
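
Steps 2 and 3 above could be scripted roughly as follows. This is a minimal sketch, not the script used in the review: it assumes Docker is already installed on the host, and the `HOST_DIR` variable, its default path, and the `run` dry-run helper are hypothetical names introduced only for illustration. By default the script prints the commands; setting `RUN=1` would execute them.

```shell
#!/usr/bin/env bash
# Sketch of steps 2 and 3: fetch the image, then start the container
# with a host directory mounted so results can be written back to the host.
set -euo pipefail

IMAGE="narechan/clusterflock-0.1"               # image name given in the review
CONTAINER_DIR="/home/test"                      # mount point used in the review
HOST_DIR="${HOST_DIR:-$PWD/clusterflock_out}"   # hypothetical host directory (assumption)

# Hypothetical helper: print the command unless RUN=1 is set, then execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

run mkdir -p "$HOST_DIR"
run docker pull "$IMAGE"
run docker run -v "$HOST_DIR:$CONTAINER_DIR" -it "$IMAGE"
```

Because the container is started with `-it`, the two Clusterflock commands listed above would then be typed at the interactive prompt inside it, with their outputs under /home/test appearing in the mounted host directory.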

    Level of interest
    Please indicate how interesting you found the manuscript:

    An article whose findings are important to those with closely related research interests

    Quality of written English
    Please indicate the quality of language in the manuscript:

    Acceptable

    Declaration of competing interests

    I declare that I have no competing interests.

    I agree to the open peer review policy of the journal.


  • I am happy with the changes made, addressing all of the points that I raised. The main point I raised about clarifying how the modelling works has been addressed both in the text and with a new additional figure.

    Are the methods appropriate to the aims of the study, are they well described, and are
    necessary controls included?
    If not, please specify what is required in your comments to the authors.

    Yes

    Are the conclusions adequately supported by the data shown?
    If not, please explain in your comments to the authors.

    Yes

    Does the manuscript adhere to the journal’s guidelines on minimum standards of
    reporting?
    If not, please specify what is required in your comments to the authors.

    Yes

    Are you able to assess all statistics in the manuscript, including the appropriateness of
    statistical tests used?
    (If an additional statistical review is recommended, please specify what aspects require further
    assessment in your comments to the editors.)

    Yes, and I have assessed the statistics in my report.

    Quality of written English
    Please indicate the quality of language in the manuscript:

    Acceptable

    Declaration of competing interests

    I declare that I have no competing interests

    I agree to the open peer review policy of the journal.

    Authors' response to reviewer: (https://static-content.springer.com/openpeerreview/art%3A10.1186%2Fs13742-016-0149-y/13742_2016_149_AuthorComment_V2.pdf)


  • There is a paucity of reptile genomes given the number of species within this vertebrate group, making any additional reptile genome assembly an important addition to the list. The manuscript "Draft genome of the leopard gecko, Eublepharis macularius" by Xiong et al. reports a brief description of the assembly and gene content analysis of the leopard gecko genome. Being able to now compare genomes with atypical gecko features to that of Gekko japonicus will help to uncover the genomic features contributing to the more typical gecko features. The authors have generated a genome assembly and gene build of similar quality to Gekko japonicus and it will be useful for comparative genomic investigations.

    Minor revisions:

    Table S2: Assembly statistics for Pogona vitticeps should be added to this table for completeness [Georges et al. (2015) GigaScience 4:45]. The text in the manuscript relating to this table would also need to be updated and should be changed to "Comparison of assembly N50s of leopard gecko and the eleven other published reptile genomes…"

    Line 102: change to "…captured 225 (91%) of the 248…"

    Level of interest

    Please indicate how interesting you found the manuscript:

    An article of importance in its field

    Quality of written English

    Please indicate the quality of language in the manuscript:
    Acceptable.

    Declaration of competing interests

    I declare that I have no competing interests.

    I agree to the open peer review policy of the journal.

    Authors' response to reviews: (https://static-content.springer.com/openpeerreview/art%3A10.1186%2Fs13742-016-0151-4/13742_2016_151_AuthorComment_V1.pdf)


  • The authors report the generation of high coverage Illumina short read sequence data and genome assembly for the leopard gecko (Eublepharis macularius). As an "eye-lidded" gecko without sticky toe pads, this species represents a unique lineage in geckos and its genome can help answer questions about the evolution of many amazing gecko traits. Also, as reptile genomes are under-sampled with respect to mammals in proportion to species diversity, another gecko genome can help resolve questions about genome evolution in sauropsids. Thus, the project, and all the work that went into it, is justified and should be published. As the current manuscript is a Data Note, there is no true biological question that was addressed using the genome assembly and no results reported from comparative analyses. Whether or not this group of authors intends to actually use the genome assembly remains to be seen, but that is not the point right now.

    For the purposes of this manuscript, the data generation and analyses are mostly well documented and are done according to widespread practices in genome assembly. They generated 136X genome coverage of raw Illumina data from a variety of insert sizes, filtered conservatively and wisely, and assembled contigs, scaffolds, and closed gaps. The scaffold N50 is 664 kb, which is good for a reptile genome nowadays, although scaffolding practices are greatly improving. In any case, this genome will be useful for future comparative analyses.

    I see no roadblocks to publishing this manuscript with just some very minor revisions.

    I would like to know more about the gene annotation merging process. Was this done with a published tool like MAKER, or with in-house computational pipelines? There should be more transparency about this method.

    Other than that, I wish the Leopard Gecko Genome Group luck going forward (but make sure you share the data!).

    Level of interest

    Please indicate how interesting you found the manuscript:

    An article whose findings are important to those with closely related research interests.

    Quality of written English

    Please indicate the quality of language in the manuscript:
    Acceptable.

    Declaration of competing interests

    I declare that I have no competing interests.

    I agree to the open peer review policy of the journal.

    Authors' response to reviews: (https://static-content.springer.com/openpeerreview/art%3A10.1186%2Fs13742-016-0151-4/13742_2016_151_AuthorComment_V1.pdf)


  • Xiong et al. present the first draft genome for the leopard gecko, Eublepharis macularius. The leopard gecko has been a model for a number of biological questions, including regeneration (Peacock et al., 2015; Gilbert et al., 2013; Delorme et al., 2012), temperature dependent sex determination (Huang et al., 2013; Huang et al., 2008; Putz & Crews, 2006) and development (Vickaryous & McLean, 2012; Wise et al., 2009). The availability of high quality whole genome assembly and annotation would advance studies in the leopard gecko model.

    The authors selected a male leopard gecko specimen that was the result of over 30 generations of introgression, which would be expected to reduce heterozygosity in genome sequences. Using a standard multi-library Illumina-based sequencing approach, they created an assembly with scaffold N50 of 664 kb and contig N50 of 20 kb, for a total assembly size of 2.02 Gb out of an expected 2.23 Gb. CEGMA and BUSCO analysis identified 91% partial and 85% complete core eukaryotic gene representation in the genome assembly. Transcriptome sequences from liver, salivary gland, scent gland, and skin from NIH NCBI SRA and predictive approaches were used to annotate 24,755 protein-coding genes. All of these statistics are comparable to other recent squamate genome assemblies and annotations such as Gekko japonicus. No items requiring correction were identified.

    Availability of the Eublepharis macularius assembly would be of great value for comparative analysis of vertebrate and reptilian genomes, and subsequent evolutionary genetic analysis may help to identify the differences between eublepharids and other gekkotans.

    Level of interest

    Please indicate how interesting you found the manuscript:
    An article of importance in its field.

    Quality of written English

    Please indicate the quality of language in the manuscript:
    Acceptable.

    Declaration of competing interests

    I declare that I have no competing interests.

    I agree to the open peer review policy of the journal. I understand that my name will be included
    on my report to the authors and, if the manuscript is accepted for publication, my named report
    including any attachments I upload will be posted on the website along with the authors'
    responses. I agree for my report to be made available under an Open Access Creative Commons
    CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments
    which I do not wish to be included in my named report can be included as confidential comments
    to the editors, which will not be published.

    I agree to the open peer review policy of the journal.

    Authors' response to reviews: (https://static-content.springer.com/openpeerreview/art%3A10.1186%2Fs13742-016-0151-4/13742_2016_151_AuthorComment_V1.pdf)


  • Andrea Zuccolo reviewed the original version of this manuscript; this is their report from 25 July, 2016:

    In this manuscript, Rui Guan et al. present the results of the sequencing and characterization of the Ginkgo biloba genome. These data are highly valuable because the genomic characterization of Ginkgo biloba was long overdue, and this sequence fills a serious gap in the genomic resources available for comparative studies targeting gymnosperms. Indeed, as far as evolutionary analyses are concerned, the importance of these data goes beyond gymnosperms. As I said, the resource provided is valuable and of interest; however, in its present form the manuscript suffers from some issues that the authors should address.

    Here are my major criticisms:

    - Throughout the manuscript, including the supplemental figures and tables, there is a lack of detail in the description of the technical aspects of the work, mainly sequencing and assembly (see below for a more detailed list). In general, the whole methods section is somewhat cursory in the descriptions provided. The authors emphasize the high contiguity of the assembly, as described by the N50 stats for both contigs and scaffolds. These values are indeed remarkable and possibly represent the greatest value of this research. However, all the common stats generally defined as "genome assembly forensic" are missing. The authors should provide this information in an ad hoc "genome assembly quality assessment" paragraph. The mapping of EST sequences alone on the assembly clearly does not satisfy this requirement, since the information it provides is limited to the completeness of the assembly and says nothing about its contiguity.

    - The authors should also compare the Ginkgo assembly stats with those characterizing other genome assembly projects targeting gymnosperms, and explain why their strategy was successful in providing contiguity figures that are at least two orders of magnitude larger than those characterizing the most curated assemblies (I am referring to Version 4 of the P. glauca assembly described in Warren et al., 2015, The Plant Journal).

    - Not all of the valuable resources presented and discussed in this manuscript are publicly available. I specify the missing ones in the following detailed comments.
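    For readers less familiar with the contiguity metric discussed above, the following is a minimal sketch of how an N50 value is commonly computed: sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions, not values from the manuscript.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to (or past)
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in kb), not data from the manuscript.
toy_lengths = [8, 8, 4, 3, 3, 2, 2, 2]
print(n50(toy_lengths))
```

    Because N50 depends only on the length distribution, it says nothing about whether the sequences are correctly assembled, which is why additional quality-assessment statistics are requested above.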

    Detailed list of comments and criticisms:

    Page 4, lines 14-16. I agree that the data provided could possibly aid a better annotation of other genomes, and I am absolutely convinced that they will be helpful and valuable in evolutionary studies. However, I am skeptical about their contribution to the improvement of other gymnosperm genome assemblies. I say this because the evolutionary distances involved still encompass hundred(s) of millions of years, so it is difficult to envisage substantial and extended collinearity between Ginkgo and Pinus spp. or Picea spp.

    Page 4 line 30: "analyses of structural variation". Considering the lack of a closely related sequenced organism for comparison, "structural variation" is definitely a misused term in this context. Even under the most general and liberal definition, the study of structural variation presented in the manuscript is limited at best to the analysis of gene tandem duplications. For these reasons this sentence is something of an overstatement. Please rewrite it according to my criticisms.

    Page 5, line 24: "10.002 Gbp". A precision reaching the third decimal position is definitely not needed here.

    Page 6, lines 10 and following. Here the authors should compare (and then discuss) the number of predicted genes in Ginkgo with similar predictions in other sequenced gymnosperm genomes.

    Page 6 Line 22: I cannot find an explanation for the acronym HPD. Please add it.

    Page 6 line 28: "5316 orthologous genes". In figure 1 the figure is 5116. Please correct the manuscript (or the figure)

    Page 6 lines 37-49: in presenting the GO enrichment results, the authors should give a more detailed description than the plain listing of the highest-level GO terms. For instance, the enrichment of "Biotic stimulus response" terms is an interesting piece of information; however, some detail better illustrating which "Biotic stimulus" is affected would be more useful. The same goes for the MF and CC categories.

    Page 6 line 54: "an exceptional proportion". Similar figures have been described in other gymnosperm genomes and in maize too. Because of this I wouldn't call this proportion "exceptional". I think that "remarkable" better depicts the situation here.

    Page 6, Lines 58-59 "include two predominant superfamilies". Not surprisingly because these two LTR-RT superfamilies are the only ones found in plants...and so, since there are no other competitors they cannot be described as "predominant".

    Page 6, lines 56-60. Please recheck the figures proposed in this paragraph because their sum doesn't seem to be OK. Gypsy: 63.5%, Copia 20.4%. Total should be 83.9%, instead 79.37 % is presented.

    Page 7, Line 1: Phylogenetic trees are mentioned. It would be important to make these trees (or at least the alignments used to build them) publicly available. Also the authors mentioned the "domains of reverse transcriptase": they should specify if the complete RT was used or just a tract. In this latter case they should specify which one and, again, make these sequences publicly available.

    Page 7, lines 5-9. This comparison is not clear. In particular, how did the authors retrieve the data for the other species? Did they search other assemblies for these domains? Or did they retrieve them from supplemental Materials if available? Please specify and, if the latter case applies, cite the appropriate literature.

    Page 7 lines 12-13 "Gene tree": which gene? That's the Phylogenetic tree for Ty3-gypsy elements...

    Page 7 lines 12-59. I see some issues and a lack of information here. In particular:
    - How did the authors define the clades they are referring to? I can see a clear tree topology; however, I expect some sort of statistical evaluation in support of it.
    - Did the authors perform a bootstrap analysis?
    - If so, with how many replicates?
    - What is the bootstrap support for each of these clades?
    Also, I would avoid referring to single clades as "left-most".

    Page 7 line 27. Typo: P. aibes should be P. abies

    Page 7 lines 35-37 from "...to maize was far more diverse..." on. I admit I have some difficulty in grasping the meaning of this sentence. I suggest rephrasing it.

    Page 7 lines 47-48. Regarding the higher conservation of Ty1-copia elements, there are papers that can be cited in support of this evidence for plants in general and for gymnosperms in particular. See for instance:

    Wicker, T., Keller, B., 2007. Genome wide comparative analysis of copia retrotransposons in Triticeae, rice and Arabidopsis reveals conserved ancient evolutionary lineages and distinct dynamics of individual copia families. Genome Res. 17, 1072-1081.

    Smykal, P., Kalendar, R., Ford, R., Macas, J., Griga, M., 2009. Evolutionary conserved lineage of Angela-family retrotransposons as a genome wide microsatellite repeat dispersal agent. Heredity 103, 157-167

    Moisy, C., Schulman, A.H., Kalendar, R., Buchmann, J.P., Pelsy, F., 2014. The Tvv1 retrotransposon family is conserved between plant genomes separated by over 100 million years. Theor. Appl. Genet. 127, 1223-1235.

    Zuccolo et al., 2015. The Ty1-copia LTR retroelement family PARTC is highly conserved in conifers over 200 MY of evolution. Gene 568, 89-99.

    Page 7 line 45: why clade 1 should be "the most conserved"?

    Page 7 lines 50-51: Possibly I missed something here, however why clade 1 should be the most basal considering that this tree is not rooted?

    Page 7 lines 53-54: "...remarkably less expansion...". Note that the phylogenetic trees have been built using alignments from the extant population of LTR-RT. Because of this there is no evidence leading to "less expansion": it could well be the opposite i.e. a "small retention"...

    Page 7, line 57. Typo: P. aibes should be P. abies.

    Page 8, lines 3-8. This is an important piece of information and as such it deserves a better description of the strategy used. For instance:
    - How many complete LTR-RT elements were predicted?
    - The sequences of these elements should be made publicly available (or at least their coordinates in the repeats gff3 file should be pointed out).
    - Which strategy was used to infer their insertion times? I guess it was that proposed by SanMiguel et al. in 1998, so please cite it properly.
    - Most importantly: which mutation rate was used to translate the nucleotide distances into time?
    On a side note, I somewhat disagree with the use of the term "burst" to describe an event spanning at least 8 my: you can simply say that most of the amplification occurred 16-24 mya (please correct this throughout the text).
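    To illustrate why the mutation rate matters so much here, the following is a minimal sketch of the usual LTR-divergence dating calculation, assuming the distance-based approach guessed at in the comment above. The two LTRs of an element are identical at insertion and then diverge independently, so the insertion time is T = K / (2r), where K is the nucleotide distance between the two LTRs and r is the substitution rate per site per year. The function name, the rate, and the distances below are hypothetical placeholders chosen only for illustration.

```python
def insertion_time_years(ltr_distance, substitution_rate):
    """Estimate LTR-RT insertion time as T = K / (2 * r).

    ltr_distance: nucleotide distance K between the 5' and 3' LTRs
                  (substitutions per site, assumed already corrected).
    substitution_rate: r, substitutions per site per year.
    """
    if ltr_distance < 0:
        raise ValueError("distance must be non-negative")
    if substitution_rate <= 0:
        raise ValueError("substitution rate must be positive")
    return ltr_distance / (2.0 * substitution_rate)


# Hypothetical rate and distances, for illustration only.
HYPOTHETICAL_RATE = 2.2e-9
for distance in (0.0704, 0.088, 0.1056):
    age_my = insertion_time_years(distance, HYPOTHETICAL_RATE) / 1e6
    print(f"K = {distance:.4f} -> {age_my:.1f} My")
```

    Since T scales inversely with r, halving the assumed rate doubles every estimated insertion time, which is why the rate used needs to be stated explicitly.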

    Page 8, lines 12-34: here (or in the discussion) please discuss also "Evolution of gene structure in the conifer Picea glauca: a comparative analysis of the impact of intron size" Stival Sena et al, BMC Plant Biology 2014.

    Page 8, lines 24-31 from "The intron regions..." to "percentage of repeats": please rephrase this sentence that, as it is, doesn't convey a clear description of the results.

    Page 8, lines 30-33. Is this sentence a general comment of the results, as the reference would suggest or the "preferential accumulation" was proved in this study for ginkgo genes?

    Page 8, line 45. Please add 4DTV to the list of abbreviations or briefly explain it.

    Page 8, line 56. Typo: Z. may should be Z. mays

    Page 9 line 42 "...in male is obviously..." I would replace "obviously" with "clearly" here.

    Pages 9-11: please provide a more detailed description of the conditions under which RNA for RNA-seq was extracted from the different tissues. Also, specify whether any replicates were carried out. As a general comment and recommendation regarding this section of the manuscript, I would suggest properly cautioning the reader that all these data and evidence are interesting but absolutely preliminary, especially considering that the experimental design was intended to give a glimpse of the complexity of these data but, as it is, is far from robust.

    Page 10, line 37: please add the appropriate unit to the transcriptome data figures. It's FPKM here.

    Page 12, lines 25-31: the authors should also add a comparison with the TE content estimates available for other gymnosperms. Also, in the case of rice it would be better to provide and cite the most recent estimate, which can be found in "The map-based sequence of the rice genome", Nature 436, 793-800, 2005.

    Page 12 lines 56-58: authors should also discuss, the evidence proposed in "Early genome duplications in conifers and other seed plants" Li et al., Science 2015

    Page 13 lines 6-9: Similarly to what I pointed out before, it's not clear how the data for Norway spruce were obtained. Without this information it is pointless to speculate about the differences that can be seen. Furthermore a comparison with data available for Loblolly pine would be interesting here. Note also that the figures provided in line 7 (1506 and 686) are different from those proposed at page 7 (2416 vs 1790). Please explain and clarify this discrepancy.

    Page 13, line 16: what do authors mean with "LTR/gene"?

    Page 13 line 16: "might occur in ~3 mya". Indeed it occurred during an even shorter time frame. See: Baucom RS, Estill JC, Chaparro C, Upshaw N, Jogi A, et al. (2009) Exceptional Diversity, Non-Random Distribution, and Rapid Evolution of Retroelements in the B73 Maize Genome. PLoS Genet 5(11): e1000732. doi:10.1371/journal.pgen.1000732

    Page 13, lines 18-22: From "The removal..." till "Norway spruce". Actually it is not the removal but the "lack of" efficient removal possibly leading to the huge genome of Norway spruce. Please reword the sentence accordingly.

    Page 14, line 34: explain the meaning of the acronym ROS (or add it to the list of abbreviations)

    Page 14, lines 42-44 Add the reference relative to FLS2 discovery and characterization.

    Page 14, line 48: state the number of these duplicated genes in A. thaliana and provide a reference for this information (or explain how it has been retrieved).

    Page 15, lines 33-41. It is not clear where the comparisons discussed were described in results. I am referring in particular to Carica papaya that is mentioned for the first time in discussion. As it is, this sentence is not supported by the data collected and presented.

    Page 15, lines 52-57 "For gymnosperm..." I suspect that a few words or an entire sentence are missing in this paragraph because, as it is, its meaning is quite obscure.

    Page 17, line 22: "low quality bases": please use a Q-phred like value to define "low quality bases"

    Page 17, lines 47-52: please make the de novo repeat libraries publicly available. This would be an extremely useful resource for the scientific community.

    Page 18, line 31: "de novo genes". These are not necessarily new genes, they could be just highly diverged or simply species specific ones.

    Page 19, lines 28-32 from "We filtered..." on. The sentence is quite confusing; please rephrase it.

    Figure 1 d: please briefly comment in text the strange placement of OSAT and ATHA mixed with Gb genes in cluster 1804

    Figure 3, explain in the figure legend the meaning of the colored stars

    Finally, the Abstract should be rewritten according to the changes made in the results, methods and discussion sections.

    Are the methods appropriate to the aims of the study, are they well described, and are necessary controls included? If not, please specify what is required in your comments to the authors. No.

    Are the conclusions adequately supported by the data shown? If not, please explain in your comments to the authors. No.

    Does the manuscript adhere to the journal’s guidelines on minimum standards of reporting? If not, please specify what is required in your comments to the authors. Yes.

    Are you able to assess all statistics in the manuscript, including the appropriateness of statistical tests used? There are no statistics in the manuscript.

    Quality of written English
    Please indicate the quality of language in the manuscript: Needs some language corrections before being published.

    Declaration of competing interests
    Please complete a declaration of competing interests, consider the following questions:
    1. Have you in the past five years received reimbursements, fees, funding, or salary from an organization that may in any way gain or lose financially from the publication of this manuscript, either now or in the future?
    2. Do you hold any stocks or shares in an organization that may in any way gain or lose financially from the publication of this manuscript, either now or in the future?
    3. Do you hold or are you currently applying for any patents relating to the content of the manuscript?
    4. Have you received reimbursements, fees, funding, or salary from an organization that holds or has applied for patents relating to the content of the manuscript?
    5. Do you have any other financial competing interests?
    6. Do you have any non-financial competing interests in relation to this manuscript?

    If you can answer no to all of the above, write ‘I declare that I have no competing interests’ below. If your reply is yes to any, please give details below.

    I declare that I have no competing interests.

    I agree to the open peer review policy of the journal. I understand that my name will be included on my report to the authors and, if the manuscript is accepted for publication, my named report including any attachments I upload will be posted on the website along with the authors' responses. I agree for my report to be made available under an Open Access Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments which I do not wish to be included in my named report can be included as confidential comments to the editors, which will not be published. I agree to the open peer review policy of the journal.

    REVIEW 1, REVIEWED ON AUGUST 30, 2016

    In the present version of the manuscript, the authors positively addressed most of the criticisms, suggestions and comments I provided in the previous review.

    The manuscript quality has improved and although some sections have been removed it still describes enough data analysis to be considered as a full research article. There are some issues to address, though. I list them here:

    a) As indicated in the first round of review, the quality of written English still needs some edits. I suggest contacting a professional editing service in order to address this point.

    b) Page 5, paragraph "Genome annotation". Please better describe the "tandem repeats" identified; do they include SSRs?

    c) Page 6: Supplementary figure 4 doesn't display in supplementary figures file

    d) Page 7, paragraph "Evolution of LTR-RTs". "Phylogenetic trees suggested that..." Actually the trees didn't suggest this. Instead the similarity searches carried out by authors retrieved this amount of data.

    e) Page 7: the amount of LTR-RT related sequence data stated for different species still remains a major issue for me. In particular, the authors claimed that 2416 and 1790 elements for Ty1-copia and Ty3-gypsy, respectively, were retrieved in P. abies. If I understood the authors' reply to my previous comments correctly, these data for P. abies came from a search of assembly version 1. However, a quick tblastn search of the v1 P. abies assembly, using as a query the Ty1-copia RT sequence indicated by the authors, gave about 11,900 positive hits when an e-value of 1e-5 was set as the significance threshold. Of these hits, more than 9,000 are longer than 95 AA residues. I suspect the very same is true for Ty3-gypsy. How can this inconsistency be explained?

    f) page 8: "...to the most basal clade". Again I reiterate what I said in the first round of review: it doesn't make sense to talk of a basal clade in an unrooted phylogenetic tree. The authors replied that they used "most basal" as having the same meaning as "most conserved". This is not the case. So please correct the text accordingly, i.e. replace "basal-most clade 1" with "the most conserved clade 1".

    g) page 8, paragraph "TE insertions in introns". Authors provided a p-value (< 2.0 e-6) but omitted to state which statistical test was used. Please state it.

    h) page 9, paragraph "gene duplications". To estimate the timing of WGD events, the authors used a mutation rate of 2.2 e-9. This was calculated for LTR-RTs (Nystedt, 2013). The authors should instead use the rate of 0.68 e-9 calculated for gymnosperm genes, as described in Buschiazzo et al., 2012 ("Slow but not low: genomic comparisons reveal slower evolutionary rate and higher dN/dS in conifers compared to angiosperms". BMC Evolutionary Biology).

    i) page 12 from "With comparison to Norway spruce..." on. Once more, in order to gauge properly the meaning of these figures it is important to understand how these data were obtained and why they differ so strikingly from the evidence gathered carrying out a simple tblastn search (see comment "e"). Furthermore, as a general comment regarding these comparisons involving different species, it is important to note that differences seen likely are also due to the different metrics characterizing the various genome assemblies.

    l) Page 13, lines 7-10. The sentence as it is is not clear. Please rewrite.

    m) Page 17, lines 5-6. These data are valuable; however, the authors should caution the reader about the inherent limitations of a search for LTR-RTs based only on LTR_STRUCT run under the (loose) default settings. In particular, a significant number of false positives is expected from such a search. Indeed, if the authors carry out a simple dot plot analysis of the longest and shortest putative LTR-RTs they identified, they will see in the first case a significant number of nested insertions and, in the latter, several tandemly arranged repeats misidentified by the program as LTRs. All of this is expected but, again, it should be pointed out.

    n) Page 17, line 9: "distance" should be "nucleotide distance".

    o) Supplementary figure 6: add a legend.
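    The hit counts quoted in comment e) can be reproduced from a tabular search report with a filter along the following lines. This is a minimal sketch that assumes the search was saved in the standard 12-column BLAST tabular format (`-outfmt 6`), where column 4 is the alignment length and column 11 is the e-value; the function name and the sample rows are hypothetical and are not results from the P. abies assembly.

```python
import csv
import io


def count_hits(tabular_text, max_evalue=1e-5, min_length=95):
    """Count hits passing the e-value cutoff, and those also longer than min_length.

    Expects BLAST tabular output (-outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    significant = 0
    significant_and_long = 0
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        alignment_length = int(row[3])
        evalue = float(row[10])
        if evalue <= max_evalue:
            significant += 1
            if alignment_length > min_length:
                significant_and_long += 1
    return significant, significant_and_long


# Hypothetical rows for illustration only.
rows = [
    ["RT_query", "scaffold_1", "62.0", "120", "40", "2", "1", "120", "300", "659", "1e-30", "150"],
    ["RT_query", "scaffold_2", "55.0", "60", "25", "1", "10", "69", "900", "1079", "1e-06", "52"],
    ["RT_query", "scaffold_3", "30.0", "200", "130", "5", "1", "200", "10", "609", "0.01", "30"],
]
sample = "\n".join("\t".join(fields) for fields in rows)
print(count_hits(sample))
```

    Reporting both numbers, together with the exact query and thresholds, would make it straightforward to compare the element counts given in the manuscript with an independent search.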

    Authors' response to reviews: https://static-content.springer.com/openpeerreview/art%3A10.1186%2Fs13742-016-0154-1/13742_2016_154_AuthorComment_V1.pdf


  • Level of interest
    Please indicate how interesting you found the manuscript:

    An article of importance in its field.

    Quality of written English
    Please indicate the quality of language in the manuscript:

    Acceptable.

    Declaration of competing interests
    Please complete a declaration of competing interests, considering the following questions:
    1. Have you in the past five years received reimbursements, fees, funding, or salary from an
    organisation that may in any way gain or lose financially from the publication of this
    manuscript, either now or in the future?
    2. Do you hold any stocks or shares in an organisation that may in any way gain or lose
    financially from the publication of this manuscript, either now or in the future?
    3. Do you hold or are you currently applying for any patents relating to the content of the
    manuscript?
    4. Have you received reimbursements, fees, funding, or salary from an organization that
    holds or has applied for patents relating to the content of the manuscript?
    5. Do you have any other financial competing interests?
    6. Do you have any non-financial competing interests in relation to this paper?

    If you can answer no to all of the above, write 'I declare that I have no competing interests'
    below. If your reply is yes to any, please give details below.

    I declare that I have no competing interests.

    I agree to the open peer review policy of the journal. I understand that my name will be included
    on my report to the authors and, if the manuscript is accepted for publication, my named report
    including any attachments I upload will be posted on the website along with the authors'
    responses. I agree for my report to be made available under an Open Access Creative Commons
    CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments
    which I do not wish to be included in my named report can be included as confidential comments
    to the editors, which will not be published.

    I agree to the open peer review policy of the journal.

    Authors' response to reviews: (https://static-content.springer.com/openpeerreview/art%3A10.1186%2Fs13742-016-0148-z/13742_2016_148_AuthorComment_V2.pdf)

