Background: The production of biogas takes place under anaerobic conditions and involves microbial decomposition of organic matter. Most of the participating microbes are still unknown and non-cultivable. Accordingly, shotgun metagenome sequencing is currently the method of choice for obtaining insights into community composition and the genetic repertoire.

Findings: Here, we report on the deeply sequenced metagenome and metatranscriptome of a complex biogas-producing microbial community from an agricultural production-scale biogas plant. We assembled the metagenome and, as an example application, show that we reconstructed most genes involved in methane metabolism, a key pathway involving methanogenesis performed by methanogenic Archaea. This result indicates that there is sufficient sequencing coverage for most downstream analyses.

Conclusions: Sequenced at least one order of magnitude deeper than previous studies, our metagenome data will enable new insights into community composition and the genetic potential of important community members. Moreover, mapping of transcripts to reconstructed genome sequences will enable the identification of active metabolic pathways in target organisms.

  • The authors report on a metagenome and metatranscriptome for a biogas-producing microbial community. The authors provided a Docker container as an aid for reproducing the metagenome assembly, which I tested in my review.

    Following the instructions provided by the authors [1], I was able to use the Docker container to execute the metagenome assembly workflow which the authors have implemented as a Makefile. The Docker container was tested using an AWS r3.8xlarge instance with 32 cores, 244 GB RAM and 100 GB storage.

    The contigs produced by the Docker container on my AWS instance were similar but not identical to those obtained by the authors. This was confirmed with an all-vs-all BLAST comparison of the contig sequences. These small differences in my contig sequences affected the gene prediction analyses further along the metagenome workflow.
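A quick first check before an all-vs-all BLAST is to count how many contigs are exactly identical between two assemblies. The sketch below is illustrative only and is not the comparison run for this review: it assumes both contig sets are available as FASTA text, treats a contig and its reverse complement as the same sequence, and uses hypothetical helper names (`read_fasta`, `canonical`, `compare_contig_sets`).

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the contig name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(seq):
    """Return the lexicographically smaller of a sequence and its reverse complement."""
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def compare_contig_sets(fasta_a, fasta_b):
    """Count contigs shared by, or unique to, two assemblies (exact matches only)."""
    a = {canonical(s) for s in read_fasta(fasta_a).values()}
    b = {canonical(s) for s in read_fasta(fasta_b).values()}
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}
```

Exact matching only separates identical contigs from changed ones; near-identical contigs, such as those expected from a non-deterministic assembler, still need an alignment-based comparison to quantify how far they differ.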

    After discussing the differences between contig files with the authors, we believe that the small differences in contig sequences are caused by the non-deterministic nature of the Ray metagenome assembler. A multi-threaded run of Ray can cause different seed reads to be selected each time for building contigs [2]. This appears to explain why the gene prediction results I obtained are different to those of the authors. I consider this a plausible explanation and, as such, conclude that their Docker container reproduces their metagenome assembly process as reported in the manuscript.

    Minor Essential Revisions

    There are 2 typos in the file [3] in the Docker container section; "wokspace" should be "workspace".

    It would be useful to let users know how long it takes for the Makefile to complete its execution on an r3.8xlarge AWS instance.

    Level of interest: An article of importance in its field
    Quality of written English: Acceptable
    Statistical review: No, the manuscript does not need to be seen by a statistician.
    Declaration of competing interests: I work for GigaScience as a bioinformatician.

  • This data note contains results from metagenomics and metatranscriptomics experiments from one sample from one biogas reactor in Germany.

    The data is interesting, but would be even more appealing if it were a time series. Otherwise, it is rather difficult to assess trends in the bioreactor.

    Major Compulsory Revisions


    Minor Essential Revisions

    For the 2 figures, the authors have to add a legend with the colors directly in the figure for better readability.

    The discussion needs to be merged with the Data description.

    Abstract, background:

    - metagenome sequencing
    + shotgun metagenome sequencing


    It should be mentioned in the background whether there are similar datasets for metatranscriptomes. It should be said that the Martha Zakrzewski et al. paper used 454 amplicon sequencing.

    The text surrounding references 3 and 4 (previous work) should be expanded.

    Digester management

    Is there a publication describing the reactor? If not, a general citation on biogas reactors should be provided to the reader, because not all GigaScience papers are about next-generation sequencing and/or bioreactors.

    Library construction

    A reference should be provided for the Gubler-Hoffman protocol.

    Metagenome assembly

    The dataset should also be assembled with MEGAHIT (appeared in 2015 in Bioinformatics) for comparison. Ray Meta is 3 years old.

    The copyright on Figure 1 is not compatible with the Creative Commons license of GigaScience.

    The caption for Figure 1 should state which starting compounds are used to produce the methane.

    Discretionary Revisions

    The title should be shortened. For example:

    Metagenome and metatranscriptome of a biogas-producing community in a biogas plant

    The difference between amplicon sequencing and shotgun sequencing should be used as a main selling point. The paper from 2012 was an amplicon sequencing paper. This paper is about shotgun sequencing.

    Level of interest: An article of importance in its field
    Quality of written English: Acceptable
    Statistical review: No, the manuscript does not need to be seen by a statistician.
    Declaration of competing interests: I declare that I have no competing interests.

  • This is a major achievement in the field of biogas research, therefore it certainly deserves publication. The first deeply sequenced metagenome and metatranscriptome databases were generated, which pave the path to a more thorough understanding of these complex microbial communities. The paper, submitted as a “data note” for GigaScience, only scratches the surface of the vast amount of information gathered in this work. The title is therefore a bit misleading as only data sets corresponding to the methanogenesis pathways are presented here, which comprise probably about 5-8% of the total sequences to be analyzed presumably in subsequent publications.

    Major compulsory revisions: none

    Minor essential revisions: change the title to inform the reader that this report is limited to the methanogenesis pathways.

    Discretionary revisions: none

    Level of interest: An article of outstanding merit and interest in its field
    Quality of written English: Acceptable
    Statistical review: No, the manuscript does not need to be seen by a statistician.
    Declaration of competing interests: I declare that I have no competing interests.

  • In this nice data paper, the authors provide a deep Illumina metagenomic and metatranscriptomic data set, assembly, and high-level analysis of a biogas reactor microbial community.

    The paper is well written and the data seems to be of good quality, based on their reporting. The paper is also highly reproducible, coming with a version-controlled workflow (in a Makefile) on GitHub, with an associated Dockerfile; using Docker is a great idea but is still imperfectly executed (see below).

    All of the raw data is deposited publicly and was available to me.

    Major comments:

    The authors should include the number of reads that mapped back to the assembly at their 1 kb cutoff, as this would help us gauge the inclusivity of the assembly for both the DNA and RNA reads.
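To make the requested figure concrete, the sketch below shows one way a mapping rate at a contig length cutoff could be computed. It is a minimal illustration under stated assumptions, not the authors' workflow: it assumes the read mapper's output has already been reduced to one contig name per mapped read, and the function name `mapping_rate` and its parameters are hypothetical.

```python
def mapping_rate(total_reads, mapped_contigs, contig_lengths, min_length=1000):
    """Fraction of all reads that map to a contig of at least min_length bp.

    total_reads    -- number of reads given to the mapper (mapped + unmapped)
    mapped_contigs -- iterable with one contig name per mapped read
    contig_lengths -- dict of contig name -> length in bp
    min_length     -- contig length cutoff in bp (1 kb by default)
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Count only reads whose target contig passes the length cutoff.
    kept = sum(
        1 for contig in mapped_contigs
        if contig_lengths.get(contig, 0) >= min_length
    )
    return kept / total_reads
```

Running the same calculation separately on the DNA and RNA reads would give the two rates asked for above.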

    Minor quibbles --

    I cannot evaluate the claims of priority. Isn't it sufficient to say deep Illumina metagenomes are rare and leave it at that?

    For the assembly, how were these parameters picked, and is there any evaluation of sensitivity or specificity?

    The GitHub and Docker URLs in the PDF have an ] at the end that blocks just clicking on them.

    I would suggest deprecating the Docker discussion a bit; it didn't work for me. I also have other suggestions for modification. Details below. I might suggest putting a tag on the repo so that you can link to the last time you actually ran the Docker container.


    """ docker run -v /path/to/output/directory:/home/biogas/output 2015-biogas-cebitec """

    didn't work, presumably due to docker version upgrades; I needed to put metagenomics/2015-biogas-cebitec before the last bit.


    The raw data is downloaded into the docker container, which can be a bit of a problem, because on AWS (where I tried to run this docker container) the containers were stored on the root disk. There are two possible solutions that I can see --

    • do as I did in this blog post, and put the data on the host disk and then mirror it into the container:

    • use a data volume:

    I think the first solution works better, but in any case, something needs to be done about putting large amounts of data in an opaque and potentially trashable container that consumes all the available disk space as part of the make file :).

    Third, and most troublingly, my attempt to run the docker container failed with a missing 'unzip' command. This is probably easily fixable but does indicate a mismatch between the workflow and the container.

    In sum, apart from minorly revising the docker container and/or discussion, and providing mapping rates, this looks great!

    Titus Brown
    UC Davis

    Quality of written English: Acceptable
    Declaration of competing interests: I declare that I have no competing interests.
