Content of review 1, reviewed on April 11, 2017

In this manuscript, the authors analysed meta-barcoding and shotgun metagenomic data from spontaneous wine fermentations, showing high correlations in abundance measurements. Furthermore, the comparison between the meta-barcoding and shotgun metagenomic data showed that there is a strong bias in the meta-barcoding data for the genus Metschnikowia. In general, the manuscript was well written, with an appropriate structure and a comprehensive description of the methods and results.

Major Comments:

1. About Figure 1: 1) The description of Figure 1 needs improvement. For example, on line 537, "both plots": which two plots are meant specifically? (They should be A and B, but this should be made clearer.) It also seems that plots (B) and (C) were mislabeled, so the description does not currently match the labels in the figure. Also, the last sentence, "Abundance values are presented as in Fig 1A", does not seem accurate here: the legend for the values is located at the bottom of the figure, not in subplot A. 2) In subplot C, how were the nodes for the top 30 genera positioned in the plot? How was the distance between nodes calculated?

2. Line 215: the reference genomes used for alignment were "assembled from existing genomic resources for fungal and bacterial genera that were known, or suspected of being wine-associated". What exactly are these genomic resources? On lines 408-413 in the Methods section, "whole genome sequences were collected, when possible": is "collected" here the same as "assembled"? If the genomes were assembled, which assembler was used? This is not very clear and may need further clarification.

3. About Figure 2A: "only the abundance measures for species within the Hanseniaspora genus" are depicted. Why was this genus picked? On lines 258-260, it is mentioned that the identity values are significantly lower for "Mucor circinelloides, Pseudomonas syringae and Hanseniaspora valbyensis"; why not present these genera in 2A?

4. Table 2: 1) On line 276, "two cases": what are the two cases? 2) In Table 2, what does the gray color represent? Also, the boxes in the "Total OTUs" column are grey and not grey respectively. Should this be more consistent? 3) Should "Control mix 1" for "AWRI1498" be "1x10^6", as for "AWRI796"?

Minor Comments:

1. Line 49: "To the map" -> "To map".
2. Line 281: "D1 T1 and T2": a comma is missing between "D1" and "T1".
3. Line 293: "were not within a five-fold range", but on line 290 it is "two-fold". Why "five-fold"?
4. Line 560: "to the total the abundance of": is there an extra "the" in the sentence?
5. Line 565: "an abundance of S. cerevisiae of 1 million reads per million": what does "per million" mean here?

Are the methods appropriate to the aims of the study, are they well described, and are necessary controls included? If not, please specify what is required in your comments to the authors.

Yes

Are the conclusions adequately supported by the data shown? If not, please explain in your comments to the authors.

Yes

Does the manuscript adhere to the journal’s guidelines on minimum standards of reporting? If not, please specify what is required in your comments to the author

Yes

Are you able to assess all statistics in the manuscript, including the appropriateness of statistical tests used?

Yes, and I have assessed the statistics in my report.

Quality of written English

Please indicate the quality of language in the manuscript:

Needs some language corrections before being published.

Declaration of competing interests

Please complete a declaration of competing interests, considering the following questions: Have you in the past five years received reimbursements, fees, funding, or salary from an organization that may in any way gain or lose financially from the publication of this manuscript, either now or in the future? Do you hold any stocks or shares in an organization that may in any way gain or lose financially from the publication of this manuscript, either now or in the future? Do you hold or are you currently applying for any patents relating to the content of the manuscript? Have you received reimbursements, fees, funding, or salary from an organization that holds or has applied for patents relating to the content of the manuscript? Do you have any other financial competing interests? Do you have any non-financial competing interests in relation to this manuscript? If you can answer no to all of the above, write ‘I declare that I have no competing interests’ below. If your reply is yes to any, please give details below.

I declare that I have no competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included on my report to the authors and, if the manuscript is accepted for publication, my named report including any attachments I upload will be posted on the website along with the authors' responses. I agree for my report to be made available under an Open Access Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments which I do not wish to be included in my named report can be included as confidential comments to the editors, which will not be published.

I agree to the open peer review policy of the journal.

Authors' Response to reviewers

Reviewer 1

I would have expected to see an extended discussion/conclusion about how this methodology was able to reach a higher taxonomic depth for certain taxa, even detecting some strains, and some bacteria too.

As this finding is becoming an accepted reality in the literature with respect to comparing shotgun metagenomics and phylotyping approaches, in the interests of succinctness we have not discussed this aspect at length. We have sought to concentrate on the novel findings stemming from this particular study, with respect to the detailed comparison of abundance levels, biases, etc.

The fact that around 85% of the reads mapped to a wine-related microorganism reference genome should definitely be highlighted as a conclusion.

We have no real basis for regarding an 85% alignment rate as a significant finding or otherwise. What should be noted, however, is that extra effort was made to ensure that the reference panel was as comprehensive as possible for these datasets, hence the use of the de novo assembly strategy to specifically capture the highly abundant species in our data for which reference genomes were not currently available. Thus, the 15% of unaligned reads may represent tens or hundreds of low-abundance species for which we cannot account.

No deep discussion is made of the community dissimilarities found among samples with the two approaches (e.g. comparing the sample clustering obtained in figure 1c and figure 2c).

With the small number of samples available to this study, we feel that we are not in a position to generalize about the differences observed between samples, beyond what was observed.

Another drawback is that the work does not include any metadata at all regarding the must/vineyard characteristics, and thus it lacks a discussion of why the first stages of musts and ferments coming from different wineries are so different.

All ferments were Chardonnay, and all were grown in similar geographic locations and to similar ripeness levels. Even if more complex metadata were included, the sample size is simply too small to attempt to correlate the observed microbiomes with any other parameters. We feel that this is a shortcoming that is all too common in the field of metagenomics, and we would seek to avoid over-analysis of this dataset.

I am a bit confused about certain sections of the paper that I think should be further explained. Also, certain sections from the discussion should go into the methods section. For example, I think that the authors should explain how the "Control populations" were made; this should be included in the methods section and clearly indicated with a header (e.g. in line 323).

A section on the creation of the control populations has been added to both the results and methods sections of the manuscript.

The analysis and statistical approaches are overall sound. However, I found some of the conclusions drawn from the figures difficult to follow (in particular, the conclusions drawn from figure 2c, lines 278-283, are misleading to me).

This has been addressed below, as there was an erroneous substitution between the Y2 and Y3 samples in the text.

Comments/Corrections

-Line 30 and line 49: change "To the map" to "To map". Amended

  • Line 91: what does "with several amplicon based methods" refer to? That they use different primers, different sequencers…? I would say "several amplicon based studies have been conducted".
    These studies do often use different primers, sequencing technologies, etc. The change has, however, been included to aid clarity.

  • Line 118: "D0 (at inoculation)"?? But this is spontaneous fermentation!
    This was a carry-over mistake from the control populations and has now been amended in the text.

  • Line 119: "As seen with the selective plating…": here it is my understanding that you are talking about previous culture-dependent work? This sentence needs a reference. Lines 119-122: which data are you referring to for this statement? Is it figure 1? You need to add it.
    The section regarding the plating experiments has been removed for clarity.

-Lines 129-132: I would suggest this be part of the methods, not this section.

This section is listed in the methods, but, given that other reviewers requested that more data be added regarding the samples here, this text has also been included in the results.

-Line 139: change "all the of otus" to "all the otus". Amended

-Line 139: why 78 samples? There are 66.

78 was a previous number; this has now been amended.

-Line 171: "figure (1B)" should be "figure (1C)".

Figure 1 has now been corrected, as panels B and C were mis-labelled.

-Line 194: there is no vintage information in table 1. Amended

-Line 211: I am curious to know how many of the reads aligned to Vitis and were discarded. You could include the values in table 1.

In all cases, alignment to the Pinot Noir reference genome was very low (<1%). This has been included as a generic statement in the text.

-Line 216: having only 15% of reads unable to align is a really good result; it seems like a "too" good result. Does it mean that the wine-environment microbes are very well characterized (i.e. that genomes are available for most of them), or is it that the alignment of your reads against the reference genomes was not very restrictive (only q10?)? I would say this is something that might be interesting to discuss in the paper.

We cannot comment on what is “good” or “too good” with respect to the number of reads that would map to a reference panel. The alignment parameters were, however, strict, and this is readily observable through the identity scoring. Distant alignments would produce low-identity matches; instead, most windows display >96% average DNA identity, a level that would only be observed between the genomes of very closely related sister species at best.

What should be noted, however, is that every effort was made to ensure that the reference panel was as comprehensive as possible for these datasets, hence the use of the de novo assembly strategy to specifically capture the highly abundant species in our data for which reference genomes were not currently available. Thus, the 15% may represent many tens or hundreds of low-abundance species.

-Line 236: the authors have identified several prokaryotes. I suggest you compare the taxa identified by shotgun sequencing with what is already known from other studies that have used 16S rRNA phylotyping. Does this fit with the most abundant bacterial community found in other ferment samples?

A section addressing this has been added, although it should be noted that the previous phylotyping exercises have not generally classified bacteria beyond the level of class, which makes direct comparison difficult.

-Line 248: why not take these reads and, for example, BLAST them to see whether they come from mitochondria?

The MetaPhlAn approach is limited in its resolution, but is suggestive of mitochondria. Due to the sheer number of reads, exhaustive BLAST analysis against the mitochondrion is excessive. BLAST analysis of a small subset of reads suggests mitochondrial origin, but not all reads were examined.

-Lines 253-260: this paragraph seems to refer to Fig 2A, but in line 260 the authors talk about microorganisms other than the ones in Fig 2A, and the results for the microbes shown in Fig 2A are not discussed.

References to Fig. S3, which contains the full dataset, rather than the example data presented in Fig. 2A, have now been included for clarity.

-Lines 278-280: Figure 2C. "D1 samples (T1, T2 and Y3) largely differentiated…" I can see T1 and T2 as a differentiated group, but Y3 does not seem that different to me from Y1D1? I guess I am not interpreting the ordination plot correctly…

Many thanks to the reviewer for finding this error. The correct sample should have been Y2 rather than Y3. This has now been amended in the text.

Anyway, it would have been nice to do a PCoA with the ITS data and another PCoA with the shotgun data, using just the samples the two have in common, in order to see whether the groupings cluster similarly with both methods (as they seem to), maybe as a supplementary table?

Unfortunately, given the different data types, we have not found an efficient or accurate means to directly compare these samples.

-Line 287: 23 taxonomic identifiers. Are the authors comparing the shotgun data that were mapped to the reference genomes (or also those obtained with MetaPhlAn)? Are there only 23 taxa in common between the two methods?

Only the shotgun method (alignment to the reference panel) has been used to compare the data types, as MetaPhlAn was not run on the entire dataset and lacks markers for many genera and species that the reference dataset can detect. We have also not fully explored the quantitative limits of MetaPhlAn, which only works on marker genes rather than whole genomes.

-Line 323: I understand this refers to the control samples, but I am confused about it, as it is included under the header "ferment samples".

The section regarding the plating has been removed for clarity.

-Line 393: how much of the total abundance do these 30 OTUs make up? Why use only these 30?

The top 30 OTUs were chosen for clarity when presenting the multidimensional analysis; this is a common practice. The 30 OTUs that were used comprise 99% of the data and would be responsible for nearly all of the PCoA loadings. As only the top two loadings are presented, OTUs outside the top 30 have no effect on the data as presented or on the conclusions drawn.
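As an illustration of the coverage figure quoted in this response (the top 30 OTUs comprising 99% of the data), the following minimal Python sketch computes the share of all reads held by the N most abundant OTUs. The function name and the toy OTU counts are hypothetical and are not taken from the manuscript.

```python
def top_n_share(otu_counts, n):
    """Fraction of all reads held by the n most abundant OTUs.

    otu_counts: dict mapping OTU identifier -> read count (hypothetical layout).
    """
    total = sum(otu_counts.values())
    if total == 0:
        raise ValueError("no reads to summarize")
    # Keep the n largest counts and compare them with the grand total.
    top = sorted(otu_counts.values(), reverse=True)[:n]
    return sum(top) / total


# Toy counts for four OTUs (illustrative values only).
counts = {"OTU_1": 500, "OTU_2": 300, "OTU_3": 150, "OTU_4": 50}
print(top_n_share(counts, 2))  # 800 of 1000 reads -> 0.8
```

With real data, the same call with n=30 would give the proportion of reads retained in the plotted subset.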

-Line 396: replace "and two winery samples" with "from two winery samples". Amended

  • Line 437: nothing is said in the methods about how you compared the ITS and shotgun sequencing results.
    This methodology was summarized in the legend to Fig. 3, but we have now also added text to the methods section explaining this process.

Figures and tables

Fig 1c: despite the colors, I would suggest the authors write the whole name (e.g. T1D0, T1D1…) to refer to the samples rather than just referring to them by color and fermentation stage (same for figure 2c).

We had initially tried to label the samples like this; however, it significantly detracted from the legibility of the figure. As the other reviewers were comfortable with this representation, we suggest that this not be changed.

Fig 1b: include species names in the figure.

Tentative species assignments have been added to the figure.

-Table S3: some of the species do not have accession numbers; they have a blank cell.

These are unpublished data, but the sequences are present in the .fasta file associated with the publication. This table has been updated to reflect this.

-Table 3: in the legend, the authors need to explain that "alignment" means aligning to a reference genome.

A footnote has been added to explain “alignment”.

-Table 4: I am confused about the spacer. What is the dot referring to: any nucleotide (in which case it would be better to specify it as "n"), or the lack of a nucleotide?

This table has been edited and additional footnotes have been added to aid clarity.

Reviewer 2

Major Comments:

1. About Figure 1: 1) The description of Figure 1 needs improvement. For example, on line 537, "both plots": which two plots are meant specifically? (They should be A and B, but this should be made clearer.) It also seems that plots (B) and (C) were mislabeled, so the description does not currently match the labels in the figure. Also, the last sentence, "Abundance values are presented as in Fig 1A", does not seem accurate here: the legend for the values is located at the bottom of the figure, not in subplot A.

We thank the reviewer for finding this error. We have amended this figure legend, and corrected the labelling mistake.

2) In subplot C, how were the nodes for the top 30 genera positioned in the plot? How was the distance between nodes calculated?

Nodes were positioned according to their PCoA loadings. The text in the legend has been altered to reflect this.

2. Line 215: the reference genomes used for alignment were "assembled from existing genomic resources for fungal and bacterial genera that were known, or suspected of being wine-associated". What exactly are these genomic resources? On lines 408-413 in the Methods section, "whole genome sequences were collected, when possible": is "collected" here the same as "assembled"? If the genomes were assembled, which assembler was used? This is not very clear and may need further clarification.

We have removed the term “assembled” and replaced it with “obtained” to address this ambiguity.

3. About Figure 2A: "only the abundance measures for species within the Hanseniaspora genus" are depicted. Why was this genus picked? On lines 258-260, it is mentioned that the identity values are significantly lower for "Mucor circinelloides, Pseudomonas syringae and Hanseniaspora valbyensis"; why not present these genera in 2A?

Hanseniaspora spp. were chosen for Figure 2A as they:
- represented the largest species complex in the dataset;
- contained species with abundances that ranged over 5 orders of magnitude;
- contained species that displayed varying levels of identity to the reference genomes used (Hanseniaspora valbyensis is represented for this reason).

The entire dataset is included in Fig. S3 such that the reader can see the results from the entire dataset. An additional reference to this figure has now been included in the legend to Fig 2.

4. Table 2: 1) On line 276, "two cases": what are the two cases?

Table 2 displays all of the data and indicates the two cases where this occurred.

2) In Table 2, what does the gray color represent? Also, the boxes in the "Total OTUs" column are grey and not grey respectively. Should this be more consistent?

The grey shading represents the fact that the Saccharomyces cerevisiae values have been used for normalization purposes and so are set to a ratio of “1”. The table footnote has been altered to attempt to clarify this for the reader.

3) Should "Control mix 1" for "AWRI1498" be "1x10^6", as for "AWRI796"?

No, 1x10^4 is correct.

Minor Comments:

1. Line 49: "To the map" -> "To map". Amended
2. Line 281: "D1 T1 and T2": a comma is missing between "D1" and "T1". Amended
3. Line 293: "were not within a five-fold range", but on line 290 it is "two-fold". Why "five-fold"?
   As most results were well within a two-fold range, the five-fold range was subsequently used to focus on those species for which a major bias was observed between the shotgun and ITS approaches.

4. Line 560: "to the total the abundance of": is there an extra "the" in the sentence? Amended
5. Line 565: "an abundance of S. cerevisiae of 1 million reads per million": what does "per million" mean here?
   This simply reflects the data transformation that was performed to enable the comparison of the shotgun and ITS data. The most abundant species, Saccharomyces cerevisiae, was therefore set at 1 x 10^6, or 100% of theoretical, and the other values were then scaled proportionally relative to this level. The legend has been altered to attempt to clarify this.
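The normalization and fold-range screen described in the responses above can be sketched as follows. This is a minimal Python illustration, assuming that each method yields a read count per species and that both profiles are rescaled so that Saccharomyces cerevisiae equals 1 x 10^6; the function names, species labels and counts are hypothetical, and this is not presented as the authors' actual pipeline.

```python
REFERENCE = "Saccharomyces cerevisiae"


def scale_to_reference(counts, reference=REFERENCE, target=1_000_000):
    """Rescale read counts so that the reference species equals `target`."""
    ref_count = counts[reference]
    if ref_count == 0:
        raise ValueError("reference species has no reads")
    return {species: c * target / ref_count for species, c in counts.items()}


def outside_fold_range(profile_a, profile_b, fold):
    """Species whose abundance ratio between two scaled profiles exceeds
    `fold` in either direction. Species absent (or zero) in either profile
    are skipped, because their ratio is undefined."""
    flagged = []
    for species in sorted(set(profile_a) & set(profile_b)):
        a, b = profile_a[species], profile_b[species]
        if a <= 0 or b <= 0:
            continue
        ratio = a / b
        if ratio > fold or ratio < 1 / fold:
            flagged.append(species)
    return flagged


# Hypothetical read counts for one sample under each method.
shotgun = scale_to_reference({REFERENCE: 40_000, "species_B": 6_000, "species_C": 400})
its = scale_to_reference({REFERENCE: 20_000, "species_B": 1_000, "species_C": 2_000})
print(outside_fold_range(shotgun, its, fold=2))  # ['species_B', 'species_C']
print(outside_fold_range(shotgun, its, fold=5))  # ['species_C']
```

In this toy case species_B differs three-fold between methods, so it is flagged at the two-fold threshold but not at the five-fold one, which is the kind of distinction the five-fold range is used to draw.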

Reviewer 3: General comments:

One further general suggestion: As this is a research article, I would like to see a few sentences added to the Discussion (or the "Potential Implications" section, since there is no separate Discussion…) indicating whether there are broadly similar or dissimilar findings, fungal and possibly bacterial, in the current study when compared to the few previous studies of ITS-based sequencing performed on uninoculated wine fermentations. I realize that it may be difficult to ascribe causal significance to differences or similarities, since the vineyard locations/wineries/varietals/methods are all very different, but it would be nice to see what has been previously observed summed up and compared with what has been found in this study, especially with regard to whether the two "types" of starting/early fermentation populations reported here (Metschnikowia- and Hanseniaspora-dominated, vs more diverse Aureobasidium- and Rhodotorula- (and other) dominated) might also have been seen as "types" before.

Unfortunately, detailed fungal genus- and/or species-level data for comparison are only available from one publication (Pinto et al., 2015), as data have either not been taken from ferment samples (fruit only) or the actual abundance data are not presented or available in the publication (e.g. Bokulich et al., 2016). As such, we are cautious about trying to over-analyze the results from such a small number of studies. We have, however, added text to discuss the fact that similar species have been found in these studies.

Line 32 - change to "phylotyping" Amended

Line 43 - remove comma Amended

Lines 59-60 - may want to change the first sentence just a bit, since it is essentially identical to the first sentence of the Abstract. We have decided to keep this first line.

*Lines 104-106 - Specify that a number of control samples were also included (maybe elaborate a bit on the number and replicates of the control samples, vs the number and replicates of experimental samples), and also make clear that the 20 samples that were shotgun-sequenced also contain control samples and are a subset of the entire set of 66 that were ITS-sequenced. This sampling information and naming system should definitely be shown somehow in a main-text Table - I think it could be efficiently included in Table 1 - to help people understand which samples were used for which assays, hopefully without having to go to a supplemental table.

The text has been modified to clarify this section.

Line 107 - remove comma Amended

Line 113 - maybe say "…and applicability of performing metagenomics analyses on laboratory-scale uninoculated ferments," Amended

*Line 118 - Why does it say "inoculation"? I don't think anything was inoculated…? Amended

Also - I don't understand the progression of Baume drops: why does it go -1, -6 and -3? Should the last number be -13? I think it would also be good to parenthetically include, for each time point, the equivalent percentage of progression through fermentation (e.g., % of starting sugar fermented), since not everyone uses Baume units, and/or show this in a table (maybe show other equivalent units as well; see the comments below for Lines 122-123, where I say a table of fermentation characteristics/kinetics is needed).
The sampling points and sugar concentrations have been amended to make this section clearer to the reader.

*Lines 119-121 - It seems that the results of the selective plating (which are described in the Methods) are not shown anywhere? The results should be shown as a table or figure (main or supplemental). It might also be nice to compare the plating results (just the numbers of Sacch spp. vs non-Sacch spp.) to both the ITS and shotgun-metagenomics results to see how well they correlate.

This small section of the methods has now been removed.

*Lines 122-123 - Need to include the data for all the actual fermentation dynamics that were collected for these ferments (e.g., starting sugar for each fermentation, residual sugar, ethanol, etc. determined at each time point, days of fermentation for each time point and for dryness) as a main or supp table.

Brief data are provided in the text. All ferments went to dryness (<5 g/L of sugar) between 12 and 27 days. Final alcohol concentrations, etc., were not determined for these ferments.

Line 136 - describe what "OTU" stands for (Operational Taxonomic Unit, I assume) and what exactly you mean by the term when using it in this paper (this is the first time this abbreviation is used, and it is not explained anywhere in the paper as far as I can see).

The abbreviation OTU has been defined in the text.

Lines 152-153 - From Table 2, it appears the numbers are incorrect in this sentence; it should be seven common wine-associated yeasts, representing six different species and five different genera. Amended

Line 171 - I think you mean Fig 1C instead of 1B? And actually, it would probably be best to refer to both Fig 1A and Fig 1C here, and also at the end of the following sentence (line 174). Line 179 - this should definitely be Fig 1B, not 1C. There is confusion between the text, the figure and the actual legend in referring to Fig 1B and 1C; make sure everything aligns correctly.

Figure 1 has now been corrected, as panels B and C were mis-labelled.

*Lines 220-232 - I find all of this a bit confusing with regard to merging species and genera together, etc.; it needs further, and clearer, explanation.

The text has been amended to add clarity to this section.

Line 278 - describe what you mean specifically by "ordinate analysis"
“ordinate” has been replaced by “multidimensional analysis (Bray-Curtis)”
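For readers unfamiliar with the measure named in this response, the Bray-Curtis dissimilarity between two samples is the sum of absolute abundance differences divided by the sum of all abundances in both samples. The minimal Python sketch below computes it pairwise; the ordination step (PCoA) that would then be run on the resulting distance matrix is not shown, the software the authors used is not stated here, and the sample names and values are hypothetical.

```python
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance profiles.

    x, y: dicts mapping taxon -> abundance; taxa missing from one profile
    count as zero. Returns 0.0 for identical profiles and 1.0 when no taxon
    is present in both.
    """
    taxa = set(x) | set(y)
    diff = sum(abs(x.get(t, 0) - y.get(t, 0)) for t in taxa)
    total = sum(x.get(t, 0) + y.get(t, 0) for t in taxa)
    return diff / total if total else 0.0


def pairwise_bray_curtis(samples):
    """Dissimilarity for every pair of samples (dict of name -> profile)."""
    return {
        (a, b): bray_curtis(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


samples = {
    "sample_1": {"taxon_A": 10, "taxon_B": 0},
    "sample_2": {"taxon_A": 5, "taxon_B": 5},
    "sample_3": {"taxon_C": 8},
}
print(pairwise_bray_curtis(samples))
# {('sample_1', 'sample_2'): 0.5, ('sample_1', 'sample_3'): 1.0, ('sample_2', 'sample_3'): 1.0}
```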

Line 301 - change "providing" to "provides" Amended

Lines 553-554 - change to "For clarity and space considerations, we depict here only the abundance measures for species within the Hanseniaspora genus for the two T2D1 and Y1D1 replicates; results for all samples are presented in …" Amended

Line 555 - where it says "…Supp. Fig. R2)" - do you mean Fig S2 or perhaps Table S2?

We thank the reviewer for finding this error. Supp. Fig. R2 has now been changed to the correct reference, Fig. S3.

Figure 1. In the legend, I believe the (B) and (C) sections need to be swapped (in the actual figure as shown, B is the Hanseniaspora break-down, and C is the dissimilarity figure). There are some mix-ups with these in the text as well (I note most or all in the specific comments, but make sure all instances are corrected).

We thank the reviewer for finding this error; this has now been corrected.

Table 1 - As I say above in the Lines 104-106 comments, I think you can efficiently include the sampling information in Table 1, or perhaps in a new main Table, showing which samples were used for which assays.

Table 1 has been extensively edited to add the ferment sampling stages and the samples that were processed for ITS and/or shotgun metagenomics.

Table 2 - Rename Table 2 (perhaps "Composition of Control Populations and Comparison of ITS vs Shotgun Metagenomics Abundance Results"?), as this table shows a lot more than just the control populations info. Also please give the table a legend (and/or describe it more fully in the text), being sure to explain what "OTU's" means, and also briefly how and why the different control mixes were assembled and how the "ratio" numbers in parens were derived (and/or put this information in the Methods). Also italicize the bottom-most line in the species column (uvarum).

Table 2 has been edited for clarity, and a more extensive description of the control populations has been included in the text.

Table 3 - Again, a proper legend would be nice; also explain what the control populations were.

Sample names have been expanded in Table 2. This, along with the changes to Table 1, should make the source of each sample clearer to the reader.

Table 4 - I believe that the dots shown in the spacer section of the primer sequences mean that there is actually no nucleotide there (i.e., that the spacer can have 0, 1, 2 or 3 nucleotides; this helps to avoid sequencing artefacts/the need for a spike-in), but you should say so in a brief legend (which is needed).

This table has been edited and additional footnotes have been added to aid clarity.

Source

    © 2017 the Reviewer (CC BY 4.0).