Abstract

Metabarcoding has become a common approach to the rapid identification of the species composition in a mixed sample. The majority of studies use established short-read high-throughput sequencing platforms. The Oxford Nanopore MinION(TM), a portable sequencing platform, represents a low-cost alternative allowing researchers to generate sequence data in the field. However, a major drawback is the high raw read error rate that can range from 10% to 22%. To test whether the MinION(TM) represents a viable alternative to other sequencing platforms, we used rolling circle amplification (RCA) to generate full-length consensus DNA barcodes for a bulk mock sample of 50 aquatic invertebrate species with at least 15% genetic distance to each other. By applying two different laboratory protocols, we generated two MinION(TM) runs that were used to build error-corrected consensus sequences. A newly developed Python pipeline, ASHURE, was used for data processing, consensus building, clustering and taxonomic assignment of the resulting reads. Our pipeline achieved median accuracies of up to 99.3% for long concatemeric reads (>45 barcodes) and successfully identified all 50 species in the mock community. The use of RCA was integral for increasing consensus accuracy but was also the most time-consuming step of the laboratory workflow. Most concatemeric reads were skewed towards a shorter read length range with a median read length of up to 1,262 bp. Our study demonstrates that Nanopore sequencing can be used for metabarcoding, but exploration of other isothermal amplification procedures to improve consensus accuracy is recommended.


Authors

Baloglu, Bilgenur;  Chen, Zhewei;  Elbrecht, Vasco;  Braukmann, Thomas;  MacDonald, Shanna;  Steinke, Dirk

Contributors on Publons
  • 2 reviewers
  • pre-publication peer review (FINAL ROUND)
    Decision Letter
    2021/01/07

    07-Jan-2021

    MEE-20-05-406.R1 A workflow for accurate metabarcoding using nanopore MinION sequencing

    Dear Dr Bilgenur Baloğlu,

    It is a pleasure to accept your manuscript entitled "A workflow for accurate metabarcoding using nanopore MinION sequencing" in its current form for publication in Methods in Ecology and Evolution. The comments of the reviewers who reviewed your manuscript are included below. Final instructions for your manuscript, and some promotion options, can be found at the end of this email.

    Thank you for your fine contribution. On behalf of all Editors of Methods in Ecology and Evolution, I look forward to your continued contributions to the Journal.

    Sincerely,

    Dr Lee Hsiang Liow
    Senior Editor, Methods in Ecology and Evolution

    Reply to:
    Ms India Stephenson
    Methods in Ecology and Evolution Editorial Office
    coordinator@methodsinecologyandevolution.org

    Reviewer(s)' Comments to Author:
    Reviewer: 1

    Comments to the Corresponding Author
    Thank you for either implementing or addressing all of my suggested revisions and comments from my initial review, and those of Reviewer 2. The new ASHURE subsection in the Methods and the new Supplemental Materials and Methods improve both the clarity of the manuscript and the reproducibility of the workflow.

    Reviewer report
    2020/12/15

    Thank you for either implementing or addressing all of my suggested revisions and comments from my initial review, and those of Reviewer 2. The new ASHURE subsection in the Methods and the new Supplemental Materials and Methods improve both the clarity of the manuscript and the reproducibility of the workflow.

    Author Response
    2020/10/17

    Comments to the reviewers

    We would like to thank Reviewers 1 and 2 for their valuable and constructive feedback. We addressed each comment and revised the manuscript accordingly. Following Reviewer 2’s suggestions, we added a new subsection in the methods (ASHURE data processing workflow) to describe the bioinformatics steps in more detail. We also added a new Supplemental Material and Methods section, where we provide more details on the parameters, statistical measures, and some of the code in ASHURE. We also added new supplementary figures to clarify the steps taken within the OPTICS clustering algorithm. Please find our responses below in italicized form.

    Reviewer 1

    Comments to the Corresponding Author
    I saw this work presented at the Barcode of Life conference last year and am pleased to see it submitted for publication, as it represents a significant step in overcoming the problem of high sequence error rate that has hindered the adoption of nanopore sequencing in metabarcoding studies.

    Specifically, the manuscript describes the use of a mock community of 50 specimens to optimize protocols and tools for using rolling circle amplification (RCA) to improve the accuracy of nanopore-generated consensus sequences. The comparison between Illumina and Nanopore platforms in detecting the components of the mock community is also valuable, especially for engendering confidence among current Illumina users who might be wary of the Nanopore platform.

    The creation of a novel Python pipeline for processing, consensus building, clustering, and ID of Nanopore-generated metabarcoding reads using RCA, is an equally valuable contribution, especially as it can also be used to process outputs of other isothermal amplification methods as recommended in the Discussion.

    While RCA had been previously used to reduce Nanopore sequencing error rate in other studies (line 68), it hadn’t yet been demonstrated in a metabarcoding study. And while nanopore sequencing has been previously used for metabarcoding (line 73), the cited studies applied it to much smaller species communities (fewer than 11 specimens), and, with the exception of Calus et al. (2017), did not employ RCA or other amplification protocols for error rate reduction.

    Attached are my specific, suggested revisions.

    Suggested revisions:
    • While the low cost of the MinION is indeed a desirable feature of the platform and one
    reason for its popularity (line 53-55), citing the low capital investment ($1000) is somewhat misleading as it leaves out the (far more expensive) consumables required. Consider citing a start-up cost that includes consumables.
    We changed this to start-up costs and clarified that packages start at $1000 with a small amount of supplies included.
    • Line 110 (“…remaining tissue of the mock community specimens was dried overnight,
    pooled, and placed in sterile 20mL tubes” for bulk DNA extraction): More information is
    needed. Were equal weights of dried tissue used from each specimen? If so, how much, and
    how much variation was there? If not, why not, and how would different amounts of
    tissue input affect the results?
    We did not weigh the specimens in this experiment (neither for MiSeq nor for the Nanopore study) and did not use equal weights of tissue. The mock samples contain a fair bit of size variation which is very much in line with a ‘normal’ bulk sample. Using samples with varying biomass could lead to bias in the detection but, given that in our case 49/50 and 100% of the species were detected using MiSeq and Nanopore sequencing, respectively, we suggest that this issue did not play such a large role.
    • Line 110: “Tubes” suggests more than one tube. Were multiple replicates done? If so, how
    many?
    Changed to tube. We had only one tube/one replicate for 50 specimens.
    • In the Illumina experiment, individual specimen DNAs (not the bulk DNA extract) were
    separately amplified, normalized, pooled, and sequenced. Add a sentence or two explaining
    why an Illumina experiment was not performed on the bulk sample, for direct comparison
    with the nanopore experiment.
    We actually used the same bulk DNA extract for the Illumina MiSeq experiment. We added a sentence to clarify this.
    • A 421bp region of CO1 was used for Illumina metabarcoding (line 121), while a 658bp
    region was used for Nanopore metabarcoding (line 151). Explain why you didn’t use the
    same fragment for the two studies, especially since they are meant to be comparative (line
    303-305). Presumably, it’s because Illumina does not do well with fragments longer than
    ~500bp. In that case, why wasn’t the shorter fragment used in the Nanopore experiment, to
    allow for direct comparison? Could the fact that a shorter fragment was used in the Illumina
    experiment explain why 49/50 OTUs were identified in comparison with 50/50 in the
    Nanopore experiment?
    Indeed, we could not use longer fragments for Illumina sequencing but at the same time we wanted to show the power of the Nanopore approach as it can sequence very long fragments. Given that we chose genetically fairly distant species for our mock community it is very unlikely that shorter sequence length could be the reason for not picking up one species. It might be primer bias as we were using different primer sets for both.
    • The trade-off between consensus sequence accuracy and the number of OTUs that can be identified from a heterogeneous sample is a key result of this paper. In the discussion (line
    296) it is acknowledged that this fact will become more relevant in studies where samples
    include species that are more closely related than this mock community. Consider giving
    some examples of such samples, and perhaps also exploring this more in terms of how it
    will affect different kinds of metabarcoding studies in different contexts.
    We addressed this in the discussion as most bulk samples are a mixture of species with a varying degree of genetic relatedness.
    • In studies where low error rate is of utmost importance, are there recommendations you can
    make, such as size selection or longer/multiple nanopore sequencing runs to generate more
    sequence data from longer RCA fragments?
    Addressed in discussion.
    • Minor items:
    o Line 36 (“…the exploration of other isothermal amplification procedures to
    improve consensus length”): I think you might mean either consensus sequence
    accuracy or RCA fragment length, not consensus length.
    That is correct. We changed to consensus accuracy and removed consensus length.
    o Line 41: “quantify” is an inappropriate term as metabarcoding is, for the most
    part, unsuitable for quantitative analyses. “Characterize” or “assess” might be
    better.
    Changed to ‘assess’
    o Line 63: italicize “a priori”
    done
    o Line 226: specify what “good DNA quality” means
    BB: I decided to remove the sentence as it was qualitative
    o Line 226: comma needed in “204 797”
    done
    o Line 229-230: I think there is a comma missing before “mostly contaminants”
    done
    o References: Multiple author lists are shortened with ellipses. Is this consistent
    with MEE’s bibliography format, or was it perhaps accidentally imported from a
    reference manager?
    Yes, it is consistent with MEE’s bibliography format. In case there are more than 6 authors, it is recommended to add ellipses followed by the last author.

    Reviewer 2

    Dear authors,
    Thank you for your work to develop a Rolling Circle Amplification based metabarcoding
    pipeline with the MinION. Overall, this is a useful contribution to the field as the ASHURE
    pipeline can be used for sequencing other types of concatenated sequence. Also, great to see
    the comparison with an existing pipeline for RCA products, C3POa. However, revisions will be
    needed so that readers can reproduce the protocol and results are more accurately described.
    Main comments:
    1) more detail in the methods section on the DNA preparation steps (see Minor comments)
    We have edited the methods section accordingly and provided some more comments below.
    2) more detail on the data processing steps in the Methods Section to explain Fig S1. The
    authors could follow the C3POa processing section, Methods section in Volden et al 2018,
    including a brief definition of a cluster center, parameters used in the analysis and why there
    can be multiple cluster centers. Also, a suggestion to include average accuracy.
    We added a new subsection in the Methods (ASHURE data processing workflow) describing the steps of the ASHURE pipeline. We also added a new section, Supplemental Material and Methods, where we provided more details on the statistical measures and parameters used in ASHURE. As for the average accuracy, our justification is that the median is more representative of the central distribution, as the average can be more influenced by outliers that are misclassified during mapping of the reads back to the reference database, and this is a known problem with highly errored nanopore reads. Therefore, we decided to report the median accuracy instead of the average accuracy.
    3) To re-write this line in the Abstract: “Our pipeline successfully identified all 50 species in the
    mock community and exhibited comparable sensitivity and accuracy to MiSeq.”
    It would be inaccurate to say that the ASHURE sensitivity and median accuracy are comparable
    to MiSeq accuracy without qualifying that statement, and more appropriate to say that the
    pipeline can identify all species in the mock community but there is a tradeoff i.e. ~92% median
    accuracy. Also if possible to include that mock community species are at least 15% genetic
    distance to each other. To accommodate the word limit, possibly sections 1 and 2 in the
    Abstract could be made more succinct e.g. delete full-length since 658bp is already noted.
    We rephrased the abstract a little and removed the statement in question. We added the mock community distance information as suggested.
    4) There is no background on C3POa in the Introduction or explanation for a new pipeline.
    Some or all of lines 339 to 356 should be in the Introduction together with a few lines on why
    ASHURE was developed e.g. how is it different/better than C3POa.
    Added a few lines in the introduction as suggested
    Minor comments:
    Line 110: What was the dry weight of each specimen?
    We did not weigh dry specimens in this experiment.
    Line 124: How much DNA template was used in the PCR?
    12.5 ng DNA, added in the text.
    Line 149: How much DNA template was used in the PCR?
    12.5 ng DNA, added in the text.
    Line 155: quantified with a Qubit?
    Yes, we included this information.
    Lines 156 to 158: Amount of amplicons used in the ligation step? How much treated with
    DNAse? Units of DNAse used?
    Included in manuscript.
    Line 161: How much DNA template was used?
    Starting concentration was 0.3–0.4 ng/μL. Updated in manuscript.
    Line 163 to 165: this section is slightly confusing e.g. RCA duration between 2.5 hours and 5
    hours are not used in Protocols A and B but brought up in the text. Line 311 indicates you need
    at least 5 hrs for sufficient conc so if you want to include duration <5 hrs, it would be more
    useful and easier to understand if the following information was included as a table in the
    supplemental section: starting amt of amplicon and/or concentration and the time taken for
    concentration to get to 60-70ng/ul.
    During optimization, concentration was measured every hour from 1 to 6 hours. The RCA products were only used for the next step if they reached 60-70 ng/ul, which corresponded to 5 to 6 hours of incubation for a starting DNA template concentration of 0.3-0.4 ng/ul. This means that products with an RCA duration < 5 hours were not included in the experiment. Details were added to the manuscript. This should help avoid confusion without another supplemental table.
    Line 171: Here it states 10 min compared to Table 1, which states 5 min, which one is correct?
    This has been corrected
    Line 176: It would be great if a picture of this gel was included.
    We unfortunately don’t have the gel image digitally, therefore we are not able to provide it.
    Line 182: why sheared genomic DNA, not RCA product?
    Corrected to “sheared RCA product”.
    Line 191: Each library was run on a different flowcell?
    Correct, reusing flow cells is not recommended as efficiency drops considerably. The two libraries here correspond to Protocol A and B.
    Line 196 to 197: The GitHub readme page was low on details for understanding if/what
    reference database was used and if this could be included in the Methods section. Also, what is
    N for iterations in analysis?
    We added a new subsection in the Methods (ASHURE data processing workflow) describing the steps in the ASHURE pipeline. We also provided more details on statistical measures and parameters in the Supplemental Material and Methods. We also edited Fig S1 to explain N (it is a user-defined value for the number of iterations during the OPTICS clustering step). Fig S1 is now called Fig S2.
    Line 205, 243: If possible, also the average accuracy.
    Our justification is that the median is more representative of the central distribution, as the average can be more influenced by outliers that are misclassified during mapping of the reads back to the reference database, and this is a known problem with highly errored nanopore reads. Therefore, we decided to report the median accuracy instead of the average accuracy.
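The reasoning for preferring the median can be illustrated with a minimal sketch. The accuracy values below are invented for illustration and are not data from the study; the assumption is simply that a few reads mapped to the wrong reference show a much lower apparent accuracy than the rest.

```python
from statistics import mean, median

# Hypothetical per-read accuracies (%), invented for illustration only.
well_mapped = [92.1, 93.4, 91.8, 92.7, 93.0, 92.5, 91.9, 93.2]
# Reads assigned to the wrong reference appear far less accurate than they are.
mismapped = [61.0, 58.5]


def summarize(values):
    """Return (mean, median) of a list of accuracies, rounded to 2 decimals."""
    return round(mean(values), 2), round(median(values), 2)


clean_mean, clean_median = summarize(well_mapped)
noisy_mean, noisy_median = summarize(well_mapped + mismapped)

print(f"without outliers: mean={clean_mean}, median={clean_median}")
print(f"with outliers:    mean={noisy_mean}, median={noisy_median}")
```

In this toy example the two mis-mapped reads shift the mean by several percentage points, while the median moves by a fraction of a point.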
    Paragraph starting at line 206 and Table 2: In Volden et al 2018, there is a pre-processing step for raw reads, are the same parameters used in this paper?
    Yes, we also preprocessed the base-called raw reads to remove short (<1,000 bp) and low-quality (Q < 9) reads. Initially, we provided this information in the discussion, and now it can also be found in the supplemental materials and methods section.
    - Also, “unfiltered” read count is different between each pipeline for the same protocol,
    some explanation? e.g. raw reads are processed differently for each pipeline /
    preprocessing for C3POa analysis to remove short (<1,000 bp) and low-quality (Q < 9)
    reads.
    Very good suggestions. We provided more context and explanation in the discussion.
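A preprocessing filter of the kind discussed above could be sketched as follows. This is not the ASHURE or C3POa code. The function names are hypothetical, the length threshold is assumed to be 1,000 bases, and the mean quality is assumed to be computed by averaging per-base error probabilities (one common convention; the pipelines may compute it differently).

```python
import math


def mean_quality(quality_string, phred_offset=33):
    """Mean Phred quality of a read, averaged over per-base error probabilities."""
    probs = [10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(sequence, quality_string, min_length=1000, min_quality=9.0):
    """Keep a read only if it is long enough and of sufficient mean quality."""
    if len(sequence) < min_length:
        return False
    return mean_quality(quality_string) >= min_quality


def filter_fastq(lines, **thresholds):
    """Yield (header, sequence, quality) for 4-line FASTQ records that pass."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, _plus, qual = records[i:i + 4]
        if passes_filter(seq, qual, **thresholds):
            yield header, seq, qual


# Toy records: "5" encodes Q20 and "&" encodes Q5 with a Phred+33 offset.
example = [
    "@long_good", "A" * 1200, "+", "5" * 1200,
    "@short", "A" * 300, "+", "5" * 300,
    "@long_poor", "A" * 1200, "+", "&" * 1200,
]
kept = [header for header, _, _ in filter_fastq(example)]
print(kept)
```

Because each pipeline applies its own preprocessing, the same raw run can yield different "unfiltered" read counts downstream, which is the point raised in the comment above.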
    - There’s no mention in the Methods section how the cluster center is initialized for this
    study. Fig S1 says with random sequences or provided by user, which one was it?
    The cluster centers were initialized from random sequences drawn from the set of sequences being clustered. We updated it in Fig S1 and provided more detailed explanation in the Supplemental Materials and Methods. Fig S1 now became Fig S2.
    - This link from the Github page is not working: clustering.ipynb .
    We updated the link: https://github.com/BBaloglu/ASHURE/blob/master/demo/clustering.ipynb
    We also provided a very detailed explanation in the Supplementary Materials and Methods (see Section S1.5) on the OPTICS clustering algorithm that we modified and developed. We generated new supplementary figures (Supplementary Figures 3 to 6) to explain the cluster centers obtained with OPTICS.
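The initialization described above, drawing the starting cluster centers at random from the sequences being clustered, can be sketched in a few lines. This is an illustrative assumption about the general idea, not the ASHURE implementation; the function name, the seed parameter, and the toy reads are hypothetical, and the subsequent OPTICS-based refinement is not shown.

```python
import random


def init_cluster_centers(sequences, n_centers, seed=None):
    """Pick n_centers distinct sequences at random to serve as initial centers."""
    unique = sorted(set(sequences))  # sorted so a given seed is reproducible
    if n_centers > len(unique):
        raise ValueError("more centers requested than distinct sequences")
    rng = random.Random(seed)
    return rng.sample(unique, n_centers)


# Toy sequences standing in for consensus reads.
reads = ["ACGTAC", "ACGTTC", "GGCATA", "GGCATT", "TTAGCC"]
centers = init_cluster_centers(reads, n_centers=2, seed=0)
print(centers)
```

Fixing the seed makes a run repeatable; leaving it unset gives a different random draw each time.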
    Line 242 to 243: Why is “flexible filtering RCA>1” used in the text and “post filtering non-target
    data based on MiSeq” in Table 2? It appears they mean the same thing, some clarification
    and/or consistency in wording for the reader.
    Agreed, these two terms are being used interchangeably. We have added in parentheses in Table 2 that these two terms correspond to the same thing.
    Line 294 to 296: With strict filtering ASHURE is not able to recover all the species in the mock
    community. Please qualify or re-write the sentence.
    Removed ‘and strict’ and only kept flexible filtering.
    Line 301: According to this recommendation, you would miss identifying 25% of species.
    Overall C3POa with protocol A seems to have the best results. Some justification of why
    Protocol B over C3POa for metabarcoding e.g. useful for certain type of surveys, stats etc.
    We made it clear that there is a trade-off in our approach, and we suggest that the two protocols are roughly equivalent in initial accuracy, without further optimization. The R2C2 study (Volden et al. 2018) relies on splint sequences and Gibson assembly. Gibson assembly is another tedious step in the molecular workflow, so we decided to avoid it. Instead, our pipeline searches for the concatemers by using primers/reference sequences. Because the two algorithms use different methodologies, we were not able to improve the accuracy beyond 94.5% using C3POa (since C3POa does not report information on the RCA fragment length), compared to more than 99% with ASHURE. We suggest that for studies where a low error rate is of utmost importance, the proportion of longer reads needs to be maximized by adjusting various steps in the molecular workflow, i.e., through size selection, longer RCA duration, and multiple nanopore runs. With ASHURE, Protocol B simply provided more of the longer fragments than Protocol A.
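The trade-off between consensus accuracy and the number of species recovered can be made concrete with a small sketch. All values below are invented for illustration and are not results from the study; the tuple layout and the function name are hypothetical, and "repeat count" stands in for the RCA fragment length discussed above.

```python
from statistics import median

# Hypothetical consensus reads: (species label, RCA repeat count, accuracy %).
# All values are invented for illustration and are not results from the study.
consensus_reads = [
    ("sp01", 2, 90.5), ("sp01", 16, 98.9),
    ("sp02", 3, 91.2), ("sp02", 20, 99.1),
    ("sp03", 2, 89.8), ("sp03", 4, 93.0),
    ("sp04", 5, 94.1),
    ("sp05", 2, 90.1),
]


def summarize_threshold(reads, min_repeats):
    """Return (median accuracy, species detected) for reads with >= min_repeats."""
    kept = [r for r in reads if r[1] >= min_repeats]
    if not kept:
        return None, 0
    return median(acc for _, _, acc in kept), len({sp for sp, _, _ in kept})


for threshold in (2, 4, 15):
    med_acc, n_species = summarize_threshold(consensus_reads, threshold)
    print(f"repeats >= {threshold}: median accuracy {med_acc}, species {n_species}")
```

In this toy data set, raising the repeat threshold increases the median accuracy of the retained reads but drops species that are only represented by short concatemers.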
    Line 370: “By offering portable, highly accurate, and species-level metabarcoding, Nanopore
    sequencing presents a promising and flexible alternative for future bioassessment programs
    and it appears that we have reached a point where highly accurate and potentially field-based
    DNA metabarcoding with this instrument is possible.” This sentence needs to be re-written since obtaining highly accurate consensus sequences is associated with about a third of species in the community not identified by the pipeline.
    We updated the text accordingly
    Fig 1C and 1D would be more useful for comparison if scaled more similarly on the y-axis.
    We updated the plot legend to eliminate confusion. We opted out of using the same scale, because the resolution of accuracy differences between each haplotype was poor for Figure 1D if it was the same scale as 1C.
    Supplementary Table S3: An explanation of similarity% high vs low to include in table
    description
    Edited



  • pre-publication peer review (ROUND 1)
    Decision Letter
    2020/08/13

    13-Aug-2020

    MEE-20-05-406 A workflow for accurate metabarcoding using nanopore MinION sequencing

    Dear Dr Bilgenur Baloğlu,

    Thank you for submitting your manuscript to Methods in Ecology and Evolution. I have now received the reviewers' reports and a recommendation from the Associate Editor who handled the review process. Copies of their reports are included below. This manuscript has the potential to make a valuable contribution to the area, but there are a number of significant concerns that need to be addressed. I have considered your paper in light of the comments received and I would like to invite you to prepare a major revision.

    Please ensure that code associated with your ms is well-commented upon for the benefit of potential users.

    In your revision, please make sure that you take full account of my comment above and comments made in the reports below. Please note that Methods in Ecology and Evolution does not automatically accept papers after revision, and an invitation to revise a manuscript does not represent commitment to eventual publication on our part. We will reject revised manuscripts if they are returned without satisfactory responses to the reviewers' comments. When returning the revised paper, please show point-by-point how you have dealt with the various comments in the appropriate section of the submission form.

    Please return your revision by 24-Sep-2020. If you need longer, please let us know so we can update our system accordingly. Before resubmitting your manuscript, please read through the resubmission instructions below.

    We look forward to hearing from you in due course.

    Sincerely,

    Dr Lee Hsiang Liow
    Senior Editor, Methods in Ecology and Evolution

    Reply to:
    Ms India Stephenson
    Methods in Ecology and Evolution Editorial Office
    coordinator@methodsinecologyandevolution.org

    Associate Editor Comments to Author:
    Associate Editor
    Comments to the Author:
    This paper presents a workflow for using rolling circle amplification for improving the error rate for metabarcoding studies, as such it could aid researchers in this area. Both reviewers provide useful comments and improvements that need to be considered.

    Reviewer(s)' Comments to Author:
    Reviewer: 1

    Comments to the Corresponding Author
    I saw this work presented at the Barcode of Life conference last year and am pleased to see it submitted for publication, as it represents a significant step in overcoming the problem of high sequence error rate that has hindered the adoption of nanopore sequencing in metabarcoding studies.

    Specifically, the manuscript describes the use of a mock community of 50 specimens to optimize protocols and tools for using rolling circle amplification (RCA) to improve the accuracy of nanopore-generated consensus sequences. The comparison between Illumina and Nanopore platforms in detecting the components of the mock community is also valuable, especially for engendering confidence among current Illumina users who might be wary of the Nanopore platform.

    The creation of a novel Python pipeline for processing, consensus building, clustering, and ID of Nanopore-generated metabarcoding reads using RCA, is an equally valuable contribution, especially as it can also be used to process outputs of other isothermal amplification methods as recommended in the Discussion.

    While RCA had been previously used to reduce Nanopore sequencing error rate in other studies (line 68), it hadn’t yet been demonstrated in a metabarcoding study. And while nanopore sequencing has been previously used for metabarcoding (line 73), the cited studies applied it to much smaller species communities (fewer than 11 specimens), and, with the exception of Calus et al. (2017), did not employ RCA or other amplification protocols for error rate reduction.

    Attached are my specific, suggested revisions.

    Reviewer: 2

    Comments to the Corresponding Author
    Dear authors,
    Thank you for your work to develop a Rolling Circle Amplification based metabarcoding pipeline with the MinION. Overall, this is a useful contribution to the field as the ASHURE pipeline can be used for sequencing other types of concatenated sequence. Also, great to see the comparison with an existing pipeline for RCA products, C3POa. However, revisions will be needed so that readers can reproduce the protocol and results are more accurately described.
    Main comments:
    1) more detail in the methods section on the DNA preparation steps (see Minor comments)
    2) more detail on the data processing steps in the Methods Section to explain Fig S1. The
    authors could follow the C3POa processing section, Methods section in Volden et al 2018,
    including a brief definition of a cluster center, parameters used in the analysis and why there
    can be multiple cluster centers.
    Also, a suggestion to include average accuracy.
    3) To re-write this line in the Abstract: “Our pipeline successfully identified all 50 species in the
    mock community and exhibited comparable sensitivity and accuracy to MiSeq.”
    It would be inaccurate to say that the ASHURE sensitivity and median accuracy are comparable
    to MiSeq accuracy without qualifying that statement, and more appropriate to say that the
    pipeline can identify all species in the mock community but there is a tradeoff i.e. ~92% median
    accuracy. Also if possible to include that mock community species are at least 15% genetic
    distance to each other. To accommodate the word limit, possibly sections 1 and 2 in the
    Abstract could be made more succinct e.g. delete full-length since 658bp is already noted.
    4) There is no background on C3POa in the Introduction or explanation for a new pipeline.
    Some or all of lines 339 to 356 should be in the Introduction together with a few lines on why
    ASHURE was developed e.g. how is it different/better than C3POa.
    Minor comments:
    Line 110: What was the dry weight of each specimen?
    Line 124: How much DNA template was used in the PCR?
    Line 149: How much DNA template was used in the PCR?
    Line 155: quantified with a Qubit?
    Lines 156 to 158: Amount of amplicons used in the ligation step? How much treated with
    DNAse? Units of DNAse used?
    Line 161: How much DNA template was used?
    Line 163 to 165: this section is slightly confusing e.g. RCA duration between 2.5 hours and 5
    hours are not used in Protocols A and B but brought up in the text. Line 311 indicates you need
    at least 5 hrs for sufficient conc so if you want to include duration <5 hrs, it would be more
    useful and easier to understand if the following information was included as a table in the
    supplemental section: starting amt of amplicon and/or concentration and the time taken for
    concentration to get to 60-70ng/ul.
    Line 171: Here it states 10 min compared to Table 1, which states 5 min, which one is correct?
    Line 176: It would be great if a picture of this gel was included.
    Line 182: why sheared genomic DNA, not RCA product?
    Line 191: Each library was run on a different flowcell?
    Line 196 to 197: The GitHub readme page was low on details for understanding if/what
    reference database was used and if this could be included in the Methods section. Also, what is
    N for iterations in analysis?
    Line 205, 243: If possible, also the average accuracy.
    Paragraph starting at line 206 and Table 2:
    - In Volden et al 2018, there is a pre-processing step for raw reads, are the same
    parameters used in this paper?
    - Also, “unfiltered” read count is different between each pipeline for the same protocol,
    some explanation? e.g. raw reads are processed differently for each pipeline /
    preprocessing for C3POa analysis to remove short (<1,000 bp) and low-quality (Q < 9)
    reads.
    - There’s no mention in the Methods section how the cluster center is initialized for this
    study. Fig S1 says with random sequences or provided by user, which one was it?
    - This link from the Github page is not working: clustering.ipynb .
    Line 242 to 243: Why is “flexible filtering RCA>1” used in the text and “post filtering non-target
    data based on MiSeq” in Table 2? It appears they mean the same thing, some clarification
    and/or consistency in wording for the reader.
    Line 294 to 296: With strict filtering ASHURE is not able to recover all the species in the mock
    community. Please qualify or re-write the sentence.
    Line 301: According to this recommendation, you would miss identifying 25% of species.
    Overall C3POa with protocol A seems to have the best results. Some justification of why
    Protocol B over C3POa for metabarcoding e.g. useful for certain type of surveys, stats etc.
    Line 370: “By offering portable, highly accurate, and species-level metabarcoding, Nanopore
    sequencing presents a promising and flexible alternative for future bioassessment programs
    and it appears that we have reached a point where highly accurate and potentially field-based
    DNA metabarcoding with this instrument is possible.”
    This sentence needs to be re-written since obtaining highly accurate consensus sequences is
    associated with about a third of species in the community not identified by the pipeline.
    Fig 1C and 1D would be more useful for comparison if scaled more similarly on the y-axis.
    Supplementary Table S3: An explanation of similarity% high vs low to include in table
    description

    Reviewer report
    2020/07/27

    Dear authors,
    Thank you for your work to develop a Rolling Circle Amplification based metabarcoding pipeline with the MinION. Overall, this is a useful contribution to the field as the ASHURE pipeline can be used for sequencing other types of concatenated sequence. Also, great to see the comparison with an existing pipeline for RCA products, C3POa. However, revisions will be needed so that readers can reproduce the protocol and results are more accurately described.
    Main comments:
    1) more detail in the methods section on the DNA preparation steps (see Minor comments)
    2) more detail on the data processing steps in the Methods Section to explain Fig S1. The
    authors could follow the C3POa processing section, Methods section in Volden et al 2018,
    including a brief definition of a cluster center, parameters used in the analysis and why there
    can be multiple cluster centers.
    Also, a suggestion to include average accuracy.
    3) To re-write this line in the Abstract: “Our pipeline successfully identified all 50 species in the
    mock community and exhibited comparable sensitivity and accuracy to MiSeq.”
    It would be inaccurate to say that the ASHURE sensitivity and median accuracy are comparable
    to those of MiSeq without qualifying that statement, and more appropriate to say that the
    pipeline can identify all species in the mock community but there is a tradeoff, i.e. ~92% median
    accuracy. Also, if possible, include that mock community species are at least 15% genetic
    distance to each other. To accommodate the word limit, possibly sections 1 and 2 in the
    Abstract could be made more succinct e.g. delete full-length since 658bp is already noted.
    4) There is no background on C3POa in the Introduction or explanation for a new pipeline.
    Some or all of lines 339 to 356 should be in the Introduction together with a few lines on why
    ASHURE was developed e.g. how is it different/better than C3POa.
    Minor comments:
    Line 110: What was the dry weight of each specimen?
    Line 124: How much DNA template was used in the PCR?
    Line 149: How much DNA template was used in the PCR?
    Line 155: quantified with a Qubit?
    Lines 156 to 158: Amount of amplicons used in the ligation step? How much treated with
    DNAse? Units of DNAse used?
    Line 161: How much DNA template was used?
    Line 163 to 165: this section is slightly confusing e.g. RCA duration between 2.5 hours and 5
    hours are not used in Protocols A and B but brought up in the text. Line 311 indicates you need
    at least 5 hrs for sufficient conc so if you want to include duration <5 hrs, it would be more
    useful and easier to understand if the following information was included as a table in the
    supplemental section: starting amt of amplicon and/or concentration and the time taken for
    concentration to get to 60-70ng/ul.
    Line 171: Here it states 10 min compared to Table 1, which states 5 min, which one is correct?
    Line 176: It would be great if a picture of this gel was included.
    Line 182: why sheared genomic DNA, not RCA product?
    Line 191: Each library was run on a different flowcell?
    Line 196 to 197: The GitHub readme page was low on details for understanding if/what
    reference database was used; could this be included in the Methods section? Also, what is
    N for iterations in analysis?
    Line 205, 243: If possible, also the average accuracy.
    Paragraph starting at line 206 and Table 2:
    - In Volden et al 2018, there is a pre-processing step for raw reads, are the same
    parameters used in this paper?
    - Also, the “unfiltered” read count differs between the pipelines for the same protocol;
    some explanation? e.g. raw reads are processed differently for each pipeline /
    preprocessing for C3POa analysis to remove short (<1,000 bp) and low-quality (Q < 9)
    reads.
    - There’s no mention in the Methods section how the cluster center is initialized for this
    study. Fig S1 says with random sequences or provided by user, which one was it?
    - This link from the GitHub page is not working: clustering.ipynb.
    Line 242 to 243: Why is “flexible filtering RCA>1” used in the text and “post filtering non-target
    data based on MiSeq” in Table 2? It appears they mean the same thing; some clarification
    and/or consistency in wording would help the reader.
    Line 294 to 296: With strict filtering ASHURE is not able to recover all the species in the mock
    community. Please qualify or re-write the sentence.
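    For readers unfamiliar with the kind of raw-read pre-processing raised in the first two points, a length and quality pre-filter can be sketched roughly as follows. This is only an illustration and is not code from C3POa or ASHURE: the function names `mean_phred` and `passes_prefilter` are hypothetical, the default thresholds are the ones quoted in the comment (read here as 1,000 bases and Q 9), and Phred+33 quality encoding is assumed.

```python
import math


def mean_phred(quality_string, offset=33):
    """Mean read quality, averaged in error-probability space.

    Assumes Phred+33 encoded quality characters (an assumption of this sketch).
    """
    if not quality_string:
        return 0.0
    # Convert each quality character to its error probability.
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    # Average the probabilities, then convert back to a Phred score.
    return -10 * math.log10(sum(probs) / len(probs))


def passes_prefilter(sequence, quality_string, min_length=1000, min_quality=9.0):
    """Keep a read only if it is long enough and of sufficient mean quality.

    The default thresholds mirror the values quoted in the review comment;
    they are illustrative, not taken from either pipeline's source code.
    """
    return len(sequence) >= min_length and mean_phred(quality_string) >= min_quality
```

    If the two pipelines apply different pre-filters of this kind, or apply them at different stages, the "unfiltered" read counts would be expected to differ for the same run, which is why stating the parameters in the Methods would help.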
    Line 301: According to this recommendation, you would miss identifying 25% of species.
    Overall C3POa with protocol A seems to have the best results. Some justification of why
    Protocol B over C3POa for metabarcoding e.g. useful for certain type of surveys, stats etc.
    Line 370: “By offering portable, highly accurate, and species-level metabarcoding, Nanopore
    sequencing presents a promising and flexible alternative for future bioassessment programs
    and it appears that we have reached a point where highly accurate and potentially field-based
    DNA metabarcoding with this instrument is possible.”
    This sentence needs to be re-written since obtaining highly accurate consensus sequences is
    associated with about a third of species in the community not identified by the pipeline.
    Fig 1C and 1D would be more useful for comparison if scaled more similarly on the y-axis.
    Supplementary Table S3: An explanation of similarity% high vs low to include in table
    description

    Reviewer report
    2020/07/16

    I saw this work presented at the Barcode of Life conference last year and am pleased to see it submitted for publication, as it represents a significant step in overcoming the problem of high sequence error rate that has hindered the adoption of nanopore sequencing in metabarcoding studies.

    Specifically, the manuscript describes the use of a mock community of 50 specimens to optimize protocols and tools for using rolling circle amplification (RCA) to improve the accuracy of nanopore-generated consensus sequences. The comparison between Illumina and Nanopore platforms in detecting the components of the mock community is also valuable, especially for engendering confidence among current Illumina users who might be wary of the Nanopore platform.

    The creation of a novel Python pipeline for processing, consensus building, clustering, and ID of Nanopore-generated metabarcoding reads using RCA, is an equally valuable contribution, especially as it can also be used to process outputs of other isothermal amplification methods as recommended in the Discussion.

    While RCA had been previously used to reduce Nanopore sequencing error rate in other studies (line 68), it hadn’t yet been demonstrated in a metabarcoding study. And while nanopore sequencing has been previously used for metabarcoding (line 73), the cited studies applied it to much smaller species communities (fewer than 11 specimens), and, with the exception of Calus et al. (2017), did not employ RCA or other amplification protocols for error rate reduction.

    Attached are my specific, suggested revisions.

All peer review content displayed here is covered by a Creative Commons CC BY 4.0 license.