Content of review 1, reviewed on February 22, 2019

The authors illustrate a well-known but important problem in droplet-based scRNA-seq experiments called soup or ambient contamination, where ambient or free-floating mRNA can enter the oil droplet along with a single cell. The authors provide a convincing demonstration of the problem by re-analyzing a barnyard mixed-species dataset. They also provide some nice solutions for detecting soup contamination in single-cell droplets (by compiling lists of known marker genes) and provide computational methods, with easy-to-use code, for removing such contamination.

Major comments: The overall ideas in the paper are clear and well described. However, I found the paper considerably lacking in detail, in terms of both figures and explanation. The text is exceptionally short, and the figure captions and legends are so small as to be almost illegible. In addition, too much important material is relegated to the supplemental figures. The other key area where the paper is lacking is a systematic assessment of more datasets. This is essential for readers to better understand when soup contamination is likely to be a major problem and when it is a mere nuisance. Specifically, the authors should evaluate the presence of soup contamination in at least 3-4 additional datasets, including some high-quality Drop-seq data from model organisms, such as freshly dissociated mouse cells, as well as lower-quality datasets from human clinical samples, such as post-mortem human brain datasets.

Minor comments:
1. I especially applaud the authors for making a manuscript preprint available on bioRxiv and code available on GitHub. I had actually read the paper as a preprint and had downloaded and played with the code on scRNA-seq datasets that I was working with ~6 months ago.
2. However, at the time, one challenge I struggled with was finding good marker genes. In the context of brain cell types, the authors could cite the NeuroExpresso.org resource (http://neuroexpresso.org/). B. Ogan Mancarci et al., "Cross-Laboratory Analysis of Brain Cell Type Transcriptomes with Applications to Interpretation of Bulk Tissue Data," eNeuro, November 20, 2017, ENEURO.0212-17.2017, https://doi.org/10.1523/ENEURO.0212-17.2017.
3. Similarly, some of the ideas in the paper regarding the use of marker genes for detecting contamination resemble those presented for a comparable quality control analysis of Patch-seq data by Tripathy et al., 2018: https://www.frontiersin.org/articles/10.3389/fnmol.2018.00363/full.

Source

    © 2019 the Reviewer (CC BY 4.0).

Content of review 2, reviewed on February 18, 2020

Thanks to the authors for substantially revising their paper in response to my prior comments. The paper is considerably improved. See minor comments below.

Minor comments:
1. I would encourage the authors to have a domain expert from outside computational biology and cancer biology give the paper a careful read. For example, I noticed some abbreviations in the paper that are not spelled out (e.g., it took me a while to figure out that the term IG referred to immunoglobulin genes). The same goes for HG, MNP, etc. The method is broadly useful across many domains of biology, and it is essential that its description not rely too heavily on domain-specific terms.
2. The legend in Fig 2a is obscured.
3. Please consider adding, to the online description of the method and the GitHub repo, a vignette that interfaces more directly with a typical Seurat workflow. This would considerably enhance the usability of the method.

Declaration of competing interests: I declare that I have no competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included on my report to the authors and, if the manuscript is accepted for publication, my named report including any attachments I upload will be posted on the website along with the authors' responses. I agree for my report to be made available under an Open Access Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments which I do not wish to be included in my named report can be included as confidential comments to the editors, which will not be published.

Authors' response to reviews: (https://drive.google.com/file/d/1DE8hxxx91vHiaQ9wVroMabuLlzeZzvly/view?usp=sharing)

Source

    © 2020 the Reviewer (CC BY 4.0).

References

    Young, M. D., Behjati, S. 2020. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. GigaScience.