Content of review 1, reviewed on November 02, 2020

The authors have sufficiently addressed the concerns we raised earlier, and in our opinion, the revised manuscript has been improved in the following aspects.

  1. New automated method for estimating background contamination - The authors have now included an improved version, which is intended to be more robust and to rely on fewer assumptions than the one described in the first version of the manuscript (and removed in the second version). The new approach identifies strong marker genes of particular cell types/clusters (i.e. highly expressed genes that are very specific to those clusters), which can be used to estimate the "contamination" and hence to correct for the randomly distributed background expression.

While this concept is simple and appears to be very effective, we wonder whether the results might depend on the clustering method and its parameters (e.g. resolution). Could the authors please comment on this? That said, we tested the new algorithm (v1.4.8) on one of our data sets (human PBMC) and did not observe much difference when the resolution was varied between 0.8 and 5.0 using Seurat's SCT. We did notice, though, that when we specified user-defined soup genes (which requires prior knowledge of the biological system), SoupX still gave a better result than the automated approach.

  2. Additional data set - To further showcase SoupX's utility on different types of samples, the authors have now included another data set, containing 40 more "channels" of fetal liver haematopoiesis samples. As the authors explained, even though the data came from one study, they contain cell populations from different biological contexts during development. That said, we think it would still be useful if the authors could continue to add data from different tissues, possibly as future vignettes.

  3. The authors have now appropriately discussed the major differences, pros and cons of other methods that remove ambient RNA.

In conclusion, as discussed earlier, we believe SoupX is a unique and useful tool for the single-cell biology community, and the revised manuscript is suitable for publication in GigaScience.

(Please note that the manuscript was co-reviewed by Ms Jantarika Arora and Mr Tiraput Poonpanichakul)

Declaration of competing interests: I declare that I have no competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included on my report to the authors and, if the manuscript is accepted for publication, my named report including any attachments I upload will be posted on the website along with the authors' responses. I agree for my report to be made available under an Open Access Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments which I do not wish to be included in my named report can be included as confidential comments to the editors, which will not be published.

Authors' response to reviews: (https://drive.google.com/file/d/1uetFEGQaDqNd909YP4T_hFs47jZ55pCW/view?usp=sharing)

Source

    © 2020 the Reviewer (CC BY 4.0).

References

    Young, M. D., Behjati, S. 2020. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. GigaScience.