Abstract

Citation indices are tools used by the academic community for research and research evaluation which aggregate scientific literature output and measure scientific impact by collating citation counts. Citation indices help measure the interconnections between scientific papers but fall short because they fail to communicate contextual information about why a citation was made. The use of citations in research evaluation without due consideration of context can be problematic, if only because a citation that disputes a paper is treated the same as a citation that supports it. To solve this problem, we have used machine learning and other techniques to develop a “smart citation index” called scite, which categorizes citations based on context. Scite shows how a citation was used by displaying the surrounding textual context from the citing paper, and a classification from our deep learning model that indicates whether the statement provides supporting or disputing evidence for a referenced work, or simply mentions it. Scite has been developed by analyzing over 23 million full-text scientific articles and currently has a database of more than 800 million classified citation statements. Here we describe how scite works and how it can be used to further research and research evaluation.
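
As a purely illustrative sketch (this is not scite's actual data model, code, or API; the field names, example DOIs, and the simple tally below are hypothetical), the following Python snippet shows the kind of record a smart citation index could store for each classified citation statement and how supporting, disputing, and mentioning tallies for a cited paper could be aggregated from such records.

    # Illustrative sketch only: NOT scite's implementation or public API.
    # The data model, field names, and example DOIs are hypothetical.
    from collections import Counter
    from dataclasses import dataclass


    @dataclass
    class CitationStatement:
        citing_doi: str   # paper in which the citation appears
        cited_doi: str    # paper being cited
        context: str      # text surrounding the in-text citation
        label: str        # "supporting", "disputing", or "mentioning"


    def tally_smart_citations(statements, cited_doi):
        """Aggregate classified citation statements for a single cited paper."""
        counts = Counter(s.label for s in statements if s.cited_doi == cited_doi)
        return {label: counts.get(label, 0)
                for label in ("supporting", "disputing", "mentioning")}


    statements = [
        CitationStatement("10.1000/a", "10.1000/x",
                          "Our results confirm the effect reported in [12].", "supporting"),
        CitationStatement("10.1000/b", "10.1000/x",
                          "We were unable to replicate the findings of [12].", "disputing"),
        CitationStatement("10.1000/c", "10.1000/x",
                          "Prior work has examined this question [12].", "mentioning"),
    ]

    print(tally_smart_citations(statements, "10.1000/x"))
    # {'supporting': 1, 'disputing': 1, 'mentioning': 1}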


Authors

Josh M. Nicholson; Milo Mordaunt; Patrice Lopez; Ashish Uppala; Dominic Rosati; Neves P. Rodrigues; Peter Grabitz; Sean C. Rife

Contributors on Publons
  • 2 reviewers
  • pre-publication peer review (FINAL ROUND)
    Decision Letter
    2021/06/29

    29-Jun-2021

    Dear Dr. Nicholson:

    It is a pleasure to accept your manuscript entitled "scite: a smart citation index that displays the context of citations and classifies their intent using deep learning" for publication in Quantitative Science Studies.

    I would like to request you to prepare the final version of your manuscript using the checklist available at https://tinyurl.com/qsschecklist. Please also sign the publication agreement, which can be downloaded from https://tinyurl.com/qssagreement. The final version of your manuscript, along with the completed checklist and the signed publication agreement, can be returned to qss@issi-society.org.

    Thank you for your contribution. On behalf of the Editors of Quantitative Science Studies, I look forward to your continued contributions to the journal.

    Best wishes,
    Dr. Ludo Waltman
    Editor, Quantitative Science Studies
    qss@issi-society.org

    Author Response
    2021/06/29

    Dear Dr. Waltman,

    Thank you for your letter and for the wonderful news! We have made further minor revisions based on reviewer 1's comments and believe that all points have now been sufficiently addressed, including the two main points of criticism as well as the minor and very minor points.

    We look forward to seeing the article published in QSS.

    Cordially,
    Josh



  • pre-publication peer review (ROUND 2)
    Decision Letter
    2021/06/28

    28-Jun-2021

    Dear Dr. Nicholson:

    Your manuscript QSS-2021-0020.R1 entitled "scite: a smart citation index that displays the context of citations and classifies their intent using deep learning", which you submitted to Quantitative Science Studies, has been reviewed. The comments of the reviewers are included at the bottom of this letter.

    Reviewers 2 and 3 recommend acceptance of your manuscript, while reviewer 1 still has a few small comments on your work. Based on the comments of the reviewers, I am pleased to let you know that your manuscript can almost be accepted for publication in Quantitative Science Studies. Your manuscript will be accepted when the remaining comments of reviewer 1 have been addressed.

    To revise your manuscript, log into https://mc.manuscriptcentral.com/qss and enter your Author Center, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision.

    You may also click the below link to start the revision process (or continue the process if you have already started your revision) for your manuscript. If you use the below link you will not be required to login to ScholarOne Manuscripts.

    PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm.

    https://mc.manuscriptcentral.com/qss?URL_MASK=ea70847e8e0946c8b5234885e9339fe3

    You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript using a word processing program and save it on your computer. Please also highlight the changes to your manuscript within the document by using the track changes mode in MS Word or by using bold or colored text.

    Once the revised manuscript is prepared, you can upload it and submit it through your Author Center.

    When submitting your revised manuscript, you will be able to respond to the comments made by the reviewers in the space provided. You can use this space to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the reviewers.

    IMPORTANT: Your original files are available to you when you upload your revised manuscript. Please delete any redundant files before completing the submission.

    If possible, please try to submit your revised manuscript by 27-Aug-2021. Let me know if you need more time to revise your work.

    Once again, thank you for submitting your manuscript to Quantitative Science Studies and I look forward to receiving your revision.

    Best wishes,
    Dr. Ludo Waltman
    Editor, Quantitative Science Studies
    qss@issi-society.org

    Reviewers' Comments to Author:

    Reviewer: 1

    Comments to the Author
    Thanks to the authors for improving their manuscript based on the previous round of reviews. I particularly appreciate Table 1. (Consider including a version of this table as well in scite FAQ/help materials.)

    I have just two points of substantive criticism; fundamentally, the paper is ready to be accepted with some minor editing:

    1) The information on page 5 should be checked, and possibly updated:
    "editorial information from Crossref and Pubmed such as corrections and whether the article has been retracted"
    Retraction Watch should be mentioned as a current or past source of scite's retraction data. For instance, per https://doi.org/10.7490/f1000research.1118546.1
    "scite... checks bibliographies against data from Crossref and Retraction Watch during submission and review...Josh Nicholson of scite notes that Retraction Watch data will be removed in September 2021 due to licensing restrictions; he expects Crossref and Pubmed to be the predominant data sources going forward."

    2) Cite the QSS paper on I4OC (or similar):
    OpenCitations, an infrastructure organization for open scholarship
    https://doi.org/10.1162/qss_a_00023
    Academics get credit for citations, and in addition to referencing the website it is appropriate to reference their work (especially their work in the same journal).

    MINOR COMMENTS BELOW
    Page numbers from the manuscript submission system header (e.g. Page X of 49):

    page 31: The reference appears in the bibliography - but should also be here parenthetically. (And, review comments throughout - .)

    Reference to https://arxiv.org/abs/1802.05365 indicates NAACL 2018
    Official version:
    https://www.aclweb.org/anthology/N18-1202/

    Not sure that online first articles are described as "forthcoming".
    https://doi.org/10.1111/febs.15608
    FEBS calls this "Early View
    Online Version of Record before inclusion in an issue"

    On Garfield 1964: Keep the bibliographic information from https://www.garfield.library.upenn.edu/essays/V1p084y1962-73.pdf
    Reprinted from: Mary Elizabeth Stevens, Vincent E. Giuliano, and Laurence B. Heilprin, Eds., Statistical Association Methods for Mechanized Documentation, Symposium Proceedings, Washington 1964. (National Bureau of Standards Miscellaneous Publication 269, December 15, 1965), pp. 189-192.

    It seems useful to keep the Rosati citation - this is closely related work of your team.

    VERY MINOR COMMENTS:

    page 3 - reference says "(Eugene Garfield, 1959)" - Garfield expected (see also page 5, where the "Eugene" was removed from the reference)
    page 4 - Better to keep "Traditional" to be more clear in my opinion.
    page 5, 9 - capitalize PubMed
    page 7 & throughout - machine learning doesn't need capitalization (even if you introduce an acronym for it). Use the acronym consistently throughout, once you introduce it, if you do introduce it.
    page 17 - Inter-Annotator-Agreement -> Inter-annotator agreement or Inter-Annotator Agreement
    page 18 & throughout - don't capitalize Deep Learning
    page 18 - f-score is F-score elsewhere.
    page 22 - "this information" -> "section titles"
    Figure 2 - Are there other steps that should be mentioned? For instance, this does not have additional data sources (e.g. PubMed). Ok however you decide - but I wanted to flag this.
    page 34 - Constantin - you may want to add that this is a PhD thesis.
    page 37 - Porter reference is indented (weird formatting)

    Reviewer: 2

    Comments to the Author
    The authors have addressed all my concerns. I have no other comments.

    Reviewer: 3

    Comments to the Author
    (There are no comments.)

    Author Response
    2021/05/25

    Ludo Waltman, Ph.D.
    Editor-in-Chief, Quantitative Science Studies

    Dear Dr. Waltman,

    We would like to thank you and the reviewers for the highly engaged comments. We have revised the paper, and in the table in the accompanying letter we respond to each point made.

    Overall, we have mostly made clarifying comments and added extra explanations to make the manuscript more clear.

    Thanks,
    Josh Nicholson



  • pre-publication peer review (ROUND 1)
    Decision Letter
    2021/04/22

    22-Apr-2021

    Dear Dr. Nicholson:

    Your manuscript QSS-2021-0020 entitled "scite: a smart citation index that displays the context of citations and classifies their intent using deep learning", which you submitted to Quantitative Science Studies, has been reviewed. The comments of the reviewers are included at the bottom of this letter.

    The three reviewers are all positive about your work. The reviewers have provided quite detailed comments, mostly relating to relatively minor issues. Based on the comments of the reviewers, my editorial decision is to invite you to prepare a revision of your manuscript.

    To revise your manuscript, log into https://mc.manuscriptcentral.com/qss and enter your Author Center, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision.

    You may also click the below link to start the revision process (or continue the process if you have already started your revision) for your manuscript. If you use the below link you will not be required to login to ScholarOne Manuscripts.

    PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm.

    https://mc.manuscriptcentral.com/qss?URL_MASK=1e265704d4fe4d3c848301d3c9552f45

    You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript using a word processing program and save it on your computer. Please also highlight the changes to your manuscript within the document by using the track changes mode in MS Word or by using bold or colored text.

    Once the revised manuscript is prepared, you can upload it and submit it through your Author Center.

    When submitting your revised manuscript, you will be able to respond to the comments made by the reviewers in the space provided. You can use this space to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the reviewers.

    IMPORTANT: Your original files are available to you when you upload your revised manuscript. Please delete any redundant files before completing the submission.

    If possible, please try to submit your revised manuscript by 21-Jun-2021. Let me know if you need more time to revise your work.

    Once again, thank you for submitting your manuscript to Quantitative Science Studies and I look forward to receiving your revision.

    Best wishes,
    Dr. Ludo Waltman
    Editor, Quantitative Science Studies
    qss@issi-society.org

    Reviewers' Comments to Author:

    Reviewer: 1

    Comments to the Author
    In the abstract, "other techniques" could be clarified. In particular, the "virtuous cycle" that scite's machine learning has with humans is important to mention, and novel in this space:
    "Finally, each citation statement can be flagged by individual users as incorrect, so that users can report a classification as incorrect, as well as justify their objection. After a citation statement has been flagged as incorrect, it will be reviewed and verified by two independent reviewers, and, if both agree, the recommended change will be implemented. In this way, scite supplements machine learning with human interventions to ensure citations are accurately classified."
    The work of collecting full text, parsing XML, etc. could also be mentioned here: that is a significant part of the work that cannot be swept under "machine learning"

    Page 3 - "H-index" - usually "h-index"

    Page 5 - "whether the article has been retracted" - mention current & future data sources for this information.

    Page 6 - "work on rich citations stopped in 2015" Do you know when it stopped being available on PLOS? I was not aware of this but was able to browse through to get a little more detail:
    https://web.archive.org/web/20170118172641/http://alpha.richcitations.org/
    From the API description it appears that this allowed getting a full citation graph (with an API key) https://web.archive.org/web/20170915051639/http://api.richcitations.org/ as well as getting JSON and CSV data out.
    I've attached a screenshot from https://web.archive.org/web/20160327182845/http://alpha.richcitations.org/view/10.1371/journal.pone.0094597#references - potentially useful if you write about this from a historical perspective in the future. For instance, it would be useful to know why PLOS gave up the project.

    Page 6 - "various authors have performed automated citation typing; for example, Athar and Teufel used machine learning to classify and identify “negative citations” (20–22)". This is fundamental prior work related to your project. A more substantive review would be useful here. At a minimum, clearly signal that you are citing a review article in this area (not just Athar and Teufel). In this journal, a recent paper is likely relevant to cite: Yan, E., Chen, Z., Li, K. (2020) The relationship between journal citation impact and citation sentiment: A study of 32 million citances in PubMed Central. Quantitative Science Studies. http://doi.org/10.1162/qss_a_00040

    My personal take on this is that "citation sentiment" has been used to mean not just sentiment but ANY disagreement. Yan et al.'s example of negative sentiment, for instance, is "Another serious(-0.51) problem(-0.62) is the gene set itself that is used(0.271) for the induction of pluripotency [ ( CITE [17554338] ) ]". This seems very close to scite's notion of "contrasted".

    Page 7 - It would be interesting to hear whether you've checked the CrossRef TDM, and if so why it's not used (low amounts of unpaywalled content? too much work to check compared to unpaywall? etc.) https://www.crossref.org/education/retrieve-metadata/rest-api/text-and-data-mining-for-researchers/

    Page 8 -

    A paper on GROBID could be cited in addition. Maybe this?
    Lopez P. (2009) GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications. In: Agosti M., Borbinha J., Kapidakis S., Papatheodorou C., Tsakonas G. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2009. Lecture Notes in Computer Science, vol 5714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04346-8_62
    Or something more recent by the author?

    Note 1 is very useful. Consider citing news on this (possibly a French press release or news piece).

    "the recent study of (27)" - Add an object here. (Alternately rephrase, e.g. "a recent study").

    Page 9 - Spell out S2ORC ("Semantic Scholar Open Research Corpus").

    Page 10 - "the pipeline successfully associates around 70% of citation contexts to cited papers with correctly identified DOIs in a given PDF" Very nice statistic - but how do you know this? Also, what is the proportion of your ingest literature that does not have DOIs?

    "We emphasize that scite is not doing sentiment analysis (31, 22, 32, 33), where a subjective polarity is associated with a claim" I think you are overselling here. More clear argument would be needed to convince me otherwise. The previous literature largely adopted the term "sentiment" following Athar: as an NLP task that is what this was called, and it was pithy and familiar due to the analogy with sentiment.

    Whether "sentiment analysis" in this space previously REQUIRED evidence, it often FOUND it. For instance, contrasting statements are relatively dense in what has been referred to as "negative citation sentiment"; and naturally has that kind of evidence. For that reason I think it would be better to say less about "mere negative opinion (e.g., negative sentiment) here: "We consider that capturing the reliability of a claim, a classification decision into supporting or disputing must be evidence-based, backed by scientific arguments. For instance, a mere negative opinion (e.g., negative sentiment) about a cited work not supported by any scientific and/or technical facts, studies, or replication evidence will be classified as mentioning, while it would be classified as negative by sentiment analysis." To really carry this point would require a detailed investigated into what has been done before. You haven't provided that and I don't think you intend to (or necessarily need to) to carry the main argument of this paper.

    Page 11 - It's not clear how you determined "the actual distribution in current scholarly publications" in order to make your holdout set. Any other details about the source data from which that was constructed?

    Page 12 - What makes the difference between IAA in "the open domain" vs. biomedicine?

    Gloss "doccano" (e.g. "open source annotation") - QSS readers will mostly not know what it is.

    Page 13 -
    "as compared to linear classifier" Do you mean "classifiers"?

    Consider avoiding acronyms (especially BidGRU and ELMo). Ensure that you gloss sufficiently for a more general audience (not specialist machine learning experts).

    Page 14 -
    "The F-score for the classification of "disputing" was notably improved from 20.1% to 58.97%. The precision of predicting of "disputing" in particular reaches 85.19%, a very reliable level for such a rare class." - Impressive. Not very clear how all the technology described on page 13 accomplished this. More metadiscourse/interconnection could be helpful for the reader here. (Also this phrase is quite awkward: 'The precision of predicting of "disputing"')

    Any hypotheses about why section titles didn't improve F-scores?

    Page 15 - "perspectives of improvement" - odd phrase, consider rewording.

    Page 16 - Gloss Veracity - what KIND of component is this? Figure 3 says "deep learning classifier"

    Page 17 - The "virtuous cycle" of manual improvement is important here. Consider making this a paragraph of its own and even more clearly signaled.

    Page 18 -

    Consider including an example journal or funder evaluation page as an additional figure.

    "Transitive credit" is another research concept that could be implemented on top of a directed graph.

    Page 21 - Citations without in-text contexts could be of interest for the future, especially if you take up the patent literature (where this is significant)

    Page 22 - Nice point about conceptual and logical arguments!

    FIGURES
    Make Figure 1 larger

    Figures 2 & 4- consider including the date of the screenshot and/or scite version number. Your system will likely evolve. URLs to these displays could also be included in the caption.

    BIBLIOGRAPHY
    Add DOIs where available.

    Consider using Suelzer's approach for citing Wakefield - including the retraction notice in the citation:
    Wakefield AJ, Murch SH, Anthony A, et al. Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children [retracted in: Lancet. 2010;375(9713):445]. Lancet. 1998;351(9103):637-641. doi:10.1016/S0140-6736(97)11096-0

    Check best practices for preserving others' code repositories - these citations are more likely to bit rot than average. In some cases there is no description or documentation that can reasonably be expected to outlive the code. Providing extended titles could help. For instance delft: "DeLFT (Deep Learning Framework for Text) is a Keras and TensorFlow"

    References #16, #20, #21 are missing paper titles
    References #1, #34 are missing venue (e.g. journal) info
    Reference #31 - PhD thesis? Needs the URL. If there's a tech report number, add that.
    For the arXiv literature (particularly pre-2020) - check for formal publications.

    Reviewer: 2

    Comments to the Author
    This paper provides an overview of scite, a large-scale citation index that attempts to categorize citations based on the context of the citation using machine learning, and goes into depth on the processes involved.

    The literature review by the authors seems relatively complete. However, in terms of prior work or precedents I think there should be mention of “Shepardizing”, which scite itself mentions is a close analog of what scite is trying to accomplish for science at https://medium.com/scite/scite-before-you-cite-4e22b93b8698. This is notwithstanding that it is well known Eugene Garfield was heavily influenced by Shepard’s Citations.

    I am not a deep learning specialist, so I will not comment on the details of the ML process. However, it seems to me that the key point of contention around scite so far among its users is disagreement over the nuances of how scite defines the different citation types (particularly for the “Disputing” cite) and how well machine learning is able to capture such nuances.

    (for example see https://twitter.com/cristiproist/status/1371978336067538947, https://twitter.com/carabonate/status/1371489736179781634)

    I would additionally note that in the paper the authors mention scite labels “the citation type indicating intent (supporting, disputing, or mentioning)”.

    However as of April 2021, I note “disputing” has been replaced by “contrasting” on Scite and I understand there was a prior label change (from “refuting”) before that as well. Though I believe these are pure label changes and not changes in the algorithm, these changes (and the reasons why) seem to be worth mentioning.

    Overall, it seems to me it would be helpful to expand specifically on the definitions or ontology the authors of the paper use to classify papers, beyond the two paragraphs the authors give it initially (and also in the limitations section), since this goes to the heart of scite.

    For example, the authors cite Murray et al., which provides a clear table (Table 1. Examples of our notion of disagreement) showing sample citation sentences and types of disagreement. But it isn’t clear to me that the current work is following exactly that classification.

    As such, I would recommend a similar table in this paper to clarify what scite is

    a. counting as disputing/contrasting, and
    b. not counting as disputing/contrasting.

    Perhaps adapt https://help.scite.ai/en-us/article/how-are-citations-classified-1a9j78t/

    For example, if Paper A states that its findings are different from Paper B but explains why it thinks the findings are different, would it count as a disputing/contrasting cite?

    In terms of limitations you state that “ the ontology currently employed by scite (supporting, mentioning, and disputing) necessarily misses some nuance regarding how references are cited in scientific papers. One key example relates to what “counts” as a disputing citation: at present, this category is limited to instances where new evidence is presented (e.g., a failed replication attempt or a difference in findings). However, it might also be appropriate to include conceptual and logical arguments against a given paper in this category.”

    I think another limitation to consider adding is that while “scite is not doing sentiment analysis” and “for capturing the reliability of a claim, a classification decision into supporting or disputing must be evidence-based, backed by scientific arguments”, ultimately the algorithm will not be able to assess the reliability of the evidence presented, resulting in some degree of subjectivity or opinion being involved in such statements.

    Lastly, is the framework or ontology of mentioning/supporting/disputing or contrasting a universal one that is consistent across different disciplinary fields with different paradigms?

    Overall, I find scite and this paper a fascinating, unprecedented large-scale attempt to build on traditional citations, but it would be good to more carefully spell out the definitions for each category of citations and to consider and state the limitations of such an approach.

    Reviewer: 3

    Comments to the Author
    Really interesting piece: good, solid methodology with a well-presented results section and presentation of limitations. Some questions, however, around how the ‘Smart Citations’ are then used to create rankings; I feel some more explanation would help here – see comments below.

    EDITS
    Pg3 Ln 37 ‘let alone to introducing innovations’
    Should be ‘let alone introducing innovations’
    Pg6 Ln 32 ‘low on both tools’
    Should be ‘low for both tools’
    Pg8 Ln 27 ‘converter tool for the scientific literature’

    ‘the’ is not required

    Pg14 Ln27

    The precision of prediction of “disputing” in particular reaches 85.19%

    Suggest change to: The precision for predicting “disputing” in particular reaches 85.19%

    Pg15 Ln 18 ‘..perspectives of improvement of the classifier..’

    ...improvements to the classifier...

    Pg18 Ln 3

    scites data

    needs apostrophe: scite’s data

    Pg20 Ln48 ‘in hopes that their work will be replicated.’

    ..in the hope that their work...

    COMMENTS

    Page 10, Ln 6 ‘The matching accuracy of a raw citation reaches an F-score of 95.4 on a set of 17,015 raw references associated with a DOI’

    It would be interesting to see the F-Score for accuracy when the DOI is not available – are all results based on papers which are both successfully extracted by Grobid and have a DOI?

    Pg12 Ln20 ‘An n-fold cross-evaluation on the working set for instance would have been misleading because the distribution of the classes in this set was artificially modified to boost the classification accuracy of the less frequent classes.’

    This is extremely pleasing to see. I have seen/reviewed other studies where this is exactly what has been done, yet those authors have not accounted for it.

    Pg18 Ln 8 ‘Finally, scite provides citation indices that rank and evaluate journals and funders based on aggregate smart citation data in order to provide alternatives to the journal impact factor during research evaluation.’

    The above paragraph is probably fairly contentious to some and is a significant extra step beyond classifying citations. I would suggest that this needs an expansion with an explanation of how this rank is calculated. Are the rankings based on cumulative counts of citation types by journal, for example?

    Pg19 Ln34 ‘Given that over 95% of citations made to retracted articles are in error (47) , had the Reference Check tool been applied to these papers during the review process, the mistakes could have been caught.’

    As discussed earlier in the paper, there is a proportion of instances where Grobid is not successful at parsing the references. Combined with the classification process, which cannot be 100% accurate, it would be more accurate to say the Reference Check tool would have captured ‘most’ or ‘a percentage of’ these errors.

    Pg20 Ln36 ‘In other words, scite allows its users to not only search through references provided by a given paper but also review the true scientific impact this paper has made.’

    Again, I feel this is something of a leap. It could be argued that whilst supporting citations do indeed demonstrate usage of, and agreement with, the cited article, this does not necessarily measure the impact of said article.

    Pg21 Ln33 ‘As such, the data provided by scite will necessarily miss a measurable percentage of citations to a given paper’

    This is of key importance and is closely tied to the accuracy and efficacy of any index based on scite classifications and needs to be discussed in more detail.

All peer review content displayed here is covered by a Creative Commons CC BY 4.0 license.