Abstract

Purpose - The main purpose of this study is to explore and validate the question of whether altmetric mentions can predict citations to scholarly articles. The paper explores the nature and degree of correlation between altmetrics (from ResearchGate and three social media platforms) and citations.

Design/methodology/approach - A large data sample of scholarly articles published from India in the year 2016 is obtained from the Web of Science database, and the corresponding altmetric data are obtained from ResearchGate and three social media platforms (Twitter, Facebook and blogs, through the Altmetric.com aggregator). Correlations are computed between early altmetric mentions and later citation counts, for data grouped into different disciplinary groups.

Findings - Results show that the correlation between altmetric mentions and citation counts is positive but weak. Correlations are relatively higher for data from ResearchGate than for data from the three social media platforms. Further, significant disciplinary differences are observed in the degree of correlation between altmetrics and citations.

Research limitations/implications - The results support the idea that altmetrics do not necessarily reflect the same kind of impact as citations. However, articles that attract higher altmetric attention early on may have a slight citation advantage. Further, altmetrics from academic social networks like ResearchGate correlate more strongly with citations than those from social media platforms.

Originality/value - The paper is novel in two respects. First, it takes altmetric data for a window of about 1-1.5 years after article publication and citation counts for a longer citation window of about 3-4 years after publication. Second, it is one of the first studies to analyze data from ResearchGate, a popular academic social network, to understand the type and degree of these correlations.
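
For reference, the core computation described above is a Spearman rank correlation (SRCC) between early altmetric counts and later citation counts, calculated separately for each disciplinary group. Below is a minimal sketch in Python, assuming a hypothetical table with columns 'discipline', 'altmetric_mentions' and 'citations'; the paper's actual data pipeline is not reproduced here.

```python
# A minimal sketch of the correlation analysis described in the abstract.
# Column names are hypothetical; the paper's data pipeline is not public.
import pandas as pd
from scipy.stats import spearmanr

def srcc_by_discipline(df: pd.DataFrame) -> pd.DataFrame:
    """Spearman rank correlation (SRCC) between early altmetric mentions
    and later citation counts, computed per disciplinary group."""
    rows = []
    for discipline, group in df.groupby("discipline"):
        rho, p_value = spearmanr(group["altmetric_mentions"], group["citations"])
        rows.append({"discipline": discipline, "srcc": rho,
                     "p_value": p_value, "n_articles": len(group)})
    return pd.DataFrame(rows)
```

A rank-based coefficient such as Spearman's is the usual choice here because both mention and citation counts are heavily skewed.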


Authors

Banshal, Sumit Kumar; Singh, Vivek Kumar; Muhuri, Pranab Kumar


No Publons users have claimed this paper.

Contributors on Publons
  • 2 reviewers
  • pre-publication peer review (FINAL ROUND)
    Decision Letter
    2020/12/15

    15-Dec-2020

    Dear Banshal, Sumit; Singh, Vivek; Muhuri, Pranab

    It is a pleasure to accept your manuscript OIR-11-2019-0364.R5, entitled "Can altmetric mentions predict later citations? A test of validity on data from ResearchGate and three social media platforms" in its current form for publication in Online Information Review. Please note, no further changes can be made to your manuscript.

    Please go to your Author Centre at https://mc.manuscriptcentral.com/oir (Manuscripts with Decisions for the submitting author or Manuscripts I have co-authored for all listed co-authors) to complete the Copyright Transfer Agreement form (CTA). We cannot publish your paper without this.

    All authors are requested to complete the form and to input their full contact details. If any of the contact information is incorrect you can update it by clicking on your name at the top right of the screen. Please note that this must be done prior to you submitting your CTA.

    If you have an ORCID please check your account details to ensure that your ORCID is validated.

    By publishing in this journal your work will benefit from Emerald EarlyCite. As soon as your CTA is completed your manuscript will pass to Emerald’s Content Management department and be processed for EarlyCite publication. EarlyCite is the author proofed, typeset version of record, fully citable by DOI. The EarlyCite article sits outside of a journal issue and is paginated in isolation. The EarlyCite article will be collated into a journal issue according to the journals’ publication schedule.

    FOR OPEN ACCESS AUTHORS: Please note if you have indicated that you would like to publish your article as Open Access via Emerald’s Gold Open Access route, you are required to complete a Creative Commons Attribution Licence - CCBY 4.0 (in place of the standard copyright assignment form referenced above). You will receive a follow up email within the next 30 days with a link to the CCBY licence and information regarding payment of the Article Processing Charge. If you have indicated that you might be eligible for a prepaid APC voucher, you will also be informed at this point if a voucher is available to you (for more information on APC vouchers please see http://www.emeraldpublishing.com/oapartnerships).

    Thank you for your contribution. On behalf of the Editors of Online Information Review, we look forward to your continued contributions to the Journal.

    Sincerely,

    Dr. Eugenia Siapera
    Co-Editor
    eugenia.siapera@ucd.ie



    Reviewer report
    2020/12/07

    My comments have been appropriately addressed.

    Author Response
    2020/11/16

    We thank the learned reviewers for their valuable inputs and suggestions. The required minor modification has been incorporated in the revised manuscript.



  • pre-publication peer review (ROUND 5)
    Decision Letter
    2020/11/15

    15-Nov-2020

    Dear Prof. Singh,

    Manuscript ID OIR-11-2019-0364.R4 entitled "Can altmetric mentions predict later citations? A test of validity on data from ResearchGate and three social media platforms" which you submitted to Online Information Review, has been reviewed. The comments of the reviewer(s) are included at the bottom of this letter.

    The reviewer(s) have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, I invite you to respond to the reviewer(s)' comments and revise your manuscript. Please also ensure that in doing so your paper does not exceed the maximum word length of 10000 words and that it meets all the requirements of the author guidelines at http://www.emeraldinsight.com/products/journals/author_guidelines.htm?id=oir&PHPSESSID;=ubl727mru90lg3hc8sa5p5qrt2.

    To revise your manuscript, log into https://mc.manuscriptcentral.com/oir and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision.

    You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript using a word processing program and save it on your computer. Please also highlight the changes to your manuscript within the document by using the track changes mode in MS Word or by using bold or coloured text.

    Once the revised manuscript is prepared, you can upload it and submit it through your Author Centre.

    When submitting your revised manuscript, you will be able to respond to the comments made by the reviewer(s) in the space provided. You can use this space to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the reviewer(s).

    IMPORTANT: Your original files are available to you when you upload your revised manuscript. Please delete any redundant files before completing the submission.

    Because we are trying to facilitate timely publication of manuscripts submitted to Online Information Review, your revised manuscript should be uploaded as soon as possible. If it is not possible for you to submit your revision in a reasonable amount of time, we may have to consider your paper as a new submission.

    Once again, thank you for submitting your manuscript to Online Information Review and I look forward to receiving your revision.

    Yours sincerely,


    Dr. Eugenia Siapera
    Co-Editor
    eugenia.siapera@ucd.ie

    Reviewer(s)' Comments to Author:
    Reviewer: 1

    Recommendation: Minor Revision

    Comments:
    Page 12 still contains instances in which it is unclear which Mendeley data the authors are referring to:
    "The degree of correlation between altmetrics and citations in case of the ResearchGate platform is higher than traditional social media platforms but lower than Mendeley, another popular academic social network.",
    "Our results show that ResearchGate data shows a positive correlation between altmetrics and citations, with the degree of correlation being higher than traditional social networks but lower than Mendeley platform.", and
    "Therefore, I it gets a degree of correlation value somewhere in between that of social media platforms (which primarily have social media features) and Mendeley (which is a popular bibliographic tool/ platform)."

    Maybe it is best to remove the sentences. Alternatively, it has to be made clear which Mendeley analysis is referred to.

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: yes

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: yes

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: yes

    Results: For empirical papers - are results presented clearly and analysed appropriately?: yes

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: Partly, see comments to the authors.

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: yes

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: yes

    Reproducible Research: If appropriate, is sufficient information, potentially including data and software, provided to reproduce the results and are the corresponding datasets formally cited?: In principle the work should be reproducible. However, data from altmetrics and bibliometrics are changing constantly. The aim of this study was to compare older altmetrics data with newer bibliometrics data. These older altmetrics data are no longer available. Therefore, the exact same data are not reproducible. It would make the method of the study easier to reproduce if the routines used to gather the ResearchGate data were shared.

    Reviewer: 2

    Recommendation: Accept

    Comments:
    Thank you for making these changes.

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: Yes

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: Fine

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: OK

    Results: For empirical papers - are results presented clearly and analysed appropriately?: Yes

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: OK

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: Fine

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: Fine

    Author Response
    2020/10/28

    We are thankful to the learned reviewers for the valuable inputs and suggestions that helped us in improving the manuscript significantly over multiple rounds.
    We have made the minor changes (regarding adding references) suggested by the learned reviewer and very much hope that the manuscript is now ready for final acceptance and publication.



  • pre-publication peer review (ROUND 4)
    Decision Letter
    2020/10/20

    20-Oct-2020

    Dear Prof. Singh,

    Manuscript ID OIR-11-2019-0364.R3 entitled "Can altmetric mentions predict later citations? A test of validity on data from ResearchGate and three social media platforms" which you submitted to Online Information Review, has been reviewed. The comments of the reviewer(s) are included at the bottom of this letter.

    The reviewer(s) have recommended publication, but note a few very minor issues with the revised manuscript. Please address these and resubmit your manuscript as per the instructions.

    To revise your manuscript, log into https://mc.manuscriptcentral.com/oir and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision.

    You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript using a word processing program and save it on your computer. Please also highlight the changes to your manuscript within the document by using the track changes mode in MS Word or by using bold or coloured text.

    Once the revised manuscript is prepared, you can upload it and submit it through your Author Centre.

    When submitting your revised manuscript, you will be able to respond to the comments made by the reviewer(s) in the space provided. You can use this space to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the reviewer(s).

    IMPORTANT: Your original files are available to you when you upload your revised manuscript. Please delete any redundant files before completing the submission.

    Because we are trying to facilitate timely publication of manuscripts submitted to Online Information Review, your revised manuscript should be uploaded as soon as possible. If it is not possible for you to submit your revision in a reasonable amount of time, we may have to consider your paper as a new submission.

    Once again, thank you for submitting your manuscript to Online Information Review and I look forward to receiving your revision.

    Yours sincerely,


    Dr. Eugenia Siapera
    Co-Editor
    eugenia.siapera@ucd.ie

    Reviewer(s)' Comments to Author:
    Reviewer: 1

    Recommendation: Accept

    Comments:
    Thank you for making these changes. I would be happy to see this published.

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: Yes

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: Yes

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: Yes

    Results: For empirical papers - are results presented clearly and analysed appropriately?: Yes

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: Yes

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: Yes

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: Yes

    Reviewer: 2

    Recommendation: Minor Revision

    Comments:
    The manuscript has improved significantly through the revisions made. However, I advise reading carefully the sentences in the results and discussion where reference to Mendeley data is made. Is it clear which source is being referred to? For example, the reference is clear on page 9, lines 43-46, but it is unclear on page 10, lines 1-4, page 12, lines 9-11 and lines 21-26, and page 13, lines 14-18 and 23-27. These parts should be revised carefully so that it is clear which source is meant for the Mendeley data.

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: Yes

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: Yes

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: Yes

    Results: For empirical papers - are results presented clearly and analysed appropriately?: Yes

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: Yes

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: Yes

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: Yes

    Author Response
    2020/09/19

    Response to Review Comments

    We are very thankful to the learned reviewers for the comments and suggestions to improve the manuscript. We have revised the manuscript further to address the comments. A summary of detailed response to review comments is as below.

    Response to Reviewer 1:

    Comment: Thank you for making these changes. I think that the paper is ready to be published now.
    Response: We thank the learned reviewer for valuable suggestions and comments in earlier reviews, which helped in improving the manuscript significantly.

    Response to Reviewer 2:

    Comment 1: The manuscript has improved in the previous round of revision. However, there are some remaining things to fix.
    Response: We thank the learned reviewer for valuable suggestions and comments. We gave tried to address all the comments.

    Comment 2: Regarding the neglect of papers without altmetrics activity, I think just because mistakes were made, they should not be made over and over again. Others, e.g., Wang, Glaenzel, and Chen (2020), do it right: "... for papers without Mendeley readership or Twitter posts, readers and the tweets counts were put zero." (p. 6) I could find and present many more examples that employ this common practice. Mainly, the reason for no altmetrics activity is no activity rather than lack of coverage. For example, Altmetric.com screens paper mentions on Twitter. They might miss some papers or papers mentioned without DOI, link or PubMedID. This should not occur that often. Therefore, lack of coverage is very unlikely. If papers not found at Altmetric.com are not included in the analysis, a bias towards mentioned papers in the analysis has to be expected. Such a bias should be discussed in the limitations of the study.
    Response: We thank the learned reviewer for the point. This is now clearly indicated in the Limitations section.
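
    The practice the reviewer recommends, keeping uncovered papers in the sample with a count of zero, amounts to a left join from the full article list onto the aggregator's counts. A hedged sketch, with hypothetical frame and column names (wos_df, altmetric_df, doi, tweet_count, facebook_count):

    ```python
    import pandas as pd

    def zero_fill_altmetrics(wos_df: pd.DataFrame, altmetric_df: pd.DataFrame,
                             count_cols=("tweet_count", "facebook_count")) -> pd.DataFrame:
        """Left-join aggregator counts onto the full WoS article list, so that
        articles not covered by Altmetric.com enter the analysis with a count
        of zero instead of being dropped."""
        merged = wos_df.merge(altmetric_df[["doi", *count_cols]], on="doi", how="left")
        merged[list(count_cols)] = merged[list(count_cols)].fillna(0).astype(int)
        return merged
    ```

    Dropping unmatched rows instead would bias the sample towards mentioned papers, which is exactly the limitation the reviewer asks to be discussed.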

    Comment 3: I am confused about the methodology. On page 7 lines 30-36, the authors write about which publications were found and which publications were not. However, the explanation is confusing. Please rewrite clearer what has been done.
    Response: The methodology has been significantly revised earlier and is checked again for clarity of expression.

    Comment 4: On page 8, lines 3-10, the authors provide rather rough guidelines for interpreting correlation coefficients without proper reasoning. Cohen (1988) and Kraemer et al. (2003) provided well-reasoned rules for interpretation of correlation coefficients.
    Response: References to the said papers are added, indicating the reasons for selecting the specific method.
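
    Cohen's (1988) benchmarks referred to here treat correlation coefficients of about .10, .30 and .50 as small, medium and large, respectively. As an illustrative sketch (not code from the paper), they can be encoded as:

    ```python
    def cohen_label(r: float) -> str:
        """Label a correlation coefficient using Cohen's (1988) benchmarks:
        |r| >= 0.50 large, >= 0.30 medium, >= 0.10 small, else negligible."""
        r = abs(r)
        if r >= 0.50:
            return "large"
        if r >= 0.30:
            return "medium"
        if r >= 0.10:
            return "small"
        return "negligible"
    ```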

    Comment 5: Considering the high relevance of ResearchGate regarding the (revised) manuscript, the problems associated with the ResearchGate score should be discussed carefully, see for example Meier and Tunger (2018a) and Meier and Tunger (2018b).
    Response: This is now included in the Results and the Discussion sections.

    Comment 6: In general, the introduction is rather short. Large data sets were analyzed by, e.g., Bornmann and Haunschild (2018), Bornmann, Haunschild, and Adams (2019), and Kassab, Bornmann, and Haunschild (2020) regarding the correlation between quality of papers and altmetrics as well as citations and altmetrics. Such previous studies should be discussed in the context of the present study.
    Response: Reference to these works is now made in the Introduction as well as the Related Work sections.

    Comment 7: It is confusing to refer to Mendeley readers or bookmarks as mentions (as on page 9, lines 9-11).
    Response: This now stands corrected to “reads”.

    Comment 8: Page 9, lines 48-50: "The SRCC values for Facebook mentions and citations of the articles in different disciplinary groups is shown in Table 4." --> "The SRCC values for ... are shown ... ."
    Response: The error is regretted and has been corrected.

    Comment 9: In some instances, the authors mention that the used altmetrics data are from 1 to 1 1/2 years after publication, e.g., page 11, lines 46/47. The "1/2" is written in superscript and seems as if it should be an exponent notation which does not make sense for 1 1/2. Writing the "1/2" as usual script or the 1 1/2 as 1.5 would be much better.
    Response: All such references are now replaced throughout with 1.5.

    Comment 10: Page 10, lines 46-49: "This was done mainly to see if altmetrics correlate with later citations, and hence whether altmetrics can be used to predict later citations." Simple correlation results are not sufficient as a test "whether altmetrics can be used to predict later citations." This is repeated similarly on page 12, lines 47-49. One should be careful when drawing conclusions from correlations.
    Response: A caution to this fact is clearly indicated in the Limitations section.
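
    The reviewer's caution can be made concrete: an in-sample correlation does not establish out-of-sample predictive power, which is what "prediction" implies. A hedged sketch of the kind of check that would support a predictive claim, using a held-out test set (illustrative only; the paper reports correlations, not this analysis):

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    def out_of_sample_r2(mentions: np.ndarray, citations: np.ndarray) -> float:
        """Fit on one subset of articles, score on a held-out subset; counts
        are log-transformed because both distributions are heavily skewed."""
        X = np.log1p(mentions).reshape(-1, 1)
        y = np.log1p(citations)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                                  random_state=0)
        model = LinearRegression().fit(X_tr, y_tr)
        return r2_score(y_te, model.predict(X_te))
    ```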

    Comment 11: Page 11, lines 7-9: "ResearchGate has a good mix of social media as well as bibliographical features, which makes this platform quite comprehensive academic social network (Yu et al., 2016)." What is the source of the bibliographic data that ResearchGate uses? This would be interesting to know. If it is unknown, this should be stated.
    Response: This is now clearly stated in the text.

    Comment 12: Missing whitespace on page 11, lines 17/18: "... ResearchGate platform ishigher ..."
    Response: The typo stands corrected.

    Comment 13: On page 11, lines 25-32, the authors try to explain the lower correlation between ResearchGate data and Mendeley data with the dual nature of ResearchGate as a "platform with a mix of social media and bibliographic features". However, this is also true for Mendeley. One might argue that the social media component is stronger on ResearchGate than on Mendeley. Especially, for the Mendeley reader counts that are less affected by the social component than the ResearchGate score. The ResearchGate score is strongly influenced by the social media component as Meier and Tunger (2018a) have shown.
    Response: Thanks for the useful references. The Results and Discussion sections now include appropriate discussion.

    Comment 14: Typo on page 11, line 40/41: "... twitter platform ..."
    Response: The typo is regretted and stands corrected.

    Comment 15: In the discussion of the Facebook results, Enkhbayar et al. (2020) should be taken into account.
    Response: Thanks for the reference. This is now added in the discussion of Facebook results.



  • pre-publication peer review (ROUND 3)
    Decision Letter
    2020/08/27

    27-Aug-2020

    Dear Prof. Singh,

    Manuscript ID OIR-11-2019-0364.R2 entitled "Can altmetric mentions predict later citations? A test of validity on data from ResearchGate and three social media platforms" which you submitted to Online Information Review, has been reviewed. The comments of the reviewer(s) are included at the bottom of this letter.

    The reviewer(s) have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, I invite you to respond to the reviewer(s)' comments and revise your manuscript. Please also ensure that in doing so your paper does not exceed the maximum word length of 10000 words and that it meets all the requirements of the author guidelines at http://www.emeraldinsight.com/products/journals/author_guidelines.htm?id=oir&PHPSESSID;=ubl727mru90lg3hc8sa5p5qrt2.

    To revise your manuscript, log into https://mc.manuscriptcentral.com/oir and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision.

    You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript using a word processing program and save it on your computer. Please also highlight the changes to your manuscript within the document by using the track changes mode in MS Word or by using bold or coloured text.

    Once the revised manuscript is prepared, you can upload it and submit it through your Author Centre.

    When submitting your revised manuscript, you will be able to respond to the comments made by the reviewer(s) in the space provided. You can use this space to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the reviewer(s).

    IMPORTANT: Your original files are available to you when you upload your revised manuscript. Please delete any redundant files before completing the submission.

    Because we are trying to facilitate timely publication of manuscripts submitted to Online Information Review, your revised manuscript should be uploaded as soon as possible. If it is not possible for you to submit your revision in a reasonable amount of time, we may have to consider your paper as a new submission.

    Once again, thank you for submitting your manuscript to Online Information Review and I look forward to receiving your revision.

    Yours sincerely,


    Dr. Eugenia Siapera
    Co-Editor
    eugenia.siapera@ucd.ie

    Reviewer(s)' Comments to Author:
    Reviewer: 1

    Recommendation: Accept

    Comments:
    Thank you for making these changes. I think that the paper is ready to be published now.

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: Yes

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: Yes

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: Yes

    Results: For empirical papers - are results presented clearly and analysed appropriately?: Yes

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: Yes

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: Yes

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: Yes

    Reviewer: 2

    Recommendation: Major Revision

    Comments:
    The manuscript has improved in the previous round of revision. However, there are some remaining things to fix.

    Regarding the neglect of papers without altmetrics activity, I think just because mistakes were made, they should not be made over and over again. Others, e.g., Wang, Glaenzel, and Chen (2020), do it right: "... for papers without Mendeley readership or Twitter posts, readers and the tweets counts were put zero." (p. 6) I could find and present many more examples that employ this common practice. Mainly, the reason for no altmetrics activity is no activity rather than lack of coverage. For example, Altmetric.com screens paper mentions on Twitter. They might miss some papers or papers mentioned without DOI, link or PubMedID. This should not occur that often. Therefore, lack of coverage is very unlikely. If papers not found at Altmetric.com are not included in the analysis, a bias towards mentioned papers in the analysis has to be expected. Such a bias should be discussed in the limitations of the study.

    I am confused about the methodology. On page 7 lines 30-36, the authors write about which publications were found and which publications were not. However, the explanation is confusing. Please rewrite clearer what has been done.

    On page 8, lines 3-10, the authors provide rather rough guidelines for interpreting correlation coefficients without proper reasoning. Cohen (1988) and Kraemer et al. (2003) provided well-reasoned rules for interpretation of correlation coefficients.

    Considering the high relevance of ResearchGate regarding the (revised) manuscript, the problems associated with the ResearchGate score should be discussed carefully, see for example Meier and Tunger (2018a) and Meier and Tunger (2018b).

    In general, the introduction is rather short. Large data sets were analyzed by, e.g., Bornmann and Haunschild (2018), Bornmann, Haunschild, and Adams (2019), and Kassab, Bornmann, and Haunschild (2020) regarding the correlation between quality of papers and altmetrics as well as citations and altmetrics. Such previous studies should be discussed in the context of the present study.

    It is confusing to refer to Mendeley readers or bookmarks as mentions (as on page 9, lines 9-11).

    Page 9, lines 48-50: "The SRCC values for Facebook mentions and citations of the articles in different disciplinary groups is shown in Table 4." --> "The SRCC values for ... are shown ... ."

    In some instances, the authors mention that the used altmetrics data are from 1 to 1 1/2 years after publication, e.g., page 11, lines 46/47. The "1/2" is written in superscript and seems as if it should be an exponent notation which does not make sense for 1 1/2. Writing the "1/2" as usual script or the 1 1/2 as 1.5 would be much better.

    Page 10, lines 46-49: "This was done mainly to see if altmetrics correlate with later citations, and hence whether altmetrics can be used to predict later citations." Simple correlation results are not sufficient as a test "whether altmetrics can be used to predict later citations." This is repeated similarly on page 12, lines 47-49. One should be careful when drawing conclusions from correlations.

    Page 11, lines 7-9: "ResearchGate has a good mix of social media as well as bibliographical features, which makes this platform quite comprehensive academic social network (Yu et al., 2016)." What is the source of the bibliographic data that ResearchGate uses? This would be interesting to know. If it is unknown, this should be stated.

    Missing whitespace on page 11, lines 17/18: "... ResearchGate platform ishigher ..."

    On page 11, lines 25-32, the authors try to explain the lower correlation between ResearchGate data and Mendeley data with the dual nature of ResearchGate as a "platform with a mix of social media and bibliographic features". However, this is also true for Mendeley. One might argue that the social media component is stronger on ResearchGate than on Mendeley. Especially, for the Mendeley reader counts that are less affected by the social component than the ResearchGate score. The ResearchGate score is strongly influenced by the social media component as Meier and Tunger (2018a) have shown.

    Typo on page 11, line 40/41: "... twitter platform ..."

    In the discussion of the Facebook results, Enkhbayar et al. (2020) should be taken into account.

    Bornmann, L. and Haunschild, R. (2018). Do altmetrics correlate with the quality of papers? A large-scale empirical study based on F1000Prime data. PLOS ONE, 13(5), e0197133. DOI: 10.1371/journal.pone.0197133
    Bornmann, L., Haunschild, R., and Adams, J. (2019). Do altmetrics assess societal impact in a comparable way to case studies? An empirical test of the convergent validity of altmetrics based on data from the UK research excellence framework (REF). Journal of Informetrics, 13(1), 325-340. DOI: 10.1016/j.joi.2019.01.008
    Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ, USA: Lawrence Erlbaum Associates.
    Enkhbayar A., Haustein S., Barata, G., and Alperin, J. P. (2020). How much research shared on Facebook happens outside of public pages and groups? A comparison of public and private online activity around PLOS ONE papers. Quantitative Science Studies, 1(2), 749-770. DOI: 10.1162/qss_a_00044
    Kassab, O., Bornmann, L., and Haunschild, R. (2020). Can altmetrics reflect societal impact considerations?: Exploring the potential of altmetrics in the context of a sustainability science research center. Quantitative Science Studies, 1(2), 792-809. DOI: 10.1162/qss_a_00032
    Kraemer, H. C., Morgan, G. A., Leech, N. L., Gliner, J. A., Vaske, J. J., & Harmon, R. J. (2003). Measures of clinical significance. Journal of the American Academy of Child and Adolescent Psychiatry, 42(12), 1524-1529.
    Meier, A. and Tunger, D. (2018a). Investigating the transparency and influenceability of altmetrics using the example of the RG score and the ResearchGate platform. Information Services & Use, 38(1-2), 99-110, https://content.iospress.com/articles/information-services-and-use/isu180001
    Meier, A. and Tunger, D. (2018b). Survey on opinions and usage patterns for the ResearchGate platform. PLoS One, 13(10), e0204945. DOI: 10.1371/journal.pone.0204945
    Wang, Z., Glaenzel, W., and Chen, Y. (2020). The impact of preprints in Library and Information Science: an analysis of citations, usage and social attention indicators, Scientometrics, DOI: 10.1007/s11192-020-03612-4

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: OK

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: Should be improved, see comments to the authors

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: Should be improved, see comments to the authors

    Results: For empirical papers - are results presented clearly and analysed appropriately?: Should be improved, see comments to the authors

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: Should be improved, see comments to the authors

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: OK

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: Should be improved, see comments to the authors

    Reproducible Research: If appropriate, is sufficient information, potentially including data and software, provided to reproduce the results and are the corresponding datasets formally cited?: Problematic due to usage of proprietary data sets.

    Author Response
    2020/07/12

    Response to Review Comments

    We are very thankful to the learned reviewers for the comments and suggestions to improve the manuscript. We have revised the manuscript further to address the comments. A summary of major changes and detailed response to review comments is as below.

    Summary of Major Changes:

    1. The complete analysis is done afresh, and all the results are computed again.
    2. The analysis now includes only publication records of document types “article” and “review”; all other document types, such as ‘editorial’ and ‘note’, are removed from the analysis.
    3. The altmetric data for publication records, from both the ResearchGate platform and the Altmetric.com aggregator, are now limited to a cut-off date of 30 September 2017.
    4. The correlation results for different platforms are now presented in discipline-wise groups. All result tables are modified accordingly.
    5. Table 1 is updated and presents all relevant information about the data, with data sources and relevant dates mentioned in one place.
    6. Language and grammar issues are checked again, and all identified errors are corrected.

    Response to Reviewer 1:

    Comment 1: This new version is substantially improved over the previous version, but there are still some problems.
    Response: We thank the learned reviewer for valuable suggestions and comments.

    Comment 2: The novelty claim in the abstract, highlights and main body of the paper needs to be corrected because, as you report in the paper, previous papers have used the same approach (correlated early altmetrics and later citations for the same set of papers), so your novelty is not what is currently stated but (a) using a longer citation window than before and (b) including ResearchGate.
    Response: There are just two recent studies (Thelwall, 2018; Thelwall and Nevill, 2018) that analyzed correlations between altmetrics and citations, with data drawn from different time periods. However, as pointed out in the draft, they used citations for a smaller (and perhaps insufficient) time window and did not analyze data from ResearchGate (which is also being heavily used as an alternative open access source for full text). We have made slight modifications in the Abstract and body of the article to correctly reflect the main focus of the article. The Highlights section is removed to maintain the word limit.

    Comment 3: Please change “are now found to present useful insight about” to “have been investigated for their potential to give insights about”.
    Response: We thank the learned reviewer for the suggestion and have made the change accordingly.

    Comment 4: Methods: The inclusion of document types other than “journal article” is a mistake because minor document types are unlikely to be mentioned or cited, artificially inflating the results. Please can you redo the tests with only documents of type journal article in WoS. Apart from this, I like the way that you have collected the data.
    Response: We understand and agree with the point and have accordingly removed all such document types, leaving the analysis restricted to document types ‘article’ and ‘review’ only. The results are accordingly computed afresh.

    Comment 5: Methodology: The first analysis should be completely deleted from the paper and not mentioned anywhere please. It is not meaningful to mix disciplines for correlation analyses because the average values for altmetrics and citations vary between fields, and not always in the same way.
    Response: We have changed all result tables to now show the correlation results for different platforms in discipline-wise groups.

Comment 6: Methodology: I am confused about the exact dates represented by the altmetric data and reported in the analysis. I am confused because you report, “The altmetric data for three social media platforms, namely Twitter, Facebook and Blog, was collected for the matching publication records. This data was collected in May 2018.” And that the citations were collected in February 2020 but claim in the abstract to compare “early altmetric data (just after publication) and later citations (after 3-4 years of publication)” for articles from 2016. So you seem to be comparing 2 year window altmetrics with 4 year window citations? Please could you give a clear specification of the relevant dates in one place and make the relationship between them clear. Please also add collection dates to all tables and figures (and the publication year 2016) to clarify this issue because it is key to your novelty claim.
Response: We regret the lack of clarity in the explanation. We have done the following: (a) restricted both the ResearchGate and Altmetric.com data to 30 September 2017, and (b) put all relevant numbers and dates together in one place in Table 1.
The altmetric mentions for publication records from 2016 are now collected only up to 30 September 2017. The altmetric data period therefore varies from 9 months (for articles published in December 2016) to 20 months (for articles published in January 2016). It was not possible to limit the altmetric data for each individual article to a 12-month period, since publication date information is not available in Web of Science records. Thus, the altmetric data for articles are collected for an average period of about 15 months, which should be appropriate to capture most altmetric events.
In this regard, it may be noted that a recent study on the velocity of altmetrics (Fang and Costas, 2020) showed that about 90% of altmetric mentions on most platforms accumulate within 10-12 months of the publication of an article. Ortega (2018) has also shown that Twitter and blog mentions accumulate quite quickly, usually faster than other kinds of social media mentions.
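
The window arithmetic behind these figures can be verified with a short calculation (a sketch assuming, as stated above, roughly equal monthly publication volumes across 2016):

    For an article published in month m of 2016 (m = 1, ..., 12), the window up to 30 September 2017 is
        w(m) = (12 - m) + 9 = 21 - m months,
    so w(1) = 20 months (January 2016) and w(12) = 9 months (December 2016). The average is
        (1/12) * sum_{m=1..12} (21 - m) = 21 - 6.5 = 14.5, i.e. about 15 months.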

    Comment 7: Your conclusions should focus on what is novel from your findings, ignoring issues that have been previously shown, such as positive correlations between altmetrics and citations. To do this it will need to be much more specific than it currently is and will need to mention its novel parts (ResearchGate correlations, longer citation window). I like the point in the conclusion about the popularity of ResearchGate in India.
Response: Thank you for the valuable point. We have revised the Conclusion section and made it more focused and to the point. The part on the limitations of the current study is now moved to the Discussion section.

    Response to Reviewer 2:

    Comment 1: The manuscript has been improved in this first round of review. However, some more work has to be done before I can recommend publication of the manuscript.
    Response: We thank the learned reviewer for valuable suggestions and comments.

    Comment 2: First of all, careful proof-reading of the manuscript is necessary. As the authors tried and failed to properly proof-read the manuscript, I recommend usage of a professional service. Even one of the examples I mentioned in my previous review has not been fixed (page 2, lines 12/13: "It is probably first study ..."). Many more language problems remain, see for example:
    - Page 1, line 30: "... later citation counts for different sample of articles ..."
    - Page 1, line 44: "... may have an advantage in terms attracting higher citations in future."
    - Capitalization should be checked, especially for "Blog", "Spearman Rank Correlation Coefficients", and "Pearson Correlation".
    - Page 8, line 27: "Higher are the values (positive or negative), stronger would be the degree of correlation (positive or negative)."
    - Page 12, line 55: "... metanalysis by Bornann ..."
Response: We sincerely regret the language and grammar issues. We have tried our best to make the manuscript clean in terms of grammar and language. The revised version has been proof-read by us and a colleague multiple times, and we hope that all such issues are now corrected.

    Comment 3: There are some problems with the methodology: a) Papers not found at ResearchGate or Altmetric.com are removed in the respective sets. It is customary practice in altmetrics to assign a reader or mention value of zero to such papers. Otherwise, one should also remove uncited papers.
b) This can be solved by including papers not found at ResearchGate or Altmetric.com with a score of zero. The authors compare the top-1000 papers ordered by citations, ResearchGate reads, and altmetrics mentions. As these three sets are of unequal size, this seems like comparing apples and oranges. In the case of citations, 76,621 papers are included. Thus, 1000 papers are the top-1.3%. In the case of ResearchGate reads, 42,872 papers are included. Thus, 1000 papers are the top-2.3%. In the case of tweets, Facebook mentions, and blog mentions, 17,414 papers are included. Thus, 1000 papers are the top-5.7%.
Response: We thank the learned reviewer for pointing this out. In fact, we have used the suggested strategy for papers for which data are obtained from Altmetric.com. Not all of the 17,317 papers have non-zero mentions on all three platforms (Twitter, Blog and Facebook); for papers without a captured mention on a given platform, we assigned a value of '0'. However, we did not apply this '0' substitution to the whole set of 70,655 papers from Web of Science, as that would introduce unknown errors and biases due to the lack of coverage by Altmetric.com. Several previous studies on citations and altmetric mentions used this approach (see for example Costas et al., 2015; Ortega, 2016; Thelwall, 2018; Thelwall and Kousha, 2017; Thelwall and Nevill, 2018). Thelwall and Nevill (2018) also stated categorically that they dropped all DOIs for which altmetric evidence was not available. As far as citation counts are concerned, we have computed correlations between mentions and citations for the same sample of papers found indexed in Altmetric.com, irrespective of whether they have zero or non-zero mentions. Regarding the varying sizes of the top 1000 papers for ResearchGate and Altmetric.com, we understand the point and have removed those results from the analysis altogether. This was also done because such grouping was problematic from the perspective of putting data from different disciplines together in one group; given the clearly observed disciplinary differences in the degree of correlations, it was desirable to keep the data organized in the respective disciplinary groups.
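
For concreteness, a minimal sketch of this matching and zero-substitution strategy (Python with pandas/scipy; the file names and column names are illustrative, not the actual field names used in the study):

    import pandas as pd
    from scipy.stats import spearmanr

    wos = pd.read_csv("wos_2016_india.csv")        # illustrative; columns: doi, citations
    alt = pd.read_csv("altmetric_mentions.csv")    # illustrative; columns: doi, twitter, facebook, blog

    # Keep only papers indexed in Altmetric.com (inner join on DOI); papers
    # absent from Altmetric.com are excluded rather than zero-filled.
    matched = wos.merge(alt, on="doi", how="inner")

    # Substitute 0 on platforms where an indexed paper has no captured mention,
    # then compute Spearman rank correlations with later citation counts.
    for platform in ["twitter", "facebook", "blog"]:
        matched[platform] = matched[platform].fillna(0)
        rho, p = spearmanr(matched[platform], matched["citations"])
        print(f"{platform}: rho={rho:.3f}, p={p:.3g}")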

    Comment 4: The authors mentioned on page 7 that a random time delay between 90 and 180 seconds is necessary between two fetches. How many papers can be retrieved per fetch, only one?
Response: Each fetch retrieves the relevant information for a single paper from ResearchGate.
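
A minimal sketch of the politeness policy described here, i.e. one paper per fetch with a random 90-180 second pause between consecutive fetches (Python; the URL, parameters and DOIs below are hypothetical placeholders, not ResearchGate's actual interface):

    import random
    import time
    import requests

    def fetch_record(doi: str) -> str:
        # One fetch retrieves the page for a single paper.
        resp = requests.get("https://www.researchgate.net/search",
                            params={"q": doi}, timeout=30)
        resp.raise_for_status()
        return resp.text

    for doi in ["10.1000/example1", "10.1000/example2"]:  # placeholder DOIs
        html = fetch_record(doi)
        # ... parse reads, citations, comments and recommendations here ...
        time.sleep(random.uniform(90, 180))               # random 90-180 s delay between fetches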

    Comment 5: The authors also mentioned on page 7 that the ResearchGate data were gathered between May and September 2017 and altmetrics were collected during 2017/2018. The ResearchGate data were gathered over a period of five months and the other altmetrics over the course of two years. This weakens the point of the authors that ResearchGate data and altmetrics were gathered just after publication. First, the papers were published in 2016. Thus, the altmetrics were gathered 1-2 years after publication and the citation data were gathered 2-3 years after the altmetrics data.
Response: We understand the point and have now restricted the altmetric data for publication records for the year 2016 to mentions collected up to 30 September 2017. Thus, both the Altmetric.com data and the ResearchGate data cover the same time period. It would have been ideal to collect altmetric data for each individual article for a time window of 12 months; however, this was not possible due to the non-availability of publication date information in Web of Science records.
The altmetric data period for the publication records thus varies from 9 months (for articles published in December 2016) to 20 months (for articles published in January 2016). Assuming that approximately equal numbers of articles were published in each month of 2016, this averages to about 15 months. Given that a recent study (Fang and Costas, 2020) has shown that about 90% of altmetric mentions accumulate within 10-12 months of the publication of an article, an average period of 15 months should be appropriate to capture most altmetric events. The citation data are collected up to February 2020, giving a citation window of approximately four years.

    Comment 6: On page 9, lines 46-59, the authors discuss the different Spearman rank correlation coefficients. The correlation coefficient of 0.226 of the 1000 top-cited papers is compared to the correlation coefficient of the entire data set (0.473) as "a bit lower". It is less than half of the correlation coefficient of the entire data set. On page 13, lines 30-35, the authors discuss the results of the 1000 most-mentioned and the 1000 most-cited papers in comparison with the whole dataset. They infer "that those articles that get enough social media attention immediately after publication do get a citation advantage." An alternative interpretation might be that some articles are of such a high quality that they are within the top-1000 mentioned and top-1000 cited groups. The problem with the author's interpretation is that they seem to assume causality although they only investigated correlations. It is unknown if these papers got highly cited because they were frequently mentioned earlier or if both observations can pass independently from each other.
Response: We understand the point. The analysis of these two cross-disciplinary groups is now removed from the paper, for the reasons described above. Further, we have revised the writing at the appropriate places to clearly indicate that we are only trying to observe correlations/associations and not causality.

    Comment 7: The last paragraph of the conclusions section reads more like a discussion of the results.
    Response: Thank you for the suggestion. We have revised the conclusion section to make it focused and to the point.

    Comment 8: Finally, the authors mentioned in their replies to my comment 11 that they have mentioned the limitation of the WoS subject categories Arts & Humanities and Mathematics, but I do not see that in the revised version of the manuscript.
Response: We regret having omitted this point in the earlier revision. We have now clearly specified the limitations of altmetric mentions as far as they relate to subject areas like Mathematics. Research in the Arts and Humanities subject area does usually get some altmetric attention, with about 27% of papers getting altmetric coverage, which is higher than disciplines like Engineering (19.4%) and Mathematics (22%) (Banshal et al., 2019). However, given the lower publication volume in Arts and Humanities, it constitutes only about 2% of the total altmetric mentions across all subject areas (with Mathematics and Engineering contributing 1.7% and 2.4%, respectively, of the total altmetric mentions). We have now mentioned this as a point of caution while interpreting the results.

    References:

Banshal, S.K., Singh, V.K., Muhuri, P.K. and Mayr, P. (2019), “Disciplinary variations in altmetric coverage of scholarly articles”, Proceedings of the 17th International Conference on Scientometrics & Informetrics (ISSI), pp. 1870–1881.
Costas, R., Zahedi, Z. and Wouters, P. (2015), “Do ‘altmetrics’ correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective”, Journal of the Association for Information Science and Technology, Vol. 66 No. 10, pp. 2003–2019.
Fang, Z. and Costas, R. (2020), “Studying the accumulation velocity of altmetric data tracked by Altmetric.com”, Scientometrics, Vol. 123 No. 2, pp. 1077–1101.
Ortega, J.L. (2016), “To be or not to be on Twitter, and its relationship with the tweeting and citation of research papers”, Scientometrics, Vol. 109 No. 2, pp. 1353–1364.
Ortega, J.L. (2018), “The life cycle of altmetric impact: a longitudinal study of six metrics from PlumX”, Journal of Informetrics, Vol. 12 No. 3, pp. 579–589.
Thelwall, M. (2018), “Early Mendeley readers correlate with later citation counts”, Scientometrics, Vol. 115 No. 3, pp. 1231–1240.
Thelwall, M. and Kousha, K. (2017), “ResearchGate versus Google Scholar: which finds more early citations?”, Scientometrics, Vol. 112 No. 2, pp. 1125–1131.
Thelwall, M. and Nevill, T. (2018), “Could scientists use Altmetric.com scores to predict longer term citation counts?”, Journal of Informetrics, Vol. 12 No. 1, pp. 237–248.



  • pre-publication peer review (ROUND 2)
    Decision Letter
    2020/05/30

    30-May-2020

    Dear Prof. Singh,

Manuscript ID OIR-11-2019-0364.R1, entitled "Can early altmetric mentions predict later citations? A test of validity on data from ResearchGate and three social media platforms", which you submitted to Online Information Review, has been reviewed. The comments of the reviewer(s) are included at the bottom of this letter.

The reviewers have recommended that you make major revisions to your manuscript prior to it being considered for publication. Please read their suggestions and prepare a revised manuscript. Any changes that you make to your manuscript should be highlighted, as well as described in your response to reviewers.

Please also ensure that in doing so your paper does not exceed the maximum word length of 10000 words and that it meets all the requirements of the author guidelines at http://www.emeraldinsight.com/products/journals/author_guidelines.htm?id=oir.

    To revise your manuscript, log into https://mc.manuscriptcentral.com/oir and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision.

    You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript using a word processing program and save it on your computer. Please also highlight the changes to your manuscript within the document by using the track changes mode in MS Word or by using bold or coloured text.

    Once the revised manuscript is prepared, you can upload it and submit it through your Author Centre.

    When submitting your revised manuscript, you will be able to respond to the comments made by the reviewer(s) in the space provided. You can use this space to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the reviewer(s).

    IMPORTANT: Your original files are available to you when you upload your revised manuscript. Please delete any redundant files before completing the submission.

    Because we are trying to facilitate timely publication of manuscripts submitted to Online Information Review, your revised manuscript should be uploaded as soon as possible. If it is not possible for you to submit your revision in a reasonable amount of time, we may have to consider your paper as a new submission.

    Once again, thank you for submitting your manuscript to Online Information Review and I look forward to receiving your revision.

    Yours sincerely,
    To help support you on your publishing journey we have partnered with Editage, a leading global science communication platform, to offer expert editorial support including language editing and translation.
    If your article has been rejected or revisions have been requested, you may benefit from Editage’s services. For a full list of services, visit: authorservices.emeraldpublishing.com/
    Please note that there is no obligation to use Editage and using this service does not guarantee publication.

    Dr. Eugenia Siapera
    Co-Editor
    eugenia.siapera@ucd.ie

    Reviewer(s)' Comments to Author:
    Reviewer: 1

    Recommendation: Major Revision

    Comments:
    This new version is substantially improved over the previous version, but there are still some problems.

The novelty claim in the abstract, highlights and main body of the paper needs to be corrected because, as you report in the paper, previous papers have used the same approach (correlated early altmetrics and later citations for the same set of papers), so your novelty is not what is currently stated but (a) using a longer citation window than before and (b) including ResearchGate.
    Please change “are now found to present useful insight about” to “have been investigated for their potential to give insights about”.
    Methods:
    The inclusion of document types other than “journal article” is a mistake because minor document types are unlikely to be mentioned or cited, artificially inflating the results. Please can you redo the tests with only documents of type journal article in WoS. Apart from this, I like the way that you have collected the data.

    Methodology:
    The first analysis should be completely deleted from the paper and not mentioned anywhere please. It is not meaningful to mix disciplines for correlation analyses because the average values for altmetrics and citations vary between fields, and not always in the same way.
I am confused about the exact dates represented by the altmetric data and reported in the analysis. I am confused because you report, “The altmetric data for three social media platforms, namely Twitter, Facebook and Blog, was collected for the matching publication records. This data was collected in May 2018.” And that the citations were collected in February 2020 but claim in the abstract to compare “early altmetric data (just after publication) and later citations (after 3-4 years of publication)” for articles from 2016. So you seem to be comparing 2 year window altmetrics with 4 year window citations? Please could you give a clear specification of the relevant dates in one place and make the relationship between them clear. Please also add collection dates to all tables and figures (and the publication year 2016) to clarify this issue because it is key to your novelty claim.

    Your conclusions should focus on what is novel from your findings, ignoring issues that have been previously shown, such as positive correlations between altmetrics and citations. To do this it will need to be much more specific than it currently is and will need to mention its novel parts (ResearchGate correlations, longer citation window).
    I like the point in the conclusion about the popularity of ResearchGate in India.

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: Yes

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: Yes

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: Yes except for two problems.

    Results: For empirical papers - are results presented clearly and analysed appropriately?: Yes except for two problems.

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: Yes, reasonably OK for this.

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: To some extent.

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: Not very readable but structure OK.

    Reproducible Research: If appropriate, is sufficient information, potentially including data and software, provided to reproduce the results and are the corresponding datasets formally cited?: Methods described clearly.

    Reviewer: 2

    Recommendation: Major Revision

    Comments:
    The manuscript has been improved in this first round of review. However, some more work has to be done before I can recommend publication of the manuscript.

    First of all, careful proof-reading of the manuscript is necessary. As the authors tried and failed to properly proof-read the manuscript, I recommend usage of a professional service. Even one of the examples I mentioned in my previous review has not been fixed (page 2, lines 12/13: "It is probably first study ..."). Many more language problems remain, see for example:
    - Page 1, line 30: "... later citation counts for different sample of articles ..."
    - Page 1, line 44: "... may have an advantage in terms attracting higher citations in future."
    - Capitalization should be checked, especially for "Blog", "Spearman Rank Correlation Coefficients", and "Pearson Correlation".
    - Page 8, line 27: "Higher are the values (positive or negative), stronger would be the degree of correlation (positive or negative)."
    - Page 12, line 55: "... metanalysis by Bornann ..."

    There are some problems with the methodology:

    a) Papers not found at ResearchGate or Altmetric.com are removed in the respective sets. It is customary practice in altmetrics to assign a reader or mention value of zero to such papers. Otherwise, one should also remove uncited papers.

b) This can be solved by including papers not found at ResearchGate or Altmetric.com with a score of zero. The authors compare the top-1000 papers ordered by citations, ResearchGate reads, and altmetrics mentions. As these three sets are of unequal size, this seems like comparing apples and oranges. In the case of citations, 76,621 papers are included. Thus, 1000 papers are the top-1.3%. In the case of ResearchGate reads, 42,872 papers are included. Thus, 1000 papers are the top-2.3%. In the case of tweets, Facebook mentions, and blog mentions, 17,414 papers are included. Thus, 1000 papers are the top-5.7%.

    c) The authors mentioned on page 7 that a random time delay between 90 and 180 seconds is necessary between two fetches. How many papers can be retrieved per fetch, only one?

    d) The authors also mentioned on page 7 that the ResearchGate data were gathered between May and September 2017 and altmetrics were collected during 2017/2018. The ResearchGate data were gathered over a period of five months and the other altmetrics over the course of two years. This weakens the point of the authors that ResearchGate data and altmetrics were gathered just after publication. First, the papers were published in 2016. Thus, the altmetrics were gathered 1-2 years after publication and the citation data were gathered 2-3 years after the altmetrics data.

    On page 9, lines 46-59, the authors discuss the different Spearman rank correlation coefficients. The correlation coefficient of 0.226 of the 1000 top-cited papers is compared to the correlation coefficient of the entire data set (0.473) as "a bit lower". It is less than half of the correlation coefficient of the entire data set.

On page 13, lines 30-35, the authors discuss the results of the 1000 most-mentioned and the 1000 most-cited papers in comparison with the whole dataset. They infer "that those articles that get enough social media attention immediately after publication do get a citation advantage." An alternative interpretation might be that some articles are of such a high quality that they are within the top-1000 mentioned and top-1000 cited groups. The problem with the authors' interpretation is that they seem to assume causality although they only investigated correlations. It is unknown if these papers got highly cited because they were frequently mentioned earlier or if both observations can pass independently from each other.

    The last paragraph of the conclusions section reads more like a discussion of the results.

    Finally, the authors mentioned in their replies to my comment 11 that they have mentioned the limitation of the WoS subject categories Arts & Humanities and Mathematics, but I do not see that in the revised version of the manuscript.

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: Yes.

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: Yes.

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: There are remaining methodological problems as outlined in the comments to the authors.

    Results: For empirical papers - are results presented clearly and analysed appropriately?: Yes.

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: Could be improved.

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: Yes.

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: Could be improved.

    Reproducible Research: If appropriate, is sufficient information, potentially including data and software, provided to reproduce the results and are the corresponding datasets formally cited?: Could be improved.

    Author Response
    2020/04/21

    Response to Reviewer Comments/ Suggestions

We are very thankful to the learned reviewers for the valued comments and suggestions to improve the manuscript titled "Can early altmetric mentions predict later citations? A test of validity on a large dataset". We have tried our best to address all comments in an appropriate and meaningful way and to incorporate the suggestions provided. A summary of the major changes and a detailed response to the review comments is given below.

    Summary of Major Changes:

1. The manuscript is significantly revised in terms of the organization and composition of sections. The Data and Methodology material is now divided into two independent sections, both expanded substantially. A new 'Discussion' section is added to discuss the results and their implications.
2. The title is modified to reflect the special focus of the work on the ResearchGate platform.
3. The Literature Review section is significantly expanded to explain the related work in more detail.
4. Spearman rank correlations are computed for the different cases and included in the results, replacing the Pearson correlation values.
5. Data and analysis for the Mendeley and News platforms are removed, due to inconsistencies in the data and some other reasons.
6. The results are updated by redoing all computations afresh. This was necessary after updating citations for the articles for the corresponding period.
7. Language and grammar related issues and errors are addressed.

    Response to Reviewer 1:

    Comment 1: This article assesses whether early altmetric scores correlate with later citation counts for a set of 88,259 articles from India in 2016. The article is well structured, seems to have some novel data (ResearchGate, possibly a data set from India) and attempts to make a genuine research contribution.
    Response: We thank the learned reviewer for valuable time and providing valuable suggestions to further improve the manuscript.

    Comment 2: There are a few problems with the current version that need to be corrected. Please clarify in the abstract that the scope of the study is Indian publications from one year in the Web of Science.
Response: We have clearly stated in the Abstract as well as the Data section that the data used for analysis are publications from India for the year 2016. This is also mentioned in the Conclusion section as a limitation.

    Comment 3: The introduction should focus more narrowly on your research problem (early altmetrics as predictors of later citation counts) and make a claim for the usefulness of the results.
Response: We have revised the Introduction section, making it more focused on the specific topic dealt with in the current paper.

    Comment 4: It is good to see clear research questions but previous publications that you have reviewed have given a positive answer to both of them, so they are redundant. Please narrow your research questions to address specific aspects for which the answer is not known. This might be India and/or ResearchGate.
Response: The previously published papers did address the research questions. However, two limitations are observed in the previous studies. First, they used much smaller datasets for analysis, and some used data for specific disciplines only. Second, the previous studies did not actually collect early altmetric data and later citation counts; instead, they collected altmetric and citation data for a set of publications on the same date. Therefore, they did not actually analyze whether early altmetric mentions can predict later citation counts. Since we mainly wanted to find out if early altmetric mentions can predict later citation counts, we collected altmetric data at an early time and citation data 3-4 years after publication. The correlations between the two are then computed for data from the two time periods. Further, as pointed out by the learned reviewer, we have used ResearchGate data for the analysis, which is possibly the first such analytical study on ResearchGate data. We have slightly revised the research questions to make them more appropriate to the study.

    Comment 5: The literature review should be more comprehensive and more in-depth. The insights that it gives you should help you formulate better research questions and employ more suitable methods.
    Response: We understand the point and have revised the Literature Review section significantly. It now includes more details on understanding the relationship between altmetrics and citations.

    Comment 6: Please give a justification for your research design. Why did you analyse the data in the way that you chose and how does it help to address the research questions?
Response: Since we mainly wanted to find out if early altmetric mentions can predict later citation counts, we collected altmetric data at an early time and citation data 3-4 years after the publication of the articles. The correlations between the two are then computed for data from the two time periods. We have added more details on this in the Methodology and Discussion sections.

    Comment 7: Please give a justification for all your methods choices. Why did you do it that way and why is it better than the alternatives?
Response: We regret the omission of a description of the various available choices. We have now significantly revised the Methodology section, clearly describing the alternatives available and the rationale for the choices made. The Data and Methodology material is now divided into two independent sections for this purpose.

    Comment 8: Please give the exact method used to obtain the Web of Science sample, including the document types included. This should allow the reader to reproduce what you did and detect, for example, whether your dataset includes editorials and reviews.
Response: Thank you for pointing out this issue. The Data section is now revised and expanded to include these details. To illustrate: publication records of all document types were used; however, most of the records are actually of type 'article'.

    Comment 9: Altmetric.com’s Mendeley data is incomplete because they do not comprehensively index Mendeley (unless this has changed since I last checked) so please delete all references to Mendeley data.
Response: We agree with the suggestion of the learned reviewer and have removed the analysis of Mendeley data from the results. We considered taking Mendeley data directly through the Mendeley API; however, since we wanted early mention data, which is not readily available there, we decided to drop Mendeley data from the analytical results altogether and to focus more on ResearchGate data, which is a new analysis.

    Comment 10: Please clarify what you did with articles without a matching record in Altmetric.com – did you give them a score of 0 for correlations or remove them?
Response: We regret that we did not describe this clearly. It is now clearly mentioned. To illustrate: all Web of Science records with DOIs that are not found in Altmetric.com or ResearchGate could not be used for further analysis, because altmetric counts for such records were not available. Such records were accordingly excluded when computing the correlation values.

    Comment 11: The methods state that you were, “able to extract ResearchGate data for 53,832 publication records.”. What data did you extract from ResearchGate? It reports lots of different numbers, so which did you use? Please also give a brief explanation about how the crawler works and how you match DOIs.
Response: We have written a crawler to collect the following data from ResearchGate for each record from Web of Science: title, DOI, author information, event statistics (reads, citations, comments and recommendations), journal information, etc. We have provided details of the academic-focused crawler used for data collection in the Data section, which describes the crawler's functionality and the data obtained.

    Comment 12: Pearson correlations should never be used for citation counts or altmetrics because they are highly skewed. Please use Spearman correlations instead.
Response: We understand the point, as well as the fact that altmetric data are skewed too. We have computed Spearman rank correlations and included them in the results, replacing the Pearson correlations.
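
To illustrate why rank-based correlation suits skewed count data, a small demonstration on synthetic data (Python; this is not the study's dataset, just lognormal counts constructed for the example):

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    rng = np.random.default_rng(0)
    # Heavy-tailed, citation-like counts: a few papers dominate the totals.
    mentions = rng.lognormal(mean=0.5, sigma=1.5, size=5000)
    citations = mentions + rng.lognormal(mean=1.0, sigma=1.5, size=5000)

    print("Pearson :", pearsonr(mentions, citations)[0])   # driven by the extreme values
    print("Spearman:", spearmanr(mentions, citations)[0])  # depends only on ranks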

    Comment 13: Please only report statistics for fields and delete all the science-wide statistics (e.g., Tables 2,3,4). These are not helpful due to massive differences in average altmetrics and citation counts between fields.
Response: We understand the point and agree that there are high variations in average values between different fields. This is now elaborated further in the discipline-wise analysis part. We have also reorganized the results platform-wise to make them more useful.

Comment 14: The paper's Discussion section should relate the results to prior research and show how they fill gaps in it.
Response: We have now divided the Conclusion section into two parts: Discussion and Conclusion. The Discussion section now discusses and relates the findings to previous studies, and the Conclusion section mentions the main takeaways of the work.

Comment 15: The paper's Conclusion section should form conclusions only from the findings of your paper rather than making general statements that are already widely known and could have been written (and have been) before your findings.
    Response: The Conclusion section is revised as per the suggestion of the learned reviewer. Limitations and pointers to future work are also included.

    Response to Reviewer 2:

    Comment 1: The manuscript "Can early altmetric mentions predict later citations? A test of validity on a large dataset" presents correlations between citation counts at a later point in time and mentions on Twitter, Facebook, news, Mendeley, and ResearchGate at an earlier point in time. Altmetrics data are about a year old and citation data are about three years old. This should account for the fact that most altmetrics accumulate faster than citations. However, do all altmetrics accumulate similarly fast? Why is exactly this time delay chosen? More discussion of this aspect is needed as this is mainly the novel part of the manuscript.
Response: We thank the learned reviewer for the valuable time spent reviewing the manuscript and for the valuable suggestions to improve it. It is rightly pointed out that altmetric data for publications are taken at an early time (just after publication/going online) and citation counts are taken at a later date (after 2 to 3 years of publication). This indeed is a novel aspect of work in the area. Earlier studies that tried to analyze this question actually used altmetric and citation data for the same time period. Such analysis, however, could not produce evidence of whether early altmetric mentions could be predictors of later citations. Since we mainly wanted to analyze whether altmetric counts accumulated just after publication correlate well with citations accrued in 2-3 years' time, we have taken data from different points in time. This is now more clearly indicated in the Introduction as well as the Related Work part.

    Comment 2: According to the methodology, only papers with an Indian affiliation published in 2016 were analyzed. The WoS query "cu=india and py=2016" results in 132,363 papers, 98,351 of them having a DOI. The authors reported that they found 88,259 papers, 76,621 of them having a DOI. Were additional restrictions imposed, e.g., to the document type? The restriction to papers with an Indian affiliation and publication year 2016 should be mentioned in the title and/or abstract. It should also be discussed as a limitation besides the restriction to papers with a DOI.
Response: Thank you for pointing this out. We have now included this in the abstract and have also clearly described the data collection process in the Data section. Limitations of results drawn from this data are also mentioned in the Conclusion section.

    Comment 3: As far as I understand the methodology, all altmetrics (including Mendeley reader counts) are taken from Altmetric.com. This is a problem because Mendeley data from Altmetric.com are vastly incomplete, see for example the case study with the DOI 10.29024/joa.4. Mendeley data should be retrieved from the Mendeley API.
    Response: We agree with the suggestion of the learned reviewer and have removed analysis of Mendeley data from results altogether. We thought of taking Mendeley data directly through Mendeley API, however, since we wanted mentions for early data, which is not readily available, therefore, we decided to drop Mendeley data from analytical results.

Comment 4: The data in Table 1 seem strange: the number of papers found in 2017 is always larger than in 2019. Why? Especially, it is very strange to see that WoS should show about 4000 fewer papers in 2019 than in 2017. The number of papers found on ResearchGate halved in these two years. Why? It is also surprising to see that the number of papers drops in Altmetric.com by about 5000 papers. Why? What does Stacy Konkiel have to say about this? Such discrepancies in the data set have to be explained better.
Response: We understand the point raised by the learned reviewer. We were also surprised to observe this. Even Web of Science has shown a drop in papers from India for 2016, as originally collected in 2017 and then again in 2019. Due to this drop in records, the altmetric data also reduced. These are problems that are often encountered by researchers in the area. However, to take a fresh look at this, we obtained all publication records and their citations afresh in February 2020, keeping the altmetric data the same as originally obtained for the early period. Thus, the correlation analysis was done only for those publication records for which altmetric data were found from ResearchGate or Altmetric.com. The Data section is now modified accordingly.

    Comment 5: How many papers contribute to the correlation analysis in Table 2?
Response: We have recomputed the results. Now 17,414 papers are used for the correlation computation with Altmetric.com data and 42,872 for the correlation computation with ResearchGate data. These numbers are now updated at the respective places.

Comment 6: Page 2, lines 51-54: "Varied observations have been recorded in different studies, with some studies supporting the existence of correlations whereas few others stated that either correlations are not noticeable or very weak." I think that the studies stating that correlations between citation counts and most altmetrics (except for Mendeley and CiteULike reader counts) are either not noticeable or very weak are in the large majority. Look at the meta-analysis with DOI 10.1007/s11192-015-1565-y and the many studies included there. The results from this meta-analysis should also be discussed in the introduction.
    Response: We thank the learned reviewer for the input. This is now incorporated in the paper in introduction and related work sections.

    Comment 7: Page 3, line 28: "... highly cited journals ... ." What is meant with highly cited journals? Maybe, "journals with a high impact factor" is a better wording.
Response: Thank you for pointing this out. We used the wording as described in the cited paper mentioned in the Related Work part. However, we agree with the point and have now changed it to "journals with a high impact factor".

    Comment 8: Page 5, lines 3-5: "The analysis of records from Altmetric.com was done mainly for four popular social platforms, namely Twitter, Facebook, Mendeley, and News Mentions." Twitter and Facebook definitively count as social platforms. The character of Mendeley as a social platform is debatable because it is also usable via a desktop application without using the online social part. I can't see why news sources should count as "social platforms".
Response: We have removed the analysis of Mendeley data from the results for this and other reasons, as detailed above. The analysis of the News part is also deleted. Now the main focus is on ResearchGate data and on comparing its results with three social media platforms: Twitter, Facebook and Blog.

    Comment 9: Page 6, lines 44-47: "The analysis of this data sample shows that highly cited papers are found to be mentioned higher than the average value for the whole dataset." Really? Does it show that? The authors presented correlation results. I do not see how correlation results provide information about averages. It might be the case that the statement is true but the justification is wrong. The authors have the data at hand and can calculate the average values and compare them.
Response: We understand the point and have revised the text to make the key point clearer. We observe that papers that get higher altmetric attention may have a citation advantage; similarly, papers that get higher citations are found to have relatively higher altmetric attention.

    Comment 10: Page 7, lines 32-35: "It is more evident from these results that papers with the highest social mentions are not among the highest cited papers." The sentence is probably correct when the "not" is replaced with "less often".
    Response: We regret the error. It now stands replaced with a more detailed revision of the description.

    Comment 11: Page 7, lines 48-53: "The data was tagged 14 broad disciplines and analytical results were obtained for each discipline for the five social platforms. The fourteen disciplines selected are as follows: Agriculture (AGR), Art & Humanities (AH), Biology (BIO), Chemistry (CHE), Engineering (ENG), Environment Science (ENV), Geology (GEO), Information Sciences (INF), Material Science (MAR), Mathematics (MAT) ... ." How were the data tagged into 14 broad disciplines? Especially, for Art & Humanities and Mathematics, it is very problematic to apply bibliometrics. The applicability of altmetrics is unknown. This should be mentioned.
Response: The 14 broad disciplines are obtained by grouping several related disciplines together under a broader name. The grouping was done by utilizing the "WC" field of Web of Science: each record is assigned to a broad disciplinary group based on the "WC" category it carries and the group with which that "WC" category is clubbed (see the sketch below). The disciplinary grouping process is now detailed further in the Methodology section. We agree with the point about disciplines like Arts & Humanities and Mathematics; it is now appropriately mentioned.
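
A minimal sketch of this grouping step (Python; the category-to-group mapping shown covers only a few illustrative "WC" categories, not the authors' full scheme):

    # Map Web of Science "WC" subject categories to broad disciplinary groups
    # (abbreviations as used in the paper: INF, MAT, ENG, MAR, ...).
    WC_TO_GROUP = {
        "Computer Science, Information Systems": "INF",
        "Information Science & Library Science": "INF",
        "Mathematics": "MAT",
        "Mathematics, Applied": "MAT",
        "Engineering, Electrical & Electronic": "ENG",
        "Materials Science, Multidisciplinary": "MAR",
    }

    def broad_groups(wc_field: str) -> set:
        # A WoS record may carry several ";"-separated WC categories.
        categories = [c.strip() for c in wc_field.split(";")]
        return {WC_TO_GROUP[c] for c in categories if c in WC_TO_GROUP}

    print(broad_groups("Mathematics, Applied; Computer Science, Information Systems"))
    # -> {'MAT', 'INF'} (set order may vary)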

    Comment 12: The company Altmetric.com is spelled as altmetric.com which is wrong.
    Response: We regret the error and have corrected this in the revised manuscript.

    Comment 13: The manuscript should be proof-read before resubmission. Here are some examples:
    - Page 2, lines 12/13: "It is probably first study ..."
    - Page 2, lines 47/48: "... while some went a step ahead to find out of social media metrics ..."
    - Page 3, lines 13/14: "Do there exits disciplinary variations ..."
    - Page 3, lines 18/19: "Understanding the relationship between altmetrics and citation count ..."
    - Page 4, lines 28/29: "Since DOI filed was the linking data ..."
    - Page 5, line 50: "Another reason of slightly different picture ..."
    - Page 6, lines 51-53: "It, however, be also interesting ..."
    - Page 8, lines 23/24: "It is observed that there are significant variations in correlations values ..."
    - Page 8, lines 49-51: "... by analyzing a large-sized data samples comprising of different disciplines."
    - Page 8, lines 54-56: "Taking data from different time periods make the analysis more credible ..."
    - Page 9, lines 15-17: "... correlation values between citation and altmetrics ..."
    Response: We regret the errors in language and grammar. All these now stand corrected. The manuscript is thoroughly revised to remove language and grammar related errors.



  • pre-publication peer review (ROUND 1)
    Decision Letter
    2020/02/13

13-Feb-2020

    Dear Prof. Singh,

Manuscript ID OIR-11-2019-0364, entitled "Can early altmetric mentions predict later citations? A test of validity on a large dataset", which you submitted to Online Information Review, has been reviewed. The comments of the reviewer(s) are included at the bottom of this letter.

    The reviewers have recommended that you make major revisions to your manuscript prior to it being considered for publication.

Please read their suggestions and, if you choose to prepare a revised manuscript, ensure that any changes that you make to your manuscript are highlighted, as well as described in your response to reviewers.

Please also ensure that in doing so your paper does not exceed the maximum word length of 10000 words and that it meets all the requirements of the author guidelines at http://www.emeraldinsight.com/products/journals/author_guidelines.htm?id=oir.

    To revise your manuscript log into https://mc.manuscriptcentral.com/oir and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions". Under "Actions" click on "Create a Revision". Your manuscript number has been appended to denote a revision.

    You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript using a word processing program and save it on your computer. Please also highlight the changes to your manuscript within the document by using the track changes mode in MS Word or by using bold or coloured text.

    Once the revised manuscript is prepared you can upload it and submit it through your Author Centre.

    When submitting your revised manuscript, you will be able to respond to the comments made by the reviewer(s) in the space provided. You can use this space to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the reviewer(s).

    IMPORTANT: Your original files are available to you when you upload your revised manuscript. Please delete any redundant files before completing the submission.

    Because we are trying to facilitate timely publication of manuscripts submitted to Online Information Review, your revised manuscript should be uploaded as soon as possible. If it is not possible for you to submit your revision in a reasonable amount of time, we may have to consider your paper as a new submission.

    Once again, thank you for submitting your manuscript to Online Information Review. I look forward to receiving your revision.

    Yours sincerely,

    Dr. Eugenia Siapera
    eugenia.siapera@ucd.ie

    Reviewer(s)' Comments to Author:
    Reviewer: 1

    Recommendation: Major Revision

    Comments:
    This article assesses whether early altmetric scores correlate with later citation counts for a set of 88,259 articles from India in 2016. The article is well structured, seems to have some novel data (ResearchGate, possibly a data set from India) and attempts to make a genuine research contribution. There are a few problems with the current version that need to be corrected.
    Please clarify in the abstract that the scope of the study is Indian publications from one year in the Web of Science.
    The introduction should focus more narrowly on your research problem (early altmetrics as predictors of later citation counts) and make a claim for the usefulness of the results.
    It is good to see clear research questions but previous publications that you have reviewed have given a positive answer to both of them, so they are redundant. Please narrow your research questions to address specific aspects for which the answer is not known. This might be India and/or ResearchGate.
    The literature review should be more comprehensive and more in-depth. The insights that it gives you should help you formulate better research questions and employ more suitable methods.
    Please give a justification for your research design. Why did you analyse the data in the way that you chose and how does it help to address the research questions?
    Please give a justification for all your methods choices. Why did you do it that way and why is it better than the alternatives?
    Please give the exact method used to obtain the Web of Science sample, including the document types included. This should allow the reader to reproduce what you did and detect, for example, whether your dataset includes editorials and reviews.
    Altmetric.com’s Mendeley data is incomplete because they do not comprehensively index Mendeley (unless this has changed since I last checked) so please delete all references to Mendeley data.
    Please clarify what you did with articles without a matching record in Altmetric.com – did you give them a score of 0 for correlations or remove them? (See the sketch below.)
    The methods state that you were “able to extract ResearchGate data for 53,832 publication records”. What data did you extract from ResearchGate? It reports lots of different numbers, so which did you use? Please also give a brief explanation of how the crawler works and how you match DOIs.
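
    On the zero-versus-removal question above, the two treatments can produce visibly different correlation values; a minimal sketch with synthetic skewed counts (numpy and scipy assumed; the matching rate and distributions are illustrative only):

        # Sketch: zero-filling unmatched articles vs. dropping them changes the
        # observed correlation between mentions and citations. Synthetic data.
        import numpy as np
        from scipy.stats import spearmanr

        rng = np.random.default_rng(42)
        n = 10_000
        latent = rng.gamma(2.0, 1.0, n)        # latent "attention" driving both counts
        citations = rng.poisson(2.0 * latent)  # skewed citation counts
        mentions = rng.poisson(latent)         # skewed mention counts
        matched = rng.random(n) < 0.4          # suppose 40% have an Altmetric.com record

        # Treatment 1: unmatched articles kept, with a mention count of 0
        rho_zero, _ = spearmanr(citations, np.where(matched, mentions, 0))
        # Treatment 2: unmatched articles removed before correlating
        rho_drop, _ = spearmanr(citations[matched], mentions[matched])

        print(f"zero-filled: rho={rho_zero:.2f}; matched only: rho={rho_drop:.2f}")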

    Pearson correlations should never be used for citation counts or altmetrics because they are highly skewed. Please use Spearman correlations instead (see the sketch after these comments).
    Please only report statistics for fields and delete all the science-wide statistics (e.g., Tables 2,3,4). These are not helpful due to massive differences in average altmetrics and citation counts between fields.
    The paper Discussion section should relate the results to prior research and show how they fill gaps in it.
    The paper Conclusion section should form conclusions only from the findings of your paper rather than making general statements that are already widely known and could have been written (and have been) before your findings.
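
    To illustrate the Pearson/Spearman point: on heavily skewed counts, a single extreme paper can dominate Pearson's r while leaving the rank-based Spearman correlation essentially unchanged. A minimal sketch with synthetic counts (numpy and scipy assumed):

        # Sketch: Pearson vs. Spearman on skewed count data with one extreme paper.
        import numpy as np
        from scipy.stats import pearsonr, spearmanr

        rng = np.random.default_rng(0)
        citations = rng.negative_binomial(1, 0.2, 5_000).astype(float)
        mentions = rng.negative_binomial(1, 0.4, 5_000).astype(float)

        print("before:",
              f"Pearson={pearsonr(citations, mentions)[0]:.3f},",
              f"Spearman={spearmanr(citations, mentions)[0]:.3f}")

        # One viral, highly cited paper is enough to inflate Pearson's r
        citations[0], mentions[0] = 5_000, 2_000

        print("after: ",
              f"Pearson={pearsonr(citations, mentions)[0]:.3f},",
              f"Spearman={spearmanr(citations, mentions)[0]:.3f}")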

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: Not clear yet, but possibly.

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: A little bit.

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: No

    Results: For empirical papers - are results presented clearly and analysed appropriately?: OK

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: No

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: No relevant conclusions

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: No

    Reproducible Research: If appropriate, is sufficient information, potentially including data and software, provided to reproduce the results and are the corresponding datasets formally cited?: No

    Reviewer: 2

    Recommendation: Major Revision

    Comments:
    The manuscript "Can early altmetric mentions predict later citations? A test of validity on a large dataset" presents correlations between citation counts at a later point in time and mentions on Twitter, Facebook, news, Mendeley, and ResearchGate at an earlier point in time. Altmetrics data are about a year old and citation data are about three years old. This should account for the fact that most altmetrics accumulate faster than citations. However, do all altmetrics accumulate similarly fast? Why is exactly this time delay chosen? More discussion of this aspect is needed as this is mainly the novel part of the manuscript.

    According to the methodology, only papers with an Indian affiliation published in 2016 were analyzed. The WoS query "cu=india and py=2016" results in 132,363 papers, 98,351 of them having a DOI. The authors reported that they found 88,259 papers, 76,621 of them having a DOI. Were additional restrictions imposed, e.g., to the document type? The restriction to papers with an Indian affiliation and publication year 2016 should be mentioned in the title and/or abstract. It should also be discussed as a limitation besides the restriction to papers with a DOI.
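
    For instance, the effect of such restrictions is easy to make explicit from a WoS export; a minimal pandas sketch ("DT" and "DI" are the standard WoS field tags for document type and DOI; the file name is hypothetical):

        # Sketch: show how document-type and DOI restrictions shrink a WoS export.
        import pandas as pd

        records = pd.read_csv("wos_cu-india_py-2016.txt", sep="\t", low_memory=False)

        print("all records:", len(records))
        print(records["DT"].value_counts())  # Article, Review, Editorial Material, ...

        articles = records[records["DT"] == "Article"]
        print("articles only:", len(articles))
        print("articles with a DOI:", articles["DI"].notna().sum())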

    As far as I understand the methodology, all altmetrics (including Mendeley reader counts) are taken from Altmetric.com. This is a problem because Mendeley data from Altmetric.com are vastly incomplete, see for example the case study with the DOI 10.29024/joa.4. Mendeley data should be retrieved from the Mendeley API.
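
    A minimal sketch of such a retrieval, assuming the Mendeley catalog-search endpoint and an OAuth access token (the endpoint, headers and response shape here are assumptions; consult the Mendeley developer documentation):

        # Sketch: fetch a paper's Mendeley reader count by DOI. Endpoint, Accept
        # header and response shape are assumptions about the Mendeley REST API.
        import requests

        ACCESS_TOKEN = "..."  # hypothetical; obtained via the Mendeley OAuth flow

        def reader_count(doi):
            resp = requests.get(
                "https://api.mendeley.com/catalog",
                params={"doi": doi, "view": "stats"},
                headers={"Authorization": f"Bearer {ACCESS_TOKEN}",
                         "Accept": "application/vnd.mendeley-document.1+json"},
            )
            resp.raise_for_status()
            docs = resp.json()  # list of matching catalog documents
            return docs[0].get("reader_count", 0) if docs else 0

        print(reader_count("10.29024/joa.4"))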

    The data in Table 1 seem strange: the number of papers found in 2017 is always larger than in 2019. Why? Especially, it is very strange to see that WoS should show about 4000 fewer papers in 2019 than in 2017. The number of papers found on ResearchGate halved in these two years. Why? It is also surprising to see that the number of papers drops in Altmetric.com by about 5000 papers. Why? What does Stacy Konkiel have to say about this? Such discrepancies in the data set have to be explained better.

    How many papers contribute to the correlation analysis in Table 2?

    Page 2, lines 51-54: "Varied observations have been recorded in different studies, with some studies supporting the existence of correlations whereas few others stated that either correlations are not noticeable or very weak." I think that the studies finding correlations between citation counts and most altmetrics (except for Mendeley and CiteULike reader counts) to be either not noticeable or very weak are in the large majority. Look at the meta-analysis with DOI 10.1007/s11192-015-1565-y and the many studies included there. The results from this meta-analysis should also be discussed in the introduction.

    Page 3, line 28: "... highly cited journals ... ." What is meant by "highly cited journals"? Maybe "journals with a high impact factor" is a better wording.

    Page 5, lines 3-5: "The analysis of records from altmetric.com was done mainly for four popular social platforms, namely Twitter, Facebook, Mendeley, and News Mentions." Twitter and Facebook definitely count as social platforms. The character of Mendeley as a social platform is debatable because it is also usable via a desktop application without using the online social part. I can't see why news sources should count as "social platforms".

    Page 6, lines 44-47: "The analysis of this data sample shows that highly cited papers are found to be mentioned higher than the average value for the whole dataset." Really? Does it show that? The authors presented correlation results. I do not see how correlation results provide information about averages. It might be the case that the statement is true but the justification is wrong. The authors have the data at hand and can calculate the average values and compare them.
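
    The suggested comparison is direct to compute once citations and mentions sit in one table; a minimal pandas sketch (file and column names are hypothetical):

        # Sketch: compare mean mentions of highly cited papers with the overall mean.
        import pandas as pd

        df = pd.read_csv("merged_wos_altmetrics.csv")
        top = df[df["citations"] >= df["citations"].quantile(0.99)]  # top 1% cited

        print("mean mentions, all papers:   ", round(df["mentions"].mean(), 2))
        print("mean mentions, highly cited: ", round(top["mentions"].mean(), 2))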

    Page 7, lines 32-35: "It is more evident from these results that papers with the highest social mentions are not among the highest cited papers." The sentence is probably correct when the "not" is replaced with "less often".

    Page 7, lines 48-53: "The data was tagged 14 broad disciplines and analytical results were obtained for each discipline for the five social platforms. The fourteen disciplines selected are as follows: Agriculture (AGR), Art & Humanities (AH), Biology (BIO), Chemistry (CHE), Engineering (ENG), Environment Science (ENV), Geology (GEO), Information Sciences (INF), Material Science (MAR), Mathematics (MAT) ... ." How were the data tagged into 14 broad disciplines? Especially, for Art & Humanities and Mathematics, it is very problematic to apply bibliometrics. The applicability of altmetrics is unknown. This should be mentioned.

    The company Altmetric.com is spelled as altmetric.com which is wrong.

    The manuscript should be proof-read before resubmission. Here are some examples:
    - Page 2, lines 12/13: "It is probably first study ..."
    - Page 2, lines 47/48: "... while some went a step ahead to find out of social media metrics ..."
    - Page 3, lines 13/14: "Do there exits disciplinary variations ..."
    - Page 3, lines 18/19: "Understanding the relationship between altmetrics and citation count ..."
    - Page 4, lines 28/29: "Since DOI filed was the linking data ..."
    - Page 5, line 50: "Another reason of slightly different picture ..."
    - Page 6, lines 51-53: "It, however, be also interesting ..."
    - Page 8, lines 23/24: "It is observed that there are significant variations in correlations values ..."
    - Page 8, lines 49-51: "... by analyzing a large-sized data samples comprising of different disciplines."
    - Page 8, lines 54-56: "Taking data from different time periods make the analysis more credible ..."
    - Page 9, lines 15-17: "... correlation values between citation and altmetrics ..."

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: Yes.

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: Partly, see comments to the authors.

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: Partly, see comments to the authors.

    Results: For empirical papers - are results presented clearly and analysed appropriately?: Yes.

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: Yes.

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: Yes.

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: Partly, hints for improvement are provided in the comments to the authors.

    Reproducible Research: If appropriate, is sufficient information, potentially including data and software, provided to reproduce the results and are the corresponding datasets formally cited?: Yes.

All peer review content displayed here is covered by a Creative Commons CC BY 4.0 license.