Abstract

Purpose
This research investigates how the availability of metadata standards and data repositories influences researchers' data reuse intentions, either directly or indirectly as mediated by norms of data reuse and attitudes toward data reuse.

Design/methodology/approach
The theory of planned behavior (TPB) was employed to develop a research model of researchers' data reuse intentions, focusing on the roles of metadata standards, data repositories and norms of data reuse. The proposed research model was evaluated using structural equation modeling (SEM) based on survey responses from 811 STEM (science, technology, engineering and mathematics) researchers in the United States.

Findings
The availability of both metadata standards and data repositories significantly affects STEM researchers' norm of data reuse, which influences their data reuse intentions as mediated by their attitudes toward data reuse. Both the availability of data repositories and the norm of data reuse also directly influence data reuse intentions, and the norm of data reuse, acting as a moderator, significantly strengthens the effect of attitude toward data reuse on data reuse intention.

Research limitations/implications
The modified TPB model offers a new perspective for understanding how resource-facilitating conditions, such as the availability of metadata standards and data repositories, shape an individual's attitude, norm and behavioral intention toward a given behavior.

Practical implications
Scientific communities need to develop more supportive metadata standards and data repositories, considering their role in strengthening the community norm of data reuse, which eventually leads to data reuse behaviors.

Originality/value
This study sheds light on how metadata standards and data repositories influence researchers' data reuse behaviors through the community norm of data reuse, which can help scientific communities and academic institutions better support researchers' data sharing and reuse.

Peer review
The peer review history for this article is available at:


Authors

Kim, Youngseek


Contributors on Publons
  • 1 reviewer
  • pre-publication peer review (FINAL ROUND)
    Decision Letter
    2021/03/22

    22-Mar-2021

    Dear Kim, Youngseek

    It is a pleasure to accept your manuscript OIR-09-2020-0431.R1, entitled "A study of the roles of metadata standard and data repository in STEM researchers’ data reuse" in its current form for publication in Online Information Review. Please note, no further changes can be made to your manuscript.

    Please go to your Author Centre at https://mc.manuscriptcentral.com/oir (Manuscripts with Decisions for the submitting author or Manuscripts I have co-authored for all listed co-authors) to complete the Copyright Transfer Agreement form (CTA). We cannot publish your paper without this.

    All authors are requested to complete the form and to input their full contact details. If any of the contact information is incorrect you can update it by clicking on your name at the top right of the screen. Please note that this must be done prior to you submitting your CTA.

    If you have an ORCID please check your account details to ensure that your ORCID is validated.

    By publishing in this journal your work will benefit from Emerald EarlyCite. As soon as your CTA is completed your manuscript will pass to Emerald’s Content Management department and be processed for EarlyCite publication. EarlyCite is the author proofed, typeset version of record, fully citable by DOI. The EarlyCite article sits outside of a journal issue and is paginated in isolation. The EarlyCite article will be collated into a journal issue according to the journals’ publication schedule.

    FOR OPEN ACCESS AUTHORS: Please note if you have indicated that you would like to publish your article as Open Access via Emerald’s Gold Open Access route, you are required to complete a Creative Commons Attribution Licence - CCBY 4.0 (in place of the standard copyright assignment form referenced above). You will receive a follow up email within the next 30 days with a link to the CCBY licence and information regarding payment of the Article Processing Charge. If you have indicated that you might be eligible for a prepaid APC voucher, you will also be informed at this point if a voucher is available to you (for more information on APC vouchers please see http://www.emeraldpublishing.com/oapartnerships).

    Thank you for your contribution. On behalf of the Editors of Online Information Review, we look forward to your continued contributions to the Journal.

    Sincerely,

    Prof. Kalpana Shankar
    Co-Editor
    kalpana.shankar@ucd.ie


    Tell us how we're doing! We’d love to hear your feedback on the submission and review process to help us to continue to support your needs on the publishing journey.

    Simply click this link https://eu.surveymonkey.com/r/F8GZ2XW to complete a short survey and as a thank you for taking part you have the option to be entered into a prize draw to win £100 in Amazon vouchers. To enter the prize draw you will need to provide your email address.

    Reviewer report
    2021/03/15

    Thank you for the opportunity to review your revised manuscript.

    Reviewer report
    2021/03/11

    Thank you for your responses to the reviewer comments and the changes which you have made to your manuscript. I believe that the clarity has been greatly improved and the implications of the research are now more readily identifiable.

    Author Response
    2021/02/11

    Dear Editor/Reviewers of Online Information Review;

    I thank the editor for inviting me to revise and resubmit the manuscript to the Online Information Review. I also appreciate the specific comments and constructive suggestions provided by the reviewers. These comments have served as the basis for this manuscript’s revision. By following your suggestions and recommendations, I have significantly improved the manuscript. I provided a detailed point-by-point response in the remainder of this document.

    I sincerely hope that my responses and the revisions meet your expectations. I look forward to your feedback on this version of the manuscript, and I can make any additional formatting or substantive changes as needed.

    Thank you again for guiding the improvements of this work. I appreciate the opportunity to revise this manuscript with your help and suggestions.

    Sincerely,

    Author
    Responses to Reviewer 1’s Comments:
    Recommendation: Major Revision
    Comments:
    Thank you for the opportunity to review your paper.

    Additional Questions:
    Comment: Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: The paper describes the results of a survey of STEM researchers to assess their intentions around data reuse. The paper is within the scope of the journal and presents a topic of interest to the journal readership. The methodology is adapted from the Theory of Planned Behavior (TPB), and this adaptation is where the paper makes the most significant contribution to the areas of data reuse and related researcher behaviors. There is a need for better ways to assess researchers' awareness of metadata standards and data repositories specific to their discipline, and I found the overall approach to be promising.
    Response: Thank you for understanding the value of this research! I carefully reviewed your comments and addressed them below.

    Comment: However, to fully capitalize on their adaptation of TPB, the authors are encouraged to draw stronger connections between assessment and practical implications. In spite of the reasonable sample size and the soundness of the methods employed, the current manuscript presents the analysis at a very general level and seems to leave out a lot of useful elaboration.
    Response: Thank you for this comment. I further elaborated the results of this research in the Discussion section, which can lead to the practical implications of this research. Please see the updated Discussion section below:
    “This research examines how the availabilities of both metadata standards and data repositories increase STEM researchers’ norms of data reuse, which positively affect their attitudes toward data reuse and data reuse intentions consecutively. The findings of this research provide valuable implications for the development of institutional resources (i.e., metadata standards and data repositories) for data sharing and reuse, as well as for the role of community norms of data reuse for researchers’ data reuse behaviors. First, this research found that the availability of metadata standards significantly increases STEM researchers’ norms of data reuse (even though it does not have any direct relationship with data reuse intention). Also, the availability of data repositories was found to have significant positive relationships with researchers’ norms of data reuse as well as their data reuse intentions. Similar to Kim and Burns’ (2016) research on the roles of metadata standards and data repositories in data sharing, the results of this research suggest that the STEM researchers who utilize more metadata standards and data repositories for their research activities are more likely to build stronger norms of data reuse; in contrast, if they perceive limited availabilities of metadata standards and data repositories, they are likely to have lower norms of data reuse. This suggests that STEM researchers are more likely to build positive community norms of data reuse when they have more relevant metadata standards and data repositories in their research communities. Currently, the usage rates of metadata standards and data repositories are still limited across diverse academic disciplines (Tenopir et al., 2020); therefore, it is important for academic institutions and research communities to provide their researchers with appropriate data organization methods (i.e., metadata standards) and data storage/services (i.e., data repositories). Furthermore, they need to promote their metadata standards and data repositories for researchers to build more positive community norms of data reuse, which eventually lead to data reuse behaviors.
    Second, this research found that norm of data reuse plays a critical role in STEM researchers’ data reuse decision making by mediating between institutional resources (i.e., metadata standard and data repository) and data reuse intention. This research found that STEM researchers’ norms of data reuse are significantly affected by the availabilities of both metadata standards and data repositories. This research also found that their norms of data reuse significantly increase STEM researchers’ data reuse intentions both directly and indirectly as mediated by their attitudes toward data reuse. This finding aligns with what Yoon and Kim (2017) found in their study of social scientists’ data reuse behaviors. In particular, this research demonstrated that a large amount of the variance in STEM researchers’ attitudes toward data reuse is explained by their norms of data reuse. This means that STEM researchers’ data reuse behaviors can be better promoted indirectly, as mediated by norms of and attitudes toward data reuse, if researchers develop strong control beliefs from resource facilitating conditions including metadata standards and data repositories (Ajzen and Fishbein, 2005). The results suggest that researchers who perceive data reuse as a common research practice in their research communities are more likely to have positive attitudes toward data reuse and therefore stronger data reuse intentions. However, those who believe that data reuse is not well-received in their research communities are less likely to have positive attitudes toward data reuse and data reuse intentions. Therefore, the results recommend that the research communities of STEM disciplines further develop positive community norms of data reuse by promoting data reuse through education sessions and workshops, providing relevant guidelines for publications that use existing data, and/or emphasizing FAIR principles in scientific communities (Wilkinson et al., 2016), as well as by having relevant metadata standards and data repositories in their research communities. Furthermore, funding agencies need to provide funding toward developing technical infrastructure such as metadata standards and data repositories, and toward educating the human resources (such as librarians) who can help STEM researchers better organize and manage their research data.
    Lastly, the results of this research indicate a significant and strong relationship between STEM researchers’ attitudes toward data reuse and their data reuse intentions, and this relationship is positively moderated by the norm of data reuse. The results suggest that researchers’ data reuse intentions are mainly driven by their attitudes toward data reuse as well as their norms of data reuse, which confirmed the original TPB model (Ajzen, 1991, Ajzen and Fishbein, 2005). Those attitudinal and normative factors can enhance the explanatory power of STEM researchers’ data reuse intentions. This means that the researchers who have more positive attitudes toward and norms of data reuse are more willing to reuse others’ data. Also, researchers’ norms of data reuse significantly facilitate the relationship between their attitudes toward data reuse and data reuse intentions, which was not examined in prior studies (Yoon and Kim, 2017, Kim and Burns, 2016). This suggests that community norms of data reuse can play a critical role in STEM researchers’ data reuse behaviors, and furthermore, they cannot be achieved without common metadata standards and data repositories, which can build positive community norms of data reuse. Therefore, scientific communities need to enhance their researchers’ norms of data reuse, which directly and indirectly increases their data reuse attitudes and data reuse intentions simultaneously. In addition, this research validated the research model of STEM researchers’ data reuse behaviors with a large sample, and the results can be applied to different scientific disciplines in understanding their researchers’ data reuse behaviors.”

    Comment: Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: The manuscript appears to be somewhat dated as far as the literature review is concerned, with the most recent references having been published in 2016. The field has changed a lot since 2016. The authors are encouraged to conduct a follow up literature review, with particular attention to recent developments in the field including the promulgation of the FAIR principles, as well as the specification by PLOS, Nature, and other publishing groups of preferred data repository characteristics and recommendations. Likewise, publisher and funder policies around data sharing and data publication have become more imperative since 2016, and since 2018 the National Science Foundation has funded several projects specifically focused on improving data reuse.
    Response: I agree with this point! I conducted an extensive literature review to find any recent studies, and I updated the Literature Review section as well as the Introduction section with more recent studies (i.e., 15 recent studies from 2017 to 2020) covering what you mentioned above. Please see the updated Introduction and Literature Review sections below:
    “1. Introduction
    Scientific data sharing and reuse have been enabled and facilitated by many contemporary scientific endeavors including the development of collaboration technologies such as metadata standards and data repositories. The advancement of these collaboration technologies has enhanced the way scientists currently access information, communicate, and collaborate (Kling et al., 2000). A good number of scientific disciplines and communities have designed and developed their domain specific metadata standards to facilitate data sharing and reuse within or outside their own research groups (Bietz et al., 2010, Tenopir et al., 2020). For example, the Ecological Metadata Language (EML) was developed in ecology in order to consolidate various formats of ecological research data (Karasti and Baker, 2008). Also, research institutions and communities have promoted data sharing and reuse through data repositories, where scientists can openly share and reuse raw data (Atkins et al., 2003, Tenopir et al., 2020).
    Although there are metadata standards and data repositories available across many different scientific disciplines (Eschenfelder and Johnson, 2011, Groth et al., 2020), it is still questionable whether metadata standards and data repositories can actually facilitate scientists’ data reuse behaviors. The number of research data repositories registered at re3data.org has dramatically increased from 1,500 repositories in April 2016 to 2,450 repositories in February 2020 (Cousijn and Fenner, 2020, Pampel, 2016); however, a recent study conducted from August 2017 to March 2018 by Tenopir et al. (2020) showed that the usage rates of metadata standards and data repositories slightly increased worldwide from 33.6% to 36.4% and from 27.9% to 31.8% respectively compared to their prior study conducted from October 2013 to March 2014 (Tenopir et al., 2015). A good number of studies indicate that the availability of data repositories affects scientists’ data sharing behaviors; however, few studies have explored the relationships between the availabilities of metadata standards and data repositories and data reuse behaviors (Kim and Burns, 2016, Marcial and Hemminger, 2010, Cragin et al., 2010). Even though the rise of metadata standards and data repositories can help scientists to share and reuse research data, the mechanism by which they do so should be further examined.
    The main objective of this research is to investigate how the availabilities of metadata standards and data repositories influence STEM researchers’ data reuse behaviors. In this research, data reuse is defined as an individual researcher’s behavior of using other researchers’ data for their own research purposes by downloading data from central/local data repositories. Uhlir (2010) argued that the value of data increases when scientists can make more use of their data, and the roles of metadata standards and data repositories were under-studied in prior research on data sharing and reuse (Yoon and Lee, 2019, Curty et al., 2017, Kim and Burns, 2016, Edwards et al., 2011). Regarding researchers’ data sharing and reuse behaviors, prior studies have emphasized the direct relationships of the availabilities of metadata standards and data repositories with those behaviors (Yoon and Kim, 2017, Kim and Burns, 2016). Therefore, this research focuses on how both metadata standards and data repositories influence researchers’ data reuse intentions either directly or indirectly as mediated by researchers’ norms of data reuse and their attitude toward data reuse.

    2. Literature Review
      Scientific data sharing and reuse have been promoted by the policies of national funding agencies such as the National Science Foundation (NSF) in the U.S. and UK Research and Innovation (UKRI) (NSF, 2016, UKRI, 2017), by the requirements of journal publishers such as Nature and PLOS (Public Library of Science) (Vasilevsky et al., 2017, Federer et al., 2018), and by the FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al., 2016). Prior studies in data reuse have identified diverse factors affecting scientists’ data reuse practices, and those factors can be categorized into trust in data and context of data collection, individual motivations, and technical resources. First, prior studies identified trust in data and the context of data collection as critical factors influencing data reuse (Jirotka et al., 2005, Yoon, 2017). Since data are contextualized by the setting in which they were originally collected, researchers need to trust and understand the data within that context to reuse them properly (Yoon and Lee, 2019, Jirotka et al., 2005). Prior studies identified the personal, physical, technical, and social contexts of the original data collection as the most critical contextual information for data reusers (Curty et al., 2017, Baker and Yarmey, 2009). In this sense, documentation can provide the contextual and relevant information about the original data, which can support data reuse (Faniel et al., 2013, Pasquetto et al., 2017). In addition, access to those who collected or produced the data can help data reusers understand any information missing from the existing data and its supplemental documentation (Bishop, 2009).
      In terms of individual motivations, prior studies reported that scientists’ data reuse is influenced by their personal motivations, including the perceived benefits, concerns, and effort required to engage in the task. Curty (2015) and Kim and Yoon (2017) found that scientists are more likely to reuse others’ data if they think those data are useful and beneficial to their research. However, scientists’ data reuse is hindered by their concerns involving potential ethical problems (Bishop, 2009), misinterpretation of data without knowing its context (Niu and Hedstrom, 2008), and potential risks involved in data reuse (Kim and Yoon, 2017). Additionally, scholars found that scientists’ data reuse is negatively affected by issues concerning the time and effort required to locate existing data (Faniel and Jacobsen, 2010, Zimmerman, 2008) and to understand other researchers’ data for new research purposes (Faniel et al., 2012, Yoon and Kim, 2017).
      With regard to technical resources, prior studies reported that improved infrastructure such as metadata standards, data repositories, and data portals are important factors that influence scientists’ data reuse behaviors (Bishop and Kuula-Luumi, 2017, Abella et al., 2019, Groth et al., 2020). Metadata information about a dataset and specific contextual information about a dataset are necessary for scientists to understand or interpret others’ original data (Zimmerman, 2008). Metadata standards can enable scientists to conduct collaborative research based on a shared language (Ribes and Lee, 2010); for this reason, Bowker et al. (2000) argued that metadata can help scientists to share and reuse research data by providing uniform structures for data. Scholars identified metadata as an important factor in increasing scientists’ data reuse by easing the understanding of others’ data by both humans and machines (Cragin and Shankar, 2006, Groth et al., 2020). In addition, previous studies have also found that both disciplinary and organizational data repositories, as technical resources, facilitate and promote scientists’ data reuse intentions by providing access to data and increasing trust in the data (Yakel et al., 2013, Yoon, 2014). Recent studies reported that the increased availability and accessibility of data repositories have enabled scientists not only to share their data but also to reuse others’ data (Bishop and Kuula-Luumi, 2017, Tenopir et al., 2020).
      This research builds on prior studies by investigating the roles of metadata standards and data repositories in STEM researchers’ data reuse behaviors. Most previous studies on metadata standards have focused on data reuse from the perspective of research collaboration projects rather than on allowing access to the data of published articles (Hsu et al., 2015, Edwards et al., 2011); however, a few recent studies examined metadata standards from the perspective of data reuse in general (Arslan, 2019, Curty et al., 2017). Also, prior studies focusing on the availabilities of metadata standards and data repositories have focused on the direct relationships with data reuse and sharing (Kim and Nah, 2018, Yoon and Kim, 2017). However, it is necessary to study whether metadata standards and data repositories can facilitate STEM researchers’ data reuse either directly or indirectly as mediated by their perceptions of data reuse. This research therefore examines how the availabilities of both metadata standards and data repositories influence STEM researchers’ data reuse intentions either directly or indirectly as mediated by their norm of data reuse and their attitude toward data reuse.”

    Comment: There is also a particularly strong alignment between works cited by Kim and colleagues and the authors' research, which is another area the authors may benefit from following up on in detail.
    Response: Also, I followed up any important studies (mentioned in the Literature Review section) and discussed them with the findings of this research in the Discussion section in detail. Please see the updated text in the Discussion section below:
    “… Similar to Kim and Burns’ (2016) research on the roles of metadata standards and data repositories in data sharing, the results of this research suggest that the STEM researchers who utilize more metadata standards and data repositories for their research activities are more likely to build stronger norms of data reuse; in contrast, if they perceive limited availabilities of metadata standards and data repositories, they are likely to have lower norms of data reuse. …
    … This research also found that their norms of data reuse significantly increase STEM researchers’ data reuse intentions both directly and indirectly as mediated by their attitudes toward data reuse. This finding aligns with what Yoon and Kim (2017) found in their study of social scientists’ data reuse behaviors. …
    … Also, researchers’ norms of data reuse significantly facilitate the relationship between their attitudes toward data reuse and data reuse intentions, which was not examined in prior studies (Yoon and Kim, 2017, Kim and Burns, 2016). This suggests that community norms of data reuse can play a critical role in STEM researchers’ data reuse behaviors, and furthermore, they cannot be achieved without common metadata standards and data repositories, which can build positive community norms of data reuse. …”

    Comment: Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: The research is well designed and adaptations to the TPB method are clearly described. The descriptions of the measurement and structural models are clear. As noted above, the adaptation of TPB is the main contribution of the paper. But please see my comments below regarding the hypotheses and the figures.
    Response: Thank you! I have addressed your comments; please see my responses below.

    Comment: Results: For empirical papers - are results presented clearly and analysed appropriately?: The results require some elaboration. Perhaps this is more of a methodological concern, but as the authors note in the introduction and literature review, there is a lot of difference across the various STEM disciplines in terms of the availability and maturity of discipline specific metadata standards and data repositories. However, all of the STEM disciplines are considered together in the analysis. It is understood that the sample size may be too small to allow statistical significance if the results were broken down by discipline, but I believe there were enough respondents from the field of biology to allow a focused analysis of the biologists. As presented, the fact that the majority of respondents were from the field of biology may bias the results because that is a field that is especially well served in terms of metadata standards and repositories.
    Response: Thank you for this suggestion. Biological Sciences is the single largest main academic discipline, with a total of 19 sub-disciplines (e.g., biochemistry, biophysics, genetics, neuroscience, zoology, and more) based on the NSF discipline category used in this research, which is why there is a large number of participants in Biological Sciences; in comparison, Computer and Information Sciences has three sub-disciplines, and Mathematical Sciences consists of only one sub-discipline. This research randomly sampled about 280 scientists from each of the 56 STEM sub-disciplines in order to reflect the differing status of metadata standards, data repositories, researchers’ perceptions of data reuse, and their data reuse intentions across diverse STEM sub-disciplines. Therefore, it is more useful to focus on STEM disciplines in general rather than on Biological Sciences alone, and this approach provides an extensive view of STEM researchers’ data reuse as influenced by the availabilities of metadata standards and data repositories and mediated by norms of data reuse and attitudes toward data reuse. I elaborated this point in the Target Population and Sampling section, as shown below:
    “… This research randomly sampled about 280 scientists from each of the 56 STEM sub-disciplines in order to reflect the differing status of metadata standards, data repositories, researchers’ perceptions of data reuse, and their data reuse intentions across diverse STEM sub-disciplines, and a total of 15,703 people were actually selected as potential survey respondents for this study.”

    Comment: Additional information is also needed to tie the results of the analysis back to the eight hypotheses. These hypotheses are explained clearly enough, but the authors do not return to them to describe how they are supported or refuted by the results of the analysis.
    Response: Thank you for this suggestion. I provided the summary of hypothesis testing results in a new table. Please see the Table 5 below:
    Hs | Statements | Result | Beta (p)
    H1 | Availability of metadata standard positively influences a researcher’s norm of data reuse. | Supported | .098
    H2 | Availability of metadata standard positively influences a researcher’s intention to reuse other researchers’ data. | Not Supported | -.009
    H3 | Availability of data repository positively influences a researcher’s norm of data reuse. | Supported | .359
    H4 | Availability of data repository positively influences a researcher’s intention to reuse other researchers’ data. | Supported | .100*
    H5 | Norm of data reuse positively influences a researcher’s attitude toward data reuse. | Supported | .134
    H6 | Norm of data reuse positively influences a researcher’s data reuse intention. | Supported | .293
    H7 | Norm of data reuse positively influences the relationship between a researcher’s attitude toward data reuse and his/her data reuse intention. | Supported | .099
    H8 | Attitude toward data reuse positively influences a researcher’s intention to reuse other researchers’ data. | Supported | .497***

    Table 5. Summary of Hypothesis Testing Results (*p<0.05, **p<0.01, ***p<0.001)
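
    To make the structure behind Table 5 concrete, the following is a minimal sketch of how the eight hypothesized paths and the H7 interaction could be specified as a single observed-variable path model, using the open-source Python package semopy. The column names (metadata_std, data_repo, norm, attitude, intention), the input file, and the use of mean-centered composite scores with a product term for the moderation are illustrative assumptions, not the study's actual specification or software.

    # Illustrative sketch only; variable and file names are hypothetical.
    import pandas as pd
    import semopy

    df = pd.read_csv("stem_survey_scores.csv")  # placeholder file of composite survey scores

    # Mean-center the moderator and mediator before forming the H7 product term,
    # a common way to represent moderation in an observed-variable path model.
    df["norm_c"] = df["norm"] - df["norm"].mean()
    df["attitude_c"] = df["attitude"] - df["attitude"].mean()
    df["norm_x_attitude"] = df["norm_c"] * df["attitude_c"]

    # H1/H3: metadata standard and data repository -> norm of data reuse
    # H5: norm -> attitude
    # H2/H4/H6/H8: metadata standard, data repository, norm, attitude -> intention
    # H7: norm x attitude interaction -> intention (moderation)
    model_desc = """
    norm_c ~ metadata_std + data_repo
    attitude_c ~ norm_c
    intention ~ metadata_std + data_repo + norm_c + attitude_c + norm_x_attitude
    """

    model = semopy.Model(model_desc)
    model.fit(df)
    print(model.inspect())  # path estimates and p-values, analogous to the betas in Table 5

    Under this sketch, H7 corresponds to the coefficient on the product term; a positive, significant estimate would mirror the moderation reported in the table.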

    Comment: Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: Yes, in particular the authors discuss work by Kim and colleagues which seems to be well aligned with their research. So there is some engagement with the literature and previous findings.
    Response: Thank you! As I mentioned above, I followed up Kim and colleagues’ studies along with the findings of this research in the Discussion section in detail. Please see the updated Discussion section.

    Comment: However, in addition to the outdated literature reviews, the overall coherence is reduced by the absence of a discussion of how the analysis relates to the several hypotheses.
    Response: Again, I am sorry about this confusion. As I mentioned above, the summary of hypothesis testing results including each hypothesis, its beta, and p-value is provided in Table 5.

    Comment: This reviewer also found the two figures to be confusing and their relationship to research unclear. With more elaboration, the figures could be leveraged to better establish the connection between the current work and previous work.
    Response: Sorry about this confusion. I updated and added descriptive statements for Figures 1 and 2, respectively:
    “Figure 1 below shows the research model of researchers’ data reuse influenced by metadata standard and data repository mediated by norm of data reuse and attitude toward data reuse.”
    “Figure 2 presents the hypothesis testing results based on STEM researchers’ data reuse model, and …”

    Comment: Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: The authors discuss the theoretical and practical implications of their research. The discussion is sound, but this is another area where an updated literature review would benefit the paper. Expectations, services, and capacities relevant to these implications have evolved, and the norms under study have evolved with them.
    Response: Thank you for this suggestion! As I mentioned above, I updated the literature review and provided more relevant discussions in the Discussion section. Please see the updated Literature Review section and the Discussion section.

    Comment: Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: The paper is well written and concise. The language is clear and free of jargon. The only thing to note is the authors' usage of the terms "metadata standard" and "repository." These are often used in the singular form, when the usage requires plural.
    Response: Thank you for pointing out this issue! I carefully reviewed the usage of the terms ‘metadata standard’ and ‘data repository’ throughout the manuscript and updated them appropriately.

    Comment: Reproducible Research: If appropriate, is sufficient information, potentially including data and software, provided to reproduce the results and are the corresponding datasets formally cited?: The authors are encouraged to make their deidentified summary statistics available, as appropriate per their IRB or corresponding protocol. It is understood this may not be possible.
    Response: Both survey data and instrument have been made publicly available via Open ICPSR (Inter-university Consortium for Political and Social Research), and the link will be provided upon publication.

    This journal is participating in Publons Transparent Peer Review. By reviewing for this journal, you agree that your finished report, along with the author’s responses and the Editor’s decision letter, will be linked to from the published article to where they appear on Publons, if the paper is accepted. If you have any concerns about participating in the Transparent Peer Review pilot, please reach out to the journal’s Editorial office. Please indicate below, whether you would like your name to appear with your report on Publons by indicating yes or no.All peer review content displayed here will be covered by a Creative Commons CC BY 4.0 license.: Yes, I would like my name to appear with my report on Publons

    Responses to Reviewer 2’s Comments:

    Recommendation: Minor Revision

    Comments:
    Comment: This is a very interesting study, which provides useful evidence on the potential impact of metadata standards and repositories on research communities. Some additional contextual information on the current data sharing landscape (both in 2015 when the study was conducted, and in 2020), and some clarification on the points made above, would strengthen the conclusions overall.
    Response: Thank you for this suggestion! As you suggested, I provided additional contextual information explaining the current status of metadata standards and data repositories in the Introduction section, as shown below:
    “… a recent study conducted from August 2017 to March 2018 by Tenopir et al. (2020) showed that the usage rates of metadata standards and data repositories slightly increased worldwide from 33.6% to 36.4% and from 27.9% to 31.8% respectively compared to their prior study conducted from October 2013 to March 2014 (Tenopir et al., 2015). …”
    Also, I referred to this point again in the Discussion section, as shown below:
    “… Currently, the usage rates of metadata standards and data repositories are still limited across diverse academic disciplines (Tenopir et al., 2020); therefore, it is important for academic institutions and research communities to provide their researchers with appropriate data organization methods (i.e., metadata standards) and data storage/services (i.e., data repositories). …”
    Additional Questions:
    Comment: Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: The paper adds to the body of evidence relating to the connections between data sharing infrastructure and the reuse and sharing of research data, and is original in its application of the Theory of Planned Behavior (TPB) to take into account norms of data sharing in a community as a factor in data sharing behaviours. I believe it fits within the scope of the journal.
    Response: Thank you!

    Comment: Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: It would be useful to include some additional literature relating to the current status of repositories and metadata, in order to provide additional context and to strengthen the findings of the research. This should describe, for example, how many repositories are currently available to researchers and, of these, what proportion use standard metadata schemas, etc.
    As the survey was conducted in 2015 it would also be beneficial for the author to acknowledge how the availability of metadata standards and repositories may have changed in the past 5 years, and to include some references which relate to this.
    Response: This is an excellent point! As I mentioned above, I found a recent study indicating that the usage of metadata standards and data repositories has not changed much in STEM disciplines. I provided the following statement in the Introduction section:
    “… The number of research data repositories registered at re3data.org has dramatically increased from 1,500 repositories in April 2016 to 2,450 repositories in February 2020 (Cousijn and Fenner, 2020, Pampel, 2016); however, a recent study conducted from August 2017 to March 2018 by Tenopir et al. (2020) showed that the usage rates of metadata standards and data repositories slightly increased worldwide from 33.6% to 36.4% and from 27.9% to 31.8% respectively compared to their prior study conducted from October 2013 to March 2014 (Tenopir et al., 2015). …”

    Comment: Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: Attitudes towards data reuse and what drives reuse and sharing is a rich area for exploration. This approach does not merely rely on evidence connected to the availability of repositories and metadata, but factors in relevant data reuse “norms”. The TPB model is well described and seems appropriate for the exploration of the topic.
    Response: Thank you for understanding the value of this research!

    Comment: The collation of over 800 survey responses on the topic is impressive, and I agree with the suggestion that further qualitative research would enrich the findings of this paper.
    Response: Thank you for your agreement. I pointed this out in the Limitations and Future Research section, as shown below:
    “… Future research should utilize qualitative methods (i.e. interviews or focus groups), or a mixed-methods approach, which can provide a more extensive picture of researchers’ data reuse behaviors. …”

    Comment: Results: For empirical papers - are results presented clearly and analysed appropriately?: The description of the analysis and results are clear although I cannot comment on the validity of the statistical methods used.
    Response: Thank you!

    Comment: Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: The paper is strongest in its description and analysis of the survey data and the TPB model, but would benefit from some additional description of the context for the study.
    Response: I added more descriptions about the contexts of this study throughout the paper. Please see the revised manuscript.

    Comment: Most importantly, the authors should clarify the relationship between metadata standards and repositories in the context of their research. For example, are there many repositories which do not use a metadata standard? What is the value in including both concepts in the study? Can they be separated? Could a scientist use one and not the other?
    Response: This is a very good point! In this research, metadata standard and data repository are separate concepts. As stated in the manuscript, “a metadata standard is defined as a set of data that provides information about one or more aspects of the original research data”, and “a data repository is defined as a digital archive where scientists can deposit their data from published articles and download other researchers’ data from their published articles”. For example, the discipline of ecology designed the Ecological Metadata Language (EML) to organize its field research data, and the discipline of genetics has developed a specific metadata standard to consolidate genome sequencing data. A particular research group can even create its own metadata standard to manage its research data in a specific area of research. Similarly, data repositories have been developed by diverse academic institutions (e.g., institutional repositories at colleges and universities), research fields (e.g., Open ICPSR for the social sciences and Dryad for any scientific discipline), and even specific research domains (e.g., GenBank for genetics). Many data repositories do not require researchers to use data repository-specific metadata standards when depositing data in those repositories; similarly, researchers are not always required to deposit data that share the same metadata standards in the same data repositories. I explained this information further in the manuscript, as shown below:
    “… Both metadata standards and data repositories are considered resource-facilitating conditions according to TPB; however, they are treated as separate concepts in this research. Many data repositories do not require researchers to use data repository-specific metadata standards for depositing data in those repositories, and similarly, researchers are not always required to deposit data that have the same metadata standards in the same data repositories (Groth et al., 2020). …”

    Comment: Some additional clarification is also needed on the existing research into the relationship between repositories and metadata, and data reuse. The paper states both that “some studies did not find any significant relationships between the availability of metadata standards and data repositories and data sharing and reuse behaviours” (and these studies should be added as citations) and also that “A good number of studies indicate that the availability of data repositories affects scientists’ data sharing and reuse behaviours.” These are somewhat contradictory and it is not clear what the authors’ position is, or whether this is what they intend to explore.
    Response: Thank you for pointing out this important issue! In order to avoid any confusion, I restated and provided relevant citations and references for the following sentence:
    “… A good number of studies indicate that the availability of data repositories affects scientists’ data sharing behaviors; however, few studies have explored the relationships between the availability of metadata standards and data repositories and data reuse behaviors (Kim and Burns, 2016, Marcial and Hemminger, 2010, Cragin et al., 2010). …”

    Comment: To aid the reader I would suggest the addition of both a short summary of the relationship between repositories and metadata; and clarification of the authors’ perspectives on how and why these may influence researchers’ behaviours.
    Response: Thank you again for this suggestion! I provided more detailed explanations of how and why both metadata standards and data repositories may influence STEM researchers’ data reuse behaviors, based on TPB and prior studies. Please see the updated text below:
    “Metadata Standard
    This research assumes that the availability of metadata standards can positively influence both researchers’ norm of data reuse and their data reuse intentions. A metadata standard is defined as a set of data that provides information about one or more aspects of the original research data (Zimmerman, 2007). According to TPB, behavioral control factors including resource facilitating conditions and self-efficacy can help people develop a certain behavioral intention and therefore perform an actual behavior (Ajzen, 2002). Both metadata standards and data repositories are considered resource-facilitating conditions according to TPB; however, they are treated as separate concepts in this research. Many data repositories do not require researchers to use data repository-specific metadata standards for depositing data in those repositories, and similarly, researchers are not always required to deposit data that have the same metadata standards in the same data repositories (Groth et al., 2020). Prior studies in scientific data sharing and reuse reported that the availability of metadata standards can help scientists’ data sharing and reuse behaviors (Bowker and Star, 2000, Zimmerman, 2007). Kim and Burns (2016) also found that the availability of metadata standards significantly increases biological scientists’ norm of data sharing. Thus, the availability of metadata standards can increase researchers’ norm of data reuse and their intentions to reuse other researchers’ data.”
    “Data Repository
    This research also assumes that the availability of data repositories can positively influence both researchers’ norms of data reuse and their data reuse intentions. Along with metadata standards, this research also examines the availability of data repositories as a behavioral control factor (especially as an external behavioral control factor) (Ajzen, 2002). A data repository is defined as a digital archive where scientists can deposit their data from published articles and download other researchers’ data from their published articles (Kim and Burns, 2016). Data repositories are designed to allow research communities to store, share, query, and download data (Fennema-Notestine, 2009). There are a number of data repositories available across many different scientific disciplines, including biology (e.g., Dryad and Entrez Databases), genetics (e.g., Gene Expression Omnibus and GenBank), medicine (e.g., National Center for Biotechnology Information), geosciences (e.g., Commons of Geographic Data), and astronomy (e.g., National Space Science Data Center). The conceptual model in the present study considers the availability of data repositories as an important underlying resource supporting researchers’ data reuse intentions as well as their norm of data reuse (Kim and Burns, 2016). Prior studies in data sharing and reuse evaluated the effect that the availability of data repositories has on data sharing and reuse; those studies found a significant relationship between the availability of data repositories and scientists’ data sharing behaviors (Kim and Stanton, 2016, Faniel et al., 2016). Thus, the availability of data repositories can increase researchers’ norm of data reuse and their intentions to reuse other researchers’ data.”

    Comment: Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: While the study suggests that scientific communities should develop additional metadata standards and data repositories in consideration of their positive impact on data reuse, it does not provide sufficient context on the current metadata/repository landscape. Are there gaps where additional repositories or metadata standards are required? An explanation here would strengthen the conclusions.
    For example the authors state that “it is important for academic institutions and research communities to provide their researchers with appropriate data organization methods and data storage/services to build more positive subjective norms of data reuse” but more context is needed on whether they are already doing so, or not.
    Response: Thank you for this comment! I provided more contextual information in the Discussion and Conclusion section, as shown below:
    “… This suggests that STEM researchers are more likely to build positive community norms of data reuse when they have more relevant metadata standards and data repositories in their research communities. Currently, the usage rates of metadata standards and data repositories are still limited across diverse academic disciplines (Tenopir et al., 2020); therefore, it is important for academic institutions and research communities to provide their researchers with appropriate data organization methods (i.e., metadata standards) and data storage/services (i.e., data repositories). Furthermore, they need to promote their metadata standards and data repositories for researchers to build more positive community norms of data reuse, which eventually lead to data reuse behaviors.”

    Comment: The findings in relation to training and development of data sharing norms are much stronger and more well-evidenced in my opinion.
    Response: Thank you for this kind comment!

    Comment: Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: Although the central focus of the study, its methods and results are clearly described, I think that some additional clarification is needed in the contextual framing of the research. For example it is unclear why the definition of data reuse given initially is “an individual researcher’s behavior of using other researchers’ data for their own research purposes by either downloading data from central/local data repositories or by requesting data via personal communication methods” while the study itself focuses only on data repositories.
    Response: Sorry about this confusion. This research focuses on STEM researchers’ data reuse specifically by downloading data from data repositories. In order to avoid any potential confusion, I updated the initial definition of data reuse as shown below:
    “… In this research, data reuse is defined as an individual researcher’s behavior of using other researchers’ data for their own research purposes by downloading data from central/local data repositories. …”

    Comment: As noted previously there are also contradictory statements regarding prior research on the impact of repositories and metadata on data reuse, and whether these have been positive or not. A summary statement at the end of each main section may help to increase the clarity overall.
    Response: Thank you for this suggestion! I restated the contradictory statements above (please see my comments above), and I provided a summary statement at the end of each main section if necessary. Please see the updated manuscript.

    Reproducible Research: If appropriate, is sufficient information, potentially including data and software, provided to reproduce the results and are the corresponding datasets formally cited?:

    This journal is participating in Publons Transparent Peer Review. By reviewing for this journal, you agree that your finished report, along with the author’s responses and the Editor’s decision letter, will be linked to from the published article to where they appear on Publons, if the paper is accepted. If you have any concerns about participating in the Transparent Peer Review pilot, please reach out to the journal’s Editorial office. Please indicate below, whether you would like your name to appear with your report on Publons by indicating yes or no.All peer review content displayed here will be covered by a Creative Commons CC BY 4.0 license.: No, I would not like my name to appear with my report on Publons

  • pre-publication peer review (ROUND 1)
    Decision Letter
    2020/12/19

    19-Dec-2020

    Dear Dr. Kim,

    Manuscript ID OIR-09-2020-0431 entitled "A study of the roles of metadata standard and data repository in STEM researchers’ data reuse" which you submitted to Online Information Review has been reviewed. The comments of the reviewer(s) are included at the bottom of this letter.

    The reviewers have recommended that you make major revisions to your manuscript prior to it being considered for publication. Please also look out for repetitive language and edit carefully, as the reviewers noted a number of problems with language choice/usage.

    Please read their suggestions and if you choose to prepare a revised manuscript ensure that any changes that you make to your manuscript are highlighted, as well as described in your response to reviewers.

    Please also ensure that in doing so your paper does not exceed the maximum word length of 10000 words and that it meets all the requirements of the author guidelines at http://www.emeraldinsight.com/products/journals/author_guidelines.htm?id=oir=ubl727mru90lg3hc8sa5p5qrt2.

    To revise your manuscript log into https://mc.manuscriptcentral.com/oir and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions". Under "Actions" click on "Create a Revision". Your manuscript number has been appended to denote a revision.

    You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript using a word processing program and save it on your computer. Please also highlight the changes to your manuscript within the document by using the track changes mode in MS Word or by using bold or coloured text.

    Once the revised manuscript is prepared you can upload it and submit it through your Author Centre.

    When submitting your revised manuscript, you will be able to respond to the comments made by the reviewer(s) in the space provided. You can use this space to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the reviewer(s).

    IMPORTANT: Your original files are available to you when you upload your revised manuscript. Please delete any redundant files before completing the submission.

    Because we are trying to facilitate timely publication of manuscripts submitted to Online Information Review, your revised manuscript should be uploaded as soon as possible. If it is not possible for you to submit your revision in a reasonable amount of time, we may have to consider your paper as a new submission.

    To help support you on your publishing journey we have partnered with Editage, a leading global science communication platform, to offer expert editorial support including language editing and translation.
    If your article has been rejected or revisions have been requested, you may benefit from Editage’s services. For a full list of services, visit: authorservices.emeraldpublishing.com/
    Please note that there is no obligation to use Editage and using this service does not guarantee publication.

    Once again, thank you for submitting your manuscript to Online Information Review. I look forward to receiving your revision.

    Yours sincerely,

    Prof. Kalpana Shankar
    kalpana.shankar@ucd.ie

    Reviewer(s)' Comments to Author:
    Reviewer: 1

    Recommendation: Major Revision

    Comments:
    Thank you for the opportunity to review your paper.

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: The paper describes the results of a survey of STEM researchers to assess their intentions around data reuse. The paper is within the scope of the journal and presents a topic of interest to the journal readership. The methodology is adapted from the Theory of Planned Behavior (TPB), and this adaptation is where the paper makes the most significant contribution to the areas of data reuse and related researcher behaviors. There is a need for better ways to assess researchers' awareness of metadata standards and data repositories specific to their discipline, and I found the overall approach to be promising. However, to fully capitalize on their adaptation of TPB, the authors are encouraged to draw stronger connections between assessment and practical implications. In spite of the reasonable sample size and the soundness of the methods employed, the current manuscript presents the analysis at a very general level and seems to leave out a lot of useful elaboration.

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: The manuscript appears to be somewhat dated as far as the literature review is concerned, with the most recent references having been published in 2016. The field has changed a lot since 2016. The authors are encouraged to conduct a follow up literature review, with particular attention to recent developments in the field including the promulgation of the FAIR principles, as well as the specification by PLOS, Nature, and other publishing groups of preferred data repository characteristics and recommendations. Likewise, publisher and funder policies around data sharing and data publication have become more imperative since 2016, and since 2018 the National Science Foundation has funded several projects specifically focused on improving data reuse. There is also a particularly strong alignment between works cited by Kim and colleagues and the authors' research, which is another area the authors may benefit from following up on in detail.

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: The research is well designed and adaptations to the TPB method are clearly described. The descriptions of the measurement and structural models are clear. As noted above, the adaptation of TPB is the main contribution of the paper. But please see my comments below regarding the hypotheses and the figures.

    Results: For empirical papers - are results presented clearly and analysed appropriately?: The results require some elaboration. Perhaps this is more of a methodological concern, but as the authors note in the introduction and literature review, there is a lot of difference across the various STEM disciplines in terms of the availability and maturity of discipline specific metadata standards and data repositories. However, all of the STEM disciplines are considered together in the analysis. It is understood that the sample size may be too small to allow statistical significance if the results were broken down by discipline, but I believe there were enough respondents from the field of biology to allow a focused analysis of the biologists. As presented, the fact that the majority of respondents were from the field of biology may bias the results because that is a field that is especially well served in terms of metadata standards and repositories.

    Additional information is also needed to tie the results of the analysis back to the eight hypotheses. These hypotheses are explained clearly enough, but the authors do not return to them to describe how they are supported or refuted by the results of the analysis.

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: Yes, in particular the authors discuss work by Kim and colleagues which seems to be well aligned with their research. So there is some engagement with the literature and previous findings. However, in addition to the outdated literature reviews, the overall coherence is reduced by the absence of a discussion of how the analysis relates to the several hypotheses. This reviewer also found the two figures to be confusing and their relationship to research unclear. With more elaboration, the figures could be leveraged to better establish the connection between the current work and previous work.

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: The authors discuss the theoretical and practical implications of their research. The discussion is sound, but this is another area where an updated literature review would benefit the paper. Expectations, services, and capacities relevant to these implications have evolved, and the norms under study have evolved with them.

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: The paper is well written and concise. The language is clear and free of jargon. The only thing to note is the authors' usage of the terms "metadata standard" and "repository." These are often used in the singular form, when the usage requires plural.

    Reproducible Research: If appropriate, is sufficient information, potentially including data and software, provided to reproduce the results and are the corresponding datasets formally cited?: The authors are encouraged to make their deidentified summary statistics available, as appropriate per their IRB or corresponding protocol. It is understood this may not be possible.

    This journal is participating in Publons Transparent Peer Review. By reviewing for this journal, you agree that your finished report, along with the author’s responses and the Editor’s decision letter, will be linked to from the published article to where they appear on Publons, if the paper is accepted. If you have any concerns about participating in the Transparent Peer Review pilot, please reach out to the journal’s Editorial office. Please indicate below whether you would like your name to appear with your report on Publons by indicating yes or no. All peer review content displayed here will be covered by a Creative Commons CC BY 4.0 license.: Yes, I would like my name to appear with my report on Publons

    Reviewer: 2

    Recommendation: Minor Revision

    Comments:
    This is a very interesting study, which provides useful evidence on the potential impact of metadata standards and repositories on research communities. Some additional contextual information on the current data sharing landscape (both in 2015, when the study was conducted, and in 2020), and some clarification on the points made above, would strengthen the conclusions overall.

    Additional Questions:
    Originality: Does the paper make a significant theoretical, empirical and/or methodological contribution to an area of importance, within the scope of the journal?: The paper adds to the body of evidence relating to the connections between data sharing infrastructure and the reuse and sharing of research data, and is original in its application of the Theory of Planned Behavior (TPB) to take into account norms of data sharing in a community as a factor in data sharing behaviours. I believe it fits within the scope of the journal.

    Relationship to Literature: Does the paper demonstrate an adequate understanding of the relevant literature in the field and cite an appropriate range of literature sources? Is any significant work ignored? Is the literature review up-to-date? Has relevant material published in Online Information Review been cited?: It would be useful to include some additional literature relating to the current status of repositories and metadata, in order to provide additional context and to strengthen the findings of the research. This should describe, for example, how many repositories are currently available to researchers and what proportion of these use standard metadata schemas, etc.

    As the survey was conducted in 2015 it would also be beneficial for the author to acknowledge how the availability of metadata standards and repositories may have changed in the past 5 years, and to include some references which relate to this.

    Methodology: Is the paper's argument built on an appropriate base of theory, concepts or other ideas? Has the research on which the paper is based been well designed? Are the methods employed appropriate and fully explained? Have issues of research ethics been adequately identified and addressed?: Attitudes towards data reuse and what drives reuse and sharing is a rich area for exploration. This approach does not merely rely on evidence connected to the availability of repositories and metadata, but factors in relevant data reuse “norms”. The TPB model is well described and seems appropriate for the exploration of the topic.

    The collation of over 800 survey responses on the topic is impressive, and I agree with the suggestion that further qualitative research would enrich the findings of this paper.

    Results: For empirical papers - are results presented clearly and analysed appropriately?: The description of the analysis and results is clear, although I cannot comment on the validity of the statistical methods used.

    Discussion/Argument: Is the relation between any empirical findings and previous work discussed? Does the paper present a robust and coherent argument? To what extent does the paper engage critically with the literature and findings? Are theoretical concepts articulated well and used appropriately? Do the conclusions adequately tie together the other elements of the paper?: The paper is strongest in its description and analysis of the survey data and the TPB model, but would benefit from some additional description of the context for the study.

    Most importantly, the authors should clarify the relationship between metadata standards and repositories in the context of their research. For example, are there many repositories which do not use a metadata standard? What is the value in including both concepts in the study? Can they be separated? Could a scientist use one and not the other?

    Some additional clarification is also needed on the existing research into the relationship between repositories and metadata, and data reuse. The paper states both that “some studies did not find any significant relationships between the availability of metadata standards and data repositories and data sharing and reuse behaviours” (and these studies should be added as citations) and also that “A good number of studies indicate that the availability of data repositories affects scientists’ data sharing and reuse behaviours.” These are somewhat contradictory and it is not clear what the authors’ position is, or whether this is what they intend to explore.

    To aid the reader I would suggest the addition of both a short summary of the relationship between repositories and metadata; and clarification of the authors’ perspectives on how and why these may influence researchers’ behaviours.

    Implications for research, practice and/or society: Does the paper identify clearly any implications for research, practice and/or society? Does the paper bridge the gap between theory and practice? How can the research be used in practice (economic and commercial impact), in teaching, to influence public policy, in research (contributing to the body of knowledge)? What is the impact upon society (influencing public attitudes, affecting quality of life)? Are these implications consistent with the findings and conclusions of the paper?: While the study suggests that scientific communities should develop additional metadata standards and data repositories in consideration of their positive impact on data reuse, it does not provide sufficient context on the current metadata/repository landscape. Are there gaps where additional repositories or metadata standards are required? An explanation here would strengthen the conclusions.

    For example the authors state that “it is important for academic institutions and research communities to provide their researchers with appropriate data organization methods and data storage/services to build more positive subjective norms of data reuse” but more context is needed on whether they are already doing so, or not.

    The findings in relation to training and the development of data sharing norms are, in my opinion, much stronger and better evidenced.

    Quality of Communication: Does the paper clearly express its case, measured against the technical language of the fields and the expected knowledge of the journal's readership? Has attention been paid to the clarity of expression and readability, such as sentence structure, jargon use, acronyms, etc.: Although the central focus of the study, its methods and results are clearly described, I think that some additional clarification is needed in the contextual framing of the research. For example it is unclear why the definition of data reuse given initially is “an individual researcher’s behavior of using other researchers’ data for their own research purposes by either downloading data from central/local data repositories or by requesting data via personal communication methods” while the study itself focuses only on data repositories.

    As noted previously there are also contradictory statements regarding prior research on the impact of repositories and metadata on data reuse, and whether these have been positive or not. A summary statement at the end of each main section may help to increase the clarity overall.

    Reproducible Research: If appropriate, is sufficient information, potentially including data and software, provided to reproduce the results and are the corresponding datasets formally cited?:

    This journal is participating in Publons Transparent Peer Review. By reviewing for this journal, you agree that your finished report, along with the author’s responses and the Editor’s decision letter, will be linked to from the published article to where they appear on Publons, if the paper is accepted. If you have any concerns about participating in the Transparent Peer Review pilot, please reach out to the journal’s Editorial office. Please indicate below whether you would like your name to appear with your report on Publons by indicating yes or no. All peer review content displayed here will be covered by a Creative Commons CC BY 4.0 license.: No, I would not like my name to appear with my report on Publons

    Reviewer report
    2020/12/08

    This is a very interesting study, which provides useful evidence on the potential impact of metadata standards and repositories on research communities. Some additional contextual information on the current data sharing landscape (both in 2015, when the study was conducted, and in 2020), and some clarification on the points made above, would strengthen the conclusions overall.

    Reviewer report
    2020/12/03

    Thank you for the opportunity to review your paper.

All peer review content displayed here is covered by a Creative Commons CC BY 4.0 license.