Content of review 1, reviewed on July 09, 2014
This paper describes, in some detail, the development, implementation and piloting of a data documentation tool for tabular data.
Overall, the paper is very clear and provides an easy-to-read narrative of why and how the tool was developed and how it can be used, as well as tips on populating and handling spreadsheet data so that the data have a longer life.
I am impressed with the simplicity of this tool, which attempts to solve issues in data description for a single type of data. This is much better than the 'workbench' approaches that try to do too much and end up failing.
The issue of errors in data conversion between formats is critical and is a known issue in the data archival world. This tool addresses some of these common issues that arise in both spreadsheet data description and conversion.
The paper presents some very useful information gathered from surveys and pilot work, but this is rather US-centric. I don't imagine that scientists' behaviour in these use cases is that different in other countries, but I would expect some pointers to this wider context.
The referenced literature is good and covers many of the key sources I would refer to. However, my own organisation in the UK has been advising on data documentation, including use of Excel and conversion issues, for some years, so it would be good to cite some examples of other efforts to address these issues in non-ecology fields and to offer examples of non-US resources that provide extensive data management advice (<http://ukdataservice.ac.uk/manage-data.aspx>).
The checklist of issues on page 6 is very clear and useful, and is a great way to alert researchers to these issues upfront.
In terms of platforms for the tool, I think a Mac version will be important. In my experience, many data creators prefer the convenience of local tools to document data, rather than relying on web-based tools, which can suffer from browser issues and loss of data through a poor connection.
I do believe that data preparation tools are best built into researchers' existing data handling software, as this brings the activities a step closer to data analysis and away from the burden of completing data deposit forms.
I love the idea that the source code has been made available and that, on the whole, the project has been carried out in the spirit of openness, despite using a Microsoft base for the tool.
I am also terribly impressed with the work done to convince Microsoft of the importance of this tool, and to secure co-development to enable it to be a plug-in. On this front, I have had some negative experience in lobbying software suppliers of qualitative analysis packages to implement a data exchange standard to enable export of within-system documentation; conversion between different market leaders' software is currently difficult, if not impossible. They should possibly take a leaf out of Microsoft's book and also listen to what data archivists/publishers are saying!
The tool looks like it has had some user testing and feedback.
Overall, I believe this tool could have much wider value than the purposes for which the team have developed it. By simply replacing the metadata standard in use, it could easily be applied to other disciplines, e.g. social science data. I would be very keen to pilot it and offer feedback on our own tabular data collection in the social sciences domain. The social sciences use the Data Documentation Initiative (DDI), which has fields that map quite closely to the schema used in the tool and discussed in this paper.
I would advocate engagement with more data centres, possibly through forums like the Research Data Alliance.