Dryad was at the 5th International Digital Curation Conference in London last week, getting several prominent mentions by speakers, and with a poster on our research supporting the curation workflow, available here.
A pre-conference workshop on Citability of Research Data provided an introduction to DataCite, a cooperative effort of the German National Library of Science and Technology (TIB), the British Library, and the Canada Institute for Scientific and Technical Information (CISTI), among others, with the goal:

“…to establish a not-for-profit agency that enables organisations to register research datasets and assign persistent identifiers to them, so that research datasets can be handled as independent, citable, unique scientific objects.”

This was followed by a useful discussion of the barriers and challenges, which produced a nice little checklist of things to do:

- change scientific culture around data
- gain journal/publisher support
- facilitate good data management (yes, terminology matters!)
- resolve data granularity issues
- encourage authors to deposit data, and make it easy for them to do so
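To make the “citable, unique scientific objects” idea concrete, here is a minimal sketch of what a persistent identifier buys a citation. The DOI prefix/suffix check follows the general DOI syntax (`10.<prefix>/<suffix>`), but the citation layout and function names are illustrative assumptions, not DataCite's actual specification or API:

```python
import re

# General shape of a DOI: "10." + registrant prefix + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def is_valid_doi(doi: str) -> bool:
    """Check that a string looks like a DOI, e.g. '10.5061/dryad.1234'."""
    return bool(DOI_PATTERN.match(doi))

def cite_dataset(creator: str, year: int, title: str,
                 publisher: str, doi: str) -> str:
    """Format a dataset citation so the persistent identifier
    travels with the reference (layout is a made-up example)."""
    if not is_valid_doi(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"{creator} ({year}). {title}. {publisher}. https://doi.org/{doi}"

print(cite_dataset("Smith, J.", 2009, "Example phenotype dataset",
                   "Dryad Digital Repository", "10.5061/dryad.1234"))
```

The point of the exercise: once a dataset has a DOI, a citation to it resolves the same way a citation to a paper does, which is exactly what makes it an “independent, citable, unique scientific object.”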
In his keynote, Ed Seidel (Associate Director, Directorate of Mathematical and Physical Sciences, National Science Foundation) said:
- publicly funded data should be made available
- simply ‘expecting’ researchers to share data = like expecting teenagers to clean their rooms
- we need “executable publications” that include code and data with paper to run and reproduce science
- and then he called for journals to require data deposition: “If journals require data associated with publication to be available, that would be a major push.”
Timo Hannay, Publishing Director, Nature Publishing Group, began his closing keynote address by saying that “at lunch 3 separate people were kind enough to point out that supplementary information was [no good] in PDF.” Other tidbits from his talk:
- journals need to become more like databases, more structured, more searchable
- we are joining the dots across the intellectual terra incognita
- all information is inter-connected
- the associations between facts are just as important as the facts themselves; we have increasingly interconnected data sets, and are building one global computer and one global database
- this is vast and messy and inconsistent and immensely valuable
- there must be more efficient ways to do peer review but no one has come up with one yet
- Q: do authors send data? what do you do with it?
- A: supplementary info is a catchall phrase
- some of it is data, not most of it
- we just take the file and put it online and link to it
- it’s mostly Excel spreadsheets
- our system used to just put it into a PDF, but we have fixed that
- progress is slow, and it depends on authors
- we are interested in encouraging authors to make their data available in usable form
One interesting paper from Australia, by Dr Andrew Treloar of the Australian National Data Service (ANDS), identified a set of data-sharing verbs, proposed “as a useful way to design and structure flexible services in a heterogeneous environment.”
- store– “ANDS doesn’t do storage but we care that it happens”
- describe– info for discovery, determination of value, access, & re-use
- identify– using handles; just joined DataCite, so can now generate DOIs; has an “Identify My Data” service; wants data to be a first-class output
- register– host a registry of collections
- discover– offer discovery services
- access– 4 ways: direct link, link to data repository, contact info to get data, or metadata only
- exploit, or use– build on what’s available
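Since the verbs are proposed as a way to structure services, one way to see their appeal is to sketch them as an interface. This is a hypothetical illustration only: the class, method, and field names below are my own, and ANDS's actual services are not a Python API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DatasetRecord:
    identifier: str            # identify: a persistent ID (handle or DOI)
    description: str           # describe: info for discovery and re-use
    access_mode: str           # access: "direct", "repository",
                               #         "contact", or "metadata-only"
    location: Optional[str] = None

class CollectionsRegistry:
    """register + discover: host a registry of collections and
    offer simple search over it (store is left to data owners)."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> None:
        self._records[record.identifier] = record

    def discover(self, keyword: str) -> list[DatasetRecord]:
        kw = keyword.lower()
        return [r for r in self._records.values()
                if kw in r.description.lower()]

    def access(self, identifier: str) -> str:
        """Resolve a record to whatever access the depositor allows."""
        r = self._records[identifier]
        if r.access_mode == "metadata-only":
            return f"metadata only: {r.description}"
        return r.location or f"contact depositor for {r.identifier}"

registry = CollectionsRegistry()
registry.register(DatasetRecord("hdl:102.100/42", "bird banding survey data",
                                "direct", "http://example.org/data/42.csv"))
print([r.identifier for r in registry.discover("banding")])
```

Note how the “store” verb is deliberately absent from the registry, matching the ANDS stance that it doesn't do storage but cares that it happens, while the four access modes from the talk map onto a single `access` method.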
For more detail see the full paper here. The full IDCC programme is here, and all the recorded sessions are available here. Next year the IDCC will be in Chicago. If you like O’Hare in Dec., this should be a real treat!