Winter 2009 management board meeting

Gate at the British Library (source: gaspa)

The Dryad Management Board recently held its Winter 2009 meeting at the British Library Conference Center in London. The meeting was attended by 13 journal representatives and 4 members of the Dryad development team. A few highlights from the meeting:

Dryad now includes 489 data files in 163 data packages, though a large proportion of this content has been imported from the Systematic Biology archives.

The rate of submissions to Dryad is slowly increasing. Dryad has been able to accept submissions from authors since early 2009. Two journals, The American Naturalist and Molecular Ecology, have completed initial integration with Dryad, allowing their authors to use a more streamlined submission process. The Journal of Heredity is making progress on integration, and several other journals expect to integrate in the near future.

We are currently improving the user interface for locating and obtaining data. We are developing more sophisticated tools for curation, and we are working with several partner repositories to replicate content and provide federated searching services. For more detail, see the Dryad Development Plan.

The board discussed the role of identifiers in Dryad and whether DOIs should be assigned to Dryad’s holdings. Representatives from CrossRef and DataCite led discussions on the advantages of DOIs. The board unanimously recommended that each Dryad data package be given a DOI (a data package is all data associated with a single article). The executive committee will determine whether DOIs should be used at more granular levels (e.g., the individual files within a data package).

The longest discussion of the meeting focused on plans for transitioning Dryad from the current grant funding to a model that is more sustainable for the long term. Todd Vision presented a cost model created by the Dryad development team and consultant Lorraine Eakin. Consultants from Charles Beagrie Limited presented an analysis of expected staffing needs and potential revenue streams. The board provided guidance on the schedule and methods for pursuing revenue from a variety of sources.

Community engagement emerged as a critical factor in ensuring long-term sustainability. Towards that end, the board discussed many ideas for increasing the visibility of the repository. Notable steps include increasing the frequency of posts on this blog, having a more visible presence at scientific meetings, and expanding use of social networking tools like Facebook and Twitter.

Once the Dryad development team compiles all notes from the meeting, we will release a more detailed report.

From the International Digital Curation Conference

Dryad was at the 5th International Digital Curation Conference in London last week. Several speakers gave Dryad prominent mentions, and we presented a poster on our research supporting the curation workflow, available here.

A pre-conference workshop on Citability of Research Data provided an introduction to DataCite, a cooperative effort of the German National Library of Science and Technology (TIB), the British Library, the Canada Institute for Scientific and Technical Information (CISTI), and others. Its goal is “…to establish a not-for-profit agency that enables organisations to register research datasets and assign persistent identifiers to them, so that research datasets can be handled as independent, citable, unique scientific objects.” This was followed by a useful discussion of the barriers and challenges, which produced a nice little checklist of things to do:

  • change scientific culture around data
  • gain journal/publisher support
  • facilitate good data management
  • settle on terminology (yes, terminology matters!)
  • resolve data granularity issues
  • encourage authors to deposit data, and make it easy for them to do so…

Here are some more highlights from the conference. For more, see the IDCC’s videos of the sessions or the Digital Curation Blog.

Dryad board member William Michener presented on DataONE and prominently mentioned Dryad in the discussion afterwards. Thanks, Bill!

In his keynote, Ed Seidel (Associate Director, Directorate of Mathematical and Physical Sciences, National Science Foundation) said:

  • publicly funded data should be made available
  • simply ‘expecting’ researchers to share data is like expecting teenagers to clean their rooms
  • we need “executable publications” that bundle code and data with the paper, so the science can be rerun and reproduced
  • he then called for journals to require data deposition: “If journals require data associated with publication to be available; that would be a major push.”

Timo Hannay, Publishing Director, Nature Publishing Group, began his closing keynote address by saying that “at lunch 3 separate people were kind enough to point out that supplementary information was [no good] in PDF.”  Other tidbits from his talk:

  • journals need to become more like databases, more structured, more searchable
  • we are joining the dots across the intellectual terra incognita
  • all information is inter-connected
  • the associations between facts are just as important as the facts themselves; we have increasingly interconnected data sets, and are building one global computer and one global database
  • this is vast and messy and inconsistent and immensely valuable
  • there must be more efficient ways to do peer review, but no one has come up with one yet
  • Q: Do authors send data? What do you do with it?
    • A: supplementary info is a catchall phrase
    • some of it is data, but most of it is not
    • we just take the file, put it online, and link to it
    • it’s mostly Excel spreadsheets
    • our system used to just put it into a PDF; we have fixed that
    • progress is slow, and it depends on authors
    • interested to see efforts that encourage making usable data available

One interesting paper from Australia, by Dr Andrew Treloar of the Australian National Data Service (ANDS), identified a set of data sharing verbs, proposed “as a useful way to design and structure flexible services in a heterogeneous environment.”

  1. create/capture
  2. store– “ANDS doesn’t do storage but we care that it happens”
  3. describe– info for discovery, determination of value, access, & re-use
  4. identify– uses handles; has just joined DataCite and can now generate DOIs; has an “Identify My Data” service; wants data to be a first-class output
  5. register– host a registry of collections
  6. discover– offer discovery services
  7. access– 4 ways: direct link, link to data repository, contact info to get data, or metadata only
  8. exploit, or use– build on what’s available

For more detail, see the full paper here. The full IDCC programme is here, and all the recorded sessions are available here. Next year the IDCC will be in Chicago. If you like O’Hare in December, this should be a real treat!

American Geophysical Union position statement on data

“Because the state of natural systems is never repeated, data losses, or missed data collection opportunities can never be corrected.”  So says the AGU, recently reaffirming the importance of data availability and

The statement offers strong support for data archiving and publication as a routine part of the research process.

The cost of collecting, processing, validating, and submitting data to a recognized archive should be an integral part of research and operational programs. Such archives should be adequately supported with long-term funding. Organizations and individuals charged with coping with the explosive growth of Earth and space digital data sets should develop and offer tools to permit fast discovery and efficient extraction of online data, manually and automatically, thereby increasing their user base. The scientific community should recognize the professional value of such activities by endorsing the concept of publication of data, to be credited and cited like the products of any other scientific activity, and encouraging peer-review of such publications.

The full statement from the AGU Council can be found here.