From eCrystals Federation Project
Whilst institutional repository networks that provide managed storage and open access to the textual interpretations of research are emerging (e.g. SHERPA and DRIVER), the data repository landscape within institutions is considerably less mature. Well-established community archives, such as the UK Data Archive and the EBI sequence databanks for bioinformatics data, provide curated resources in certain disciplines. However, the technical infrastructure and associated support for research data remain fragmented and there are gaps in provision, as exemplified by OpenDOAR, where only 4 of the 76 recorded UK institutional repositories contain datasets. This is set against a backdrop of an increasing “deluge” of data generated by both large-scale facilities and institution-based small science. In addition, the highly social, participative (and chaotic) constructs of the current Web environment are changing scholarly communications, and we are starting to see scientific data, as well as textual information, being shared, discussed and evaluated in blogs and wikis, e.g. within the associated R4L Project. This is in contrast to the more formal, standards-driven, service-oriented architectural approach of the eFramework.
The pioneering JISC-funded eBank-UK project (three phases since September 2003) has constructed an institutional repository that makes available the raw, derived and results data from a crystallographic experiment, developed the eBank aggregator service for metadata harvesting by third parties, and promoted linking from primary data to other research outputs within the scholarly knowledge cycle (Lyon, Ariadne July 2003). Phase 3 also investigated preservation and curation aspects of the data repository and evaluated approaches to audit and certification. Phase 3 was positioned as a transitional scoping study for the proposed eCrystals Federation, and this bid describes the first stages of full implementation. The JISC-funded eBank-UK, Repository for the Laboratory and SPECTR-a projects held a joint consultation workshop entitled "Digital Repositories supporting eResearch: exploring the eCrystals Federation Model". The transcript report of this meeting is available here. The results from Phase 3 are currently being assimilated and collated into a series of reports; however, a number of outcomes are already evident:
- Crystallographic laboratory practices are very varied, ranging from a more automated workflow with outputs handled and manipulated digitally, to a very “hands-on” process where an individual crystallographer oversees the process and maintains paper copies of results in a filing cabinet.
- This variation in laboratory practice has implications for the ease of adoption of a standard metadata schema such as the eBank Application Profile.
- Crystal structure data and associated information are complex, should be treated as compound objects, and will require the use of a metadata packaging format such as METS or MPEG DIDL.
- There is likely to be a range of persistent identifiers in use within any discipline. The allocation of identifiers by the issuing agency must be efficient, reliable and scalable.
- When considering preservation and curation, the following aspects need to be addressed: audit and certification processes and procedures, representation information for crystallography data, preservation metadata for crystallography data, and conformance of the repository software in use within the Federation to the OAIS Reference Model.
- It is clear that preservation and curation issues will have to be addressed politically by both institutions and the community.
- Advocacy programmes will be essential to assist with populating the data repositories, since there is no established culture of sharing data within the chemistry domain.
- The implementation of a data embargo procedure/policy will be an important factor in encouraging researchers to deposit data destined for eventual open access.
- The pro-active support of professional societies, publishers, data centres and other key domain stakeholders is essential to achieve buy-in from the scholarly community.
- The exact nature of the relationship between subject-based and institutional repositories is unclear, and mechanisms for machine-to-machine interoperability will be necessary.
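
The finding above that crystal structure data should be treated as compound objects, packaged in a format such as METS, can be illustrated with a minimal sketch. It is not the eBank-UK implementation: the function name `build_mets`, the object identifier, the file names, the example.org URLs and the MIME types are all hypothetical, chosen only to show how the raw, derived and results files of one experiment might be grouped in a single METS document. Only a small subset of METS is used (`fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`), and no descriptive or preservation metadata is included.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Return a namespace-qualified name in ElementTree's {ns}tag notation."""
    return f"{{{ns}}}{tag}"


def build_mets(object_id, stages):
    """Package one experiment as a compound object (hypothetical helper).

    stages maps a stage name ("raw", "derived", "results") to a list of
    (file_id, mimetype, url) tuples describing the files for that stage.
    """
    root = ET.Element(q(METS_NS, "mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"), {"TYPE": "logical"})
    top_div = ET.SubElement(struct_map, q(METS_NS, "div"), {"LABEL": object_id})

    for stage, files in stages.items():
        # One file group and one structural division per stage of the experiment.
        group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": stage})
        div = ET.SubElement(top_div, q(METS_NS, "div"), {"LABEL": stage})
        for file_id, mimetype, url in files:
            file_el = ET.SubElement(
                group, q(METS_NS, "file"), {"ID": file_id, "MIMETYPE": mimetype}
            )
            ET.SubElement(
                file_el,
                q(METS_NS, "FLocat"),
                {"LOCTYPE": "URL", q(XLINK_NS, "href"): url},
            )
            # The structural map points back at the file by its ID.
            ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return root


# Illustrative data only: identifiers, URLs and MIME types are assumptions.
example = build_mets(
    "example-structure-001",
    {
        "raw": [("f1", "application/octet-stream", "http://example.org/001/frames.zip")],
        "derived": [("f2", "text/plain", "http://example.org/001/structure.hkl")],
        "results": [("f3", "chemical/x-cif", "http://example.org/001/structure.cif")],
    },
)
print(ET.tostring(example, encoding="unicode"))
```

Grouping files by stage keeps the raw, derived and results data of an experiment together under one identifier, which is the property a harvester or preservation service would rely on when treating the deposit as a single object.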