Compute Canada (CC), the Canadian Association of Research Libraries (CARL), and CARL's Portage Network are collaborating to provide a scalable federated platform for digital research data management (RDM) and discovery.
The Federated Research Data Repository (FRDR) is a part of that platform and supports transferring, ingesting, curating, preserving, discovering, and sharing Canadian research data.
FRDR is not intended to serve as a monolithic solution for all of Canada’s research data needs. Rather, it is meant to provide a framework that allows existing and future data repositories to be federated within a coherent system. At the same time, it provides a flexible repository and preservation system for Canadian researchers and institutions who do not have an existing solution.
RDM practices increase accountability for use of public funds, improve the completeness and understandability of data that is retained, improve the veracity of research findings by permitting other researchers to reproduce the results, improve the discoverability of data by other researchers, and ultimately accelerate new research outcomes.
FRDR is currently in a limited production phase. A small number of research groups have been invited to deposit datasets into FRDR.
The software development project and service launch have been sponsored by Compute Canada and CARL. The Portage Network of CARL assisted with the requirements and design of a national platform service, providing metadata and data workflow solutions and testing the platform. Compute Canada provided project management, software development expertise, and the necessary computational power. The project team has included a core group from the University of Saskatchewan.
The software development project started in January 2016. It followed an earlier pilot project that identified scalability and preservation as key requirements for the service.
Alpha testing of the user interfaces started in late 2016, thanks to discovery interface code contributed by the University of British Columbia. Beta testing started in April 2017, and the project is now in production with a small number of research groups.
The Service Manager for FRDR is under the Portage Service Manager, Lee Wilson.
The Steering Committee for the project comprises representation from Compute Canada and the Canadian Association of Research Libraries:
The software development project team consists of the core team from the University of Saskatchewan:
Alex Garnett, Research Data Management & Systems Librarian, Simon Fraser University
FRDR's copy of submitted data is housed on infrastructure managed by Compute Canada at the University of Victoria or at the University of Waterloo.
The metadata related to datasets is housed in a database at Victoria. Most of that metadata is shared with Globus, which runs on Amazon Web Services in the USA, so that it can be indexed and made available for discovering datasets. Certain metadata fields are not shared with Globus.
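The idea of withholding certain fields before metadata leaves the local database can be illustrated with a minimal Python sketch. The field names, the PRIVATE_FIELDS set, and the shareable_metadata function are hypothetical and chosen only for illustration; this page does not list which fields FRDR withholds or how the filtering is implemented.

```python
# Hypothetical field names: the actual FRDR schema and the list of
# withheld fields are not specified in this document.
PRIVATE_FIELDS = {"depositor_email", "internal_notes"}


def shareable_metadata(record, private_fields=PRIVATE_FIELDS):
    """Return a copy of the record without the fields that stay local."""
    return {
        key: value
        for key, value in record.items()
        if key not in private_fields
    }


record = {
    "title": "Example dataset",
    "creator": "A. Researcher",
    "depositor_email": "researcher@example.org",
    "internal_notes": "awaiting curation review",
}

# Only this filtered copy would be sent to an external index;
# the original record is left unchanged in the local database.
public_record = shareable_metadata(record)
print(sorted(public_record))
```

Filtering into a new dictionary, rather than deleting keys in place, keeps the full record available locally while only the reduced copy is shared.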
The Federated Research Data Repository makes extensive use of tools operated by Globus. Globus is a non-profit project out of the University of Chicago and Argonne National Laboratory. Globus is a partner in the delivery of the FRDR service and provided the following statement:
The Globus service is hosted on infrastructure provided by Amazon Web Services. The system components are encapsulated in Virtual Private Clouds (VPCs) and use security groups, which allow for the provisioning of logically isolated sections of the Amazon cloud. Globus Connect Server is installed on file systems owned and controlled by the institution or researcher, such as campus storage resources or personal computers, creating a Globus “endpoint.” Files managed by Globus are accessible only to authorized users, as defined by the permissions set by the endpoint administrator. The endpoint administrator can further control access by configuring Globus to explicitly deny or restrict access to specific parts of the filesystem. All communication with the Globus service is SSL protected and encrypted, and data transfer is optionally encrypted by the user. Research data never flow through Globus, but are transferred directly between the source and destination systems. Globus Auth provides identity and access management by brokering authentication and authorization between identity providers, resource services, and clients. Users authenticate via Globus Auth using their existing credentials from a trusted identity provider, e.g. their campus username and password. Because Globus Auth acts as an identity broker and uses federated login, institutional credentials are not sent to Globus.
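The statement above mentions that an endpoint administrator can deny or restrict access to specific parts of the filesystem. As a rough illustration of that general idea, and not of Globus configuration syntax or its actual evaluation logic, the following Python sketch resolves a path against a hypothetical rule list in which the most specific matching rule wins and anything unmatched is denied. The RULES list, the permission labels, and permission_for are all assumptions made for this example.

```python
from pathlib import PurePosixPath

# Hypothetical rules: (path prefix, permission). Permission is one of
# "rw", "r", or "deny". This is not Globus configuration syntax.
RULES = [
    ("/data", "r"),
    ("/data/projects", "rw"),
    ("/data/projects/private", "deny"),
]


def permission_for(path, rules=RULES):
    """Return the permission of the most specific rule covering path.

    Paths are assumed to be absolute and already normalized.
    Anything not covered by a rule is denied.
    """
    target = PurePosixPath(path)
    best_depth = -1
    best_permission = "deny"
    for prefix, permission in rules:
        prefix_path = PurePosixPath(prefix)
        # Match whole path components, so "/data/projects-old" is not
        # treated as being inside "/data/projects".
        if target == prefix_path or prefix_path in target.parents:
            depth = len(prefix_path.parts)
            if depth > best_depth:
                best_depth = depth
                best_permission = permission
    return best_permission


for example in ("/data/projects/run1.csv", "/data/projects/private/keys", "/etc/passwd"):
    print(example, permission_for(example))
```

Denying by default means that a path is reachable only if an administrator has written a rule that covers it.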