Will Poynter | CLOSER Repository: Modernising Longitudinal Study Management

CLOSER Discovery is a cutting-edge search engine for discovering metadata on eight of the UK’s cohort and longitudinal studies. The longest-running study documented in CLOSER Discovery has been running for over 70 years, which makes it a formidable challenge to document and manage. CLOSER Discovery demonstrates the importance of investing in rich metadata that describes many more aspects of data collection than traditional tools and methods capture. By documenting detailed information on question routing, the scales and images used, and similar questions and variables across multiple studies, it leaves researchers, survey managers and data managers all better informed. This in turn opens up new possibilities for the studies going forward.

To function, CLOSER Discovery sits atop a large metadata repository that has been designed not only to power a search engine but also to provide additional functionality and automation to the studies themselves. By drawing links between multiple studies, centres and data warehouses, CLOSER has begun to tear down the outdated data-silo model, which has caused many of the issues that inhibit harmonisation and linkage.

Data collection instrument design can be made faster and more consistent by reusing entire sections of questions designed for and used in previous studies. When this reuse is documented from the point of design, harmonisation can be performed more efficiently and effectively. Because these techniques and tools were developed using eight of the UK’s longitudinal studies, they have been rigorously tested for scalability.

CLOSER’s metadata has a universal identifier for every single item, allowing datasets and published papers to reference precisely the variables they have used. Likewise, questionnaire designers can reference the questions they have used, such as standard scales, and clearly document how they have been altered.

With much richer and clearer metadata documented, data managers can save enormous amounts of time when cleaning collected data before it is deposited for analysis. The laborious task of creating and then maintaining data dictionaries can also be automated and standardised.
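
As a rough illustration of how a data dictionary might be generated from variable-level metadata rather than maintained by hand, here is a minimal Python sketch. The record fields (id, name, label, type, question, codes), the identifiers and the CSV layout are all assumptions made for this example and do not represent CLOSER's actual schema or tooling.

```python
import csv
import io

# Hypothetical variable-level metadata records. The field names and
# identifiers are illustrative assumptions, not CLOSER's actual schema.
VARIABLES = [
    {"id": "example:var:0001", "name": "height_cm",
     "label": "Height in centimetres", "type": "numeric",
     "question": "How tall are you?"},
    {"id": "example:var:0002", "name": "smoker",
     "label": "Currently smokes", "type": "categorical",
     "question": "Do you smoke at all nowadays?",
     "codes": {1: "Yes", 2: "No"}},
]

COLUMNS = ["id", "name", "label", "type", "question", "codes"]


def format_codes(codes):
    """Render a code list such as {1: 'Yes', 2: 'No'} as '1=Yes; 2=No'."""
    return "; ".join(f"{value}={label}" for value, label in sorted(codes.items()))


def build_data_dictionary(variables):
    """Return a CSV data dictionary with one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for var in variables:
        # Missing fields become empty cells so every row has the same shape.
        row = {col: var.get(col, "") for col in COLUMNS}
        row["codes"] = format_codes(var.get("codes", {}))
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_data_dictionary(VARIABLES))
```

Because the dictionary is derived entirely from the metadata records, regenerating it after a metadata change keeps the two in step, which is the kind of standardisation described above.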

CLOSER Discovery categorises all variables and questions with topics from CLOSER’s controlled vocabulary. This allows more effective searching and filtering of the huge quantity of content made available. Applying topics to questions and variables is hugely time-consuming, but CLOSER is working with machine learning to further automate the process, enhancing the metadata without increasing costs.

Link to the slides