The Cluster of Excellence Hearing4all has made data collected from almost 600 study participants publicly accessible in a dataset, the Oldenburg Hearing Health Repository. In this interview, Andrea Hildebrandt, Professor of Psychological Methods and Statistics at the University of Oldenburg, and research data manager Dr Daniel Berg talk about the importance of Open Science – not just for scientists but in the future also for individual patients.
The published data was collected by researchers between 2013 and 2015 as part of a collaboration between Hörzentrum Oldenburg and the Cluster of Excellence Hearing4all. What type of information does it contain?
Andrea Hildebrandt: The dataset contains the results of various audiological tests, ranging from simple hearing tests to complex speech intelligibility tests in background noise. The availability of data from so many different tests offers a unique opportunity – particularly given that, in addition to the test results, a large amount of subjective information about the study participants is also available. This includes not only socio-economic data such as education level and income, but also data on hearing health, general health and cognition, as well as self-assessments of hearing ability. The latter is particularly important when it comes to fitting hearing aids.
So there's a lot of information about individuals, but the data remains anonymous. How does that work?
Daniel Berg: Various statistical methods can be employed to ensure that no one can use the data to work backwards and identify the specific person to whom it relates. With this dataset we achieved what is known as "k-anonymity of 5". This means that no matter how you combine the individual pieces of information in the dataset with each other, it always applies to at least five people. For example, if there is a single 100-year-old person in the study, we would not specify the exact age of the study participants but assign them to age groups such as "over 85" to ensure that there are always at least five people in that group.
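The idea can be illustrated with a minimal Python sketch. It is not the procedure used for the Oldenburg Hearing Health Repository; the records, the column names and the helper functions `generalise_age` and `k_anonymity` are all invented for illustration. The sketch counts how many records share each combination of identifying attributes and shows how grouping ages raises the smallest group size from 1 to 5.

```python
from collections import Counter


def generalise_age(age, cap=85):
    """Replace an exact age with a ten-year band; ages above `cap` share one band."""
    if age > cap:
        return f"over {cap}"
    low = (age // 10) * 10
    return f"{low}-{min(low + 9, cap)}"


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for all the given quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Invented example records (not taken from the real dataset).
records = [{"age": a, "sex": "f"} for a in (70, 72, 74, 76, 79, 86, 88, 90, 93, 100)]

# With exact ages every participant is unique, so k is 1.
k_exact = k_anonymity(records, ["age", "sex"])

# After grouping ages into bands, each combination covers five people.
generalised = [{"age_group": generalise_age(r["age"]), "sex": r["sex"]} for r in records]
k_grouped = k_anonymity(generalised, ["age_group", "sex"])

print(k_exact, k_grouped)
```

In this toy example the single 100-year-old can no longer be singled out, because she falls into the "over 85" group together with four other participants.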
Who can use the published data, and how?
Hildebrandt: The data is of interest first and foremost to hearing researchers around the world. A group from Italy working on a new hearing test procedure expressed interest at a cluster symposium recently held in Hanover, for example. Our Italian colleagues need as much data as possible for cross-validation, a method that can be used to test the accuracy of prediction models.
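For readers unfamiliar with cross-validation, the following Python sketch shows the general principle. It is an illustration under assumed conditions, not the Italian group's method: the data are synthetic, and the straight-line model and the function names (`k_fold_indices`, `fit_line`, `cross_validate`) are invented for this example. The data are split into several parts; the model is fitted on all parts but one and its prediction error is measured on the part that was held back.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def cross_validate(xs, ys, k=5):
    """Return the mean absolute prediction error, averaged over k held-out folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(predict_line(model, xs[i]) - ys[i]) for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


# Synthetic, noise-free data that follow y = 2x + 1 exactly.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
cv_error = cross_validate(xs, ys, k=5)
print(cv_error)
```

Because every prediction is checked against data the model has not seen, the estimate becomes more reliable the more independent data are available, which is why additional datasets are valuable for this kind of validation.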
Berg: But the data is also highly relevant for health services research, which focuses on patient care and healthcare, because all kinds of correlations can be identified here. For example: at what point did a person with a given level of education and income receive a hearing aid? There are countless potential correlations.
Hildebrandt: The relevance of big data analyses for precision medicine is increasing. To make progress here, freely accessible databases containing large amounts of patient data are highly valuable.
In what way?
Hildebrandt: You need them to train Decision Support Systems. Studies have shown that in many areas the best diagnoses are made when experts, with all their medical experience, work together with Decision Support Systems. These are systems based on artificial intelligence that provide decision support solely on the basis of objective data and thus reduce the human bias of "intuitive statistics". For such systems to be truly helpful in diagnostics, they need to be trained with large amounts of data. Individual patients can then benefit too, especially if they have a rarer type of health problem.
Berg: Anyone suffering from a common form of hearing loss is already likely to receive high-quality care from ear, nose and throat specialists and acousticians nowadays. Rare diseases, on the other hand, are more likely to "fall through the cracks", as doctors may be less aware of them, meaning that they are only diagnosed at a later stage. But if there are systems that detect certain patterns in patient data because they have been trained using data from countless other individuals, they can also provide information on possible causes that wouldn't even occur to the doctor at that stage.
Is it still an exception or is it already the rule that raw scientific data is made publicly available?
Hildebrandt: It is still the exception rather than the rule. However, more and more researchers are pushing forward with Open Science – if only to avoid having to constantly gather data that may already have been gathered elsewhere. In this way, Open Science makes science more efficient and at the same time provides the data that scientists need for big data analyses, as described above. So you could say that we are setting a good example – naturally, also because we hope that many others will follow suit and that we in turn will be able to use their data.
Berg: This is why we are also a leading member of a Europe-wide group whose aim is to standardise audiological data formats so that they can be used by others and linked to other databases. In concrete terms, this means that we want to make not just our data, but also the tools to use it, available to other researchers.