A lack of data is a major bottleneck for many types of research, especially for the development of better medical treatments and drugs. Such data is often extremely sensitive, and understandably, both individuals and companies are frequently unwilling to share it with others.
Researchers at the Finnish Center for Artificial Intelligence (FCAI) have developed a machine learning-based method that generates synthetic data from original datasets, allowing researchers to share their data with one another. This could help solve the persistent problem of data scarcity in medical research and in other fields where information is sensitive.
The generated data protect privacy while remaining similar enough to the original data to be used for statistical analysis. With the new method, researchers can perform an unlimited number of analyses without compromising the identities of the people who took part in the original study.
“We optimize the original data so that we can mathematically guarantee that no person can be recognized,” explains Samuel Kaski, professor at Aalto University and director of the FCAI, who co-authored the study.
Synthetic data has been produced and used before, but the new study addresses a major problem with existing methods. To be useful for research, synthetic data must closely resemble the original dataset, and in practice this resemblance has occasionally made it possible to identify individuals despite anonymization.
To address this problem, the FCAI researchers use artificial intelligence, in particular probabilistic modeling. This lets them draw on prior knowledge about the original data without reproducing the specific properties of the individual dataset on which the synthetic data is based too closely.
Using prior knowledge has also made the synthetic datasets more useful for reaching correct statistical conclusions, even when the original dataset is small, as is common in medical research.
The results were published in a journal on June 7.
Edited by Gary Cramer