Patient files may contain hints for detecting diseases at an early stage
© Roberto Schirdewahn

Analysis of collated data
Evaluating patient files without violating privacy

Patient files may contain vital hints for detecting diseases at an early stage. However, evaluating them would violate patient privacy. This is where mathematics can help.

Evaluating collated patient data without disclosing any sensitive information about individuals poses a considerable challenge. The team headed by Prof Dr Hans Simon from the Horst Görtz Institute for IT Security at the Ruhr-Universität Bochum has developed a method that makes precisely that possible. The mathematicians distort the data in such a way that individual patients remain anonymous during analysis. Nevertheless, self-learning computer programmes are able to detect correlations in the distorted data almost as well as in the original data.

Distorted data

In principle, the distortion works as follows: a die is rolled for each patient file, and the number rolled is added to all values in the file. This alters the individual data significantly and unpredictably. In the best case, however, it affects the statistical summaries no more than the random fluctuation that is present in the data anyway.
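
The dice principle can be illustrated with a few lines of Python. This is only a minimal sketch and not the researchers' actual method: the patient files are invented sample data, the function name distort is hypothetical, and the die roll is centred by subtracting 3.5 (an assumption) so that the offsets average out to zero.

```python
import random
import statistics

def distort(files, rng):
    """Shift all values in each file by one centred die roll per file."""
    distorted = []
    for values in files:
        # Assumption: the roll (1..6) is centred so the mean offset is zero.
        offset = rng.randint(1, 6) - 3.5
        distorted.append([value + offset for value in values])
    return distorted

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical files: [weight in kg, systolic blood pressure in mmHg]
files = [[rng.gauss(80, 12), rng.gauss(125, 15)] for _ in range(10_000)]
noisy = distort(files, rng)

# Compare a statistical summary before and after the distortion.
mean_original = statistics.fmean(f[0] for f in files)
mean_noisy = statistics.fmean(f[0] for f in noisy)
print(f"mean weight, original:  {mean_original:.2f} kg")
print(f"mean weight, distorted: {mean_noisy:.2f} kg")
```

In this sketch every individual value moves by at least half a unit, while the mean over all files should change far less, because the random offsets largely cancel each other out.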

For their work, the researchers at the Chair of Theoretical Computer Science established a precise mathematical definition of what it means for patients to remain anonymous, and of what it means for the results from distorted and undistorted data not to deviate strongly from each other. In order to meet these requirements, the mathematicians translated the problem into a geometric representation.

Data represented as vectors

Each patient file was represented as a vector, i.e. an arrow in a geometric space. The evaluation algorithm was only permitted to ask Yes/No questions, such as: Does the patient smoke? Does the patient weigh more than 80 kilograms? Each of these questions was likewise represented as a vector. If the file vector and the question vector formed an obtuse angle, this symbolised a No response; an acute angle stood for a Yes response.
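
How an angle can encode an answer is shown in the following sketch. The encoding is an assumption made for illustration, not taken from the researchers' work: a file vector here holds the weight in kilograms, a smoker flag (+1 for yes, -1 for no) and a constant 1, which allows the threshold of 80 kilograms to be written into the question vector.

```python
import math

def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def angle_degrees(a, b):
    """Angle between two non-zero vectors, in degrees."""
    cosine = dot(a, b) / (math.hypot(*a) * math.hypot(*b))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding errors
    return math.degrees(math.acos(cosine))

def answer(file_vector, question_vector):
    """Yes (True) for an angle below 90 degrees, No (False) otherwise."""
    return angle_degrees(file_vector, question_vector) < 90.0

# Hypothetical encoding: (weight in kg, smoker flag +1/-1, constant 1)
patient = (92.0, -1.0, 1.0)          # weighs 92 kg, does not smoke

# Question vectors under the same assumed encoding
heavier_than_80 = (1.0, 0.0, -80.0)  # scalar product = weight - 80
smokes = (0.0, 1.0, 0.0)             # scalar product = smoker flag

print(answer(patient, heavier_than_80))  # True: 92 - 80 is positive
print(answer(patient, smokes))           # False: the flag is -1
```

Since the angle is below 90 degrees exactly when the scalar product of the two vectors is positive, the answer could also be read off the sign of the scalar product without computing the angle.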

Rather than distorting the original data, the researchers carried out that step only after they had converted the data into vectors. Thus, information pertaining to individual patients could be kept anonymous, while at the same time, the researchers were able to make statistical statements about the collated data of all patients.
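
The following sketch illustrates the idea of distorting the vectors and then evaluating only the collated answers. It rests on several assumptions that are not taken from the researchers' work: the sample weights are invented, the noise is Gaussian with a freely chosen standard deviation of 3, and only the weight coordinate of each hypothetical file vector is distorted.

```python
import random

def answer(file_vector, question_vector):
    """Yes (True) if the scalar product is positive, i.e. an acute angle."""
    return sum(f * q for f, q in zip(file_vector, question_vector)) > 0

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical file vectors: (weight in kg, constant 1)
files = [(rng.gauss(80, 12), 1.0) for _ in range(10_000)]

# Assumption: Gaussian noise is added to the weight coordinate only.
noisy = [(weight + rng.gauss(0, 3), const) for weight, const in files]

# Question "Does the patient weigh more than 80 kilograms?"
heavier_than_80 = (1.0, -80.0)

share_original = sum(answer(f, heavier_than_80) for f in files) / len(files)
share_noisy = sum(answer(n, heavier_than_80) for n in noisy) / len(noisy)
flipped = sum(
    answer(f, heavier_than_80) != answer(n, heavier_than_80)
    for f, n in zip(files, noisy)
)

print(f"share of Yes answers, original:  {share_original:.3f}")
print(f"share of Yes answers, distorted: {share_noisy:.3f}")
print(f"individual answers that changed: {flipped}")
```

In this sketch a number of individual answers flip, so a single distorted vector no longer reveals the true answer with certainty, whereas the share of Yes responses across all files should stay close to the original value.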

Detailed article in the science magazine Rubin

A detailed report about the work of Hans Simon has been published in the Ruhr-Universität Bochum’s science magazine Rubin. Texts on the website and images on the download page are free to use for editorial purposes, provided the relevant copyright notice is included.

Press contact

Prof Dr Hans Simon
Chair of Theoretical Computer Science
Horst Görtz Institute for IT Security
Ruhr-Universität Bochum
Germany
Phone: +49 234 32 22797
Email: hans.simon@rub.de

By Julia Weiler
