Diabetes Data Processing

In the NGI-funded MyPCH project, OwnYourData developed several technologies for the secure and traceable exchange of diabetes data: Digital Watermarking, Semantic Annotation, and Data Traceability. In addition, we participated in a MyData Health initiative to write a feature article for the European Medical Writers Association (EMWA). We are proud to announce that the article, written by 14 individuals from 9 countries around the 6 MyData principles, is already published and publicly available via the EMWA journal website – see the sections “Data Interoperability” and “Establishing trust between stakeholders for health data use” for example use cases of Semantic Containers.

In this blog post we cover the successful integration of diabetes data into the OwnYourData Data Vault. In this data flow, persons with diabetes (PwD) can not only transfer their data to a Personal Data Store but also run SPARQL queries to combine their diabetes data with public information – more information and examples are available here.

A special feature in the OwnYourData Data Vault is the Personal Knowledge Graph shown on the left side of the main screen. It compiles available data from the respective user and presents the information in a clear form. The screenshot below shows information about recent GPS Data, blood sugar levels over the course of a few days, as well as record numbers overall.

Beyond the information in the Personal Knowledge Graph, plugins allow further data exploration. In the course of the MyPCH project, however, we decided to use existing tools like R or Jupyter Notebooks to provide more sophisticated visualization and analysis mechanisms. The R Notebook available on GitHub is an example of how to retrieve and decrypt information in the Data Vault and compile a report.

If you have any questions about using diabetes data with Semantic Containers or within the OwnYourData Data Vault, don’t hesitate to contact us at support@ownyourdata.eu.

NGI Funding for DECTS

The project DECTS (Deaf Emergency Chat and Training System) aims to provide emergency calls for deaf people and a training environment in several languages. With the help of a chatbot, deaf people can learn how to use the app and at the same time generate test data for training the control center personnel. Users can determine whether their entries are used as test data, and the origin, GDPR-compliant provision, and use of the data are documented. The consent to the use of the data can also be changed and revoked later.

The teams from OwnYourData and DEC112 work together to implement this. In previous projects, an infrastructure was already set up in Austria for deaf emergency calls and the challenge now lies in international operations – for example when an Austrian tourist is on vacation in Copenhagen: in this case an emergency call is made via the DEC112 App registered in Austria and conveyed to the control center in Copenhagen.

For the training environment, a chatbot is being developed that simulates a control center. In a test chat, structured information is queried, and further questions arise when certain keywords are used. In cooperation with emergency call centers, typical conversations were analyzed and so-called decision trees were created, which the chatbot processes automatically.
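
How a chatbot might process such a decision tree can be sketched in a few lines of Python. This is only an illustration and not the DECTS implementation: the tree layout, the function name `run_decision_tree`, and all questions and keywords are hypothetical.

```python
# Hypothetical decision tree: every node asks one question; a keyword found in
# the answer selects a follow-up node, otherwise the "default" node is used.
ADDRESS = {"question": "Where are you?"}

TREE = {
    "question": "What happened?",
    "keywords": {
        "fire": {"question": "Is anyone still inside the building?", "default": ADDRESS},
        "injur": {"question": "Is the person conscious?", "default": ADDRESS},
    },
    "default": ADDRESS,
}


def run_decision_tree(node, answers):
    """Walk the tree and return the list of (question, answer) pairs."""
    transcript = []
    answers = iter(answers)
    while node is not None:
        answer = next(answers, "")  # empty answer if the user stops replying
        transcript.append((node["question"], answer))
        next_node = node.get("default")
        for keyword, follow_up in node.get("keywords", {}).items():
            if keyword in answer.lower():
                next_node = follow_up  # keyword triggers a further question
                break
        node = next_node
    return transcript


for question, answer in run_decision_tree(TREE, ["There is a fire", "No", "Main Street 1"]):
    print(f"{question} -> {answer}")
```

The transcript collected this way is the kind of chat protocol a user could later release as test data.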

If a user consents to the further use of the chat protocol, this consent can be managed in the OwnYourData Data Vault. There, the consent to the transfer of the data is documented and it is possible to query when the data was accessed. In particular, access can also be restricted or subsequently prohibited. Semantic Containers are used as the technology platform, which ensures that data access is transparent and traceable.

Finally, personal data (emergency contacts, medical data and other information) can also be stored in the OwnYourData Data Vault and is automatically provided to the control center in the event of an emergency chat. This personal data is referenced using a DID (Decentralized Identifier) and the data itself is stored encrypted. Shamir’s Secret Sharing scheme ensures that the data can only be read by the user and the control center, but cannot be accessed by OwnYourData or DEC112.
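
To illustrate the idea behind Shamir’s Secret Sharing, here is a minimal Python sketch that splits a secret (for example, an encryption key represented as an integer) into shares, of which a chosen number is needed for recovery. The threshold, the number of shares, and the way shares are distributed in DECTS are not described in this post; the values and function names below are assumptions for illustration only.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo this value


def split_secret(secret, threshold, num_shares):
    """Split an integer secret into num_shares points on a random polynomial."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("threshold must be between 1 and num_shares")
    # Polynomial of degree threshold-1 whose constant term is the secret
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Assumed setup: three shares, any two of which recover the key
shares = split_secret(123456789, threshold=2, num_shares=3)
assert recover_secret(shares[:2]) == 123456789
```

With fewer shares than the threshold, the polynomial is underdetermined, which is why a party holding only a single share learns nothing about the secret.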

The system architecture for the project is shown in the graphic below, together with the data flows between the individual components. All parts are now at least available as prototypes and the first end-to-end tests were carried out in May.

Digital Watermarking

A digital watermark is a kind of marker covertly embedded in data; watermarking is also sometimes referred to as “the practice of imperceptibly altering a work to embed a message about that work”. For Semantic Containers, a digital watermark is a unique digital fingerprint that is applied to data provided by a Semantic Container, i.e., any data request results in a dataset with insignificant errors that uniquely identify the recipient of the dataset. If such a dataset is leaked and appears in an unintended location, the person who originally requested and leaked the dataset can be identified. This blog post describes the design of the digital watermarking that will be implemented in the course of the currently ongoing MyPCH project.

Watermark embedding

To embed a watermark into a dataset the following two steps are performed:

  1. Pre-processing: the available data is split up into fragments of a defined size, e.g., all measurements from a single day
  2. Encoding: based on a secret parameter (or key) unique to the requesting party, a sequence of errors with the same size as a data fragment is created and then applied to the original data, i.e., for numerical values the error is simply added to the value
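
The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Semantic Container implementation: the function names, the use of SHA-256 to seed a pseudo-random generator, the uniform error distribution, and the error bound `scale` are all hypothetical choices.

```python
import hashlib
import random


def fragment_by_day(measurements):
    """Pre-processing: group (day, value) pairs into one fragment per day."""
    fragments = {}
    for day, value in measurements:
        fragments.setdefault(day, []).append(value)
    return fragments


def error_sequence(secret_key, fragment_id, length, scale=0.5):
    """Derive a reproducible error sequence from the recipient's key (assumed scheme)."""
    seed = hashlib.sha256(f"{secret_key}:{fragment_id}".encode()).digest()
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(length)]


def embed_watermark(measurements, secret_key, scale=0.5):
    """Encoding: add the key-specific error to every value of every fragment."""
    marked = {}
    for day, values in fragment_by_day(measurements).items():
        errors = error_sequence(secret_key, day, len(values), scale)
        marked[day] = [v + e for v, e in zip(values, errors)]
    return marked


# Example with made-up blood sugar readings (mg/dl)
readings = [("2020-01-01", 100.0), ("2020-01-01", 110.0), ("2020-01-02", 95.0)]
print(embed_watermark(readings, "key-of-recipient-a"))
```

Because the error sequence is derived deterministically from the key, the same recipient always receives the same watermarked values, while two different recipients receive slightly different datasets.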

Watermark attacks

There are a number of possible attacks against digital watermarking:
  • Distortion Attack: There are different kinds of distortions which may be applied to a dataset, e.g., rounding to the n-th digit. Rounding the values at the least significant digit preserves the data’s usability the most but may be detected more easily than rounding digits further up.
  • Deletion Attack: As with distortion attacks, different kinds of deletions may be applied to a dataset to make the identification of the original recipient harder.
  • Collusion Attack: A collusion attack is performed by combining n differently watermarked copies of the same dataset. For each measurement the mean of all n copies is calculated to create a new dataset.
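
For test cases, the three attacks can be simulated with a few lines of Python. The sketch below works on a single fragment (a list of numerical values); the function names and parameters are hypothetical, and dropping every k-th value is only one of many possible deletion strategies.

```python
from statistics import mean


def distortion_attack(values, digits=0):
    """Round every value to the given number of decimal digits."""
    return [round(v, digits) for v in values]


def deletion_attack(values, keep_every=2):
    """Delete measurements by keeping only every keep_every-th value."""
    return values[::keep_every]


def collusion_attack(copies):
    """Average n copies of the same fragment, measurement by measurement."""
    return [mean(column) for column in zip(*copies)]


copy_a = [100.3, 109.8, 95.1]
copy_b = [99.9, 110.4, 94.7]
print(distortion_attack(copy_a))
print(deletion_attack(copy_a))
print(collusion_attack([copy_a, copy_b]))
```

Averaging pulls each value towards the original measurement, which is why collusion weakens every individual watermark as the number of copies grows.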

Watermark Detection

To detect a watermark in a suspicious dataset the following two steps are performed and require the original data to be available:

  1. Detection: Through similarity search, the suspicious dataset (already fragmented) is matched against the original data fragments. In case of a match, the difference between the suspicious fragment and the original fragment is the (possibly noisy) unique error
  2. Mapping: The extracted error is compared through similarity search with the original error based on the secret parameter (or key). In case of a match the original recipient of the data is identified.
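
A possible shape of these two steps is sketched below in Python. It assumes, purely for illustration, that errors are generated from the key via a SHA-256-seeded pseudo-random generator, that squared distance serves as the similarity measure for fragments, and that normalized correlation with a threshold of 0.8 decides on a match; none of these choices is prescribed by the design above.

```python
import hashlib
import random


def error_sequence(secret_key, fragment_id, length, scale=0.5):
    """Recreate the key-specific error sequence (assumed generation scheme)."""
    seed = hashlib.sha256(f"{secret_key}:{fragment_id}".encode()).digest()
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(length)]


def detect(suspicious, originals):
    """Step 1: find the closest original fragment and extract the error."""
    candidates = {fid: v for fid, v in originals.items() if len(v) == len(suspicious)}
    if not candidates:
        return None
    best = min(
        candidates,
        key=lambda fid: sum((s - o) ** 2 for s, o in zip(suspicious, candidates[fid])),
    )
    return best, [s - o for s, o in zip(suspicious, candidates[best])]


def correlation(a, b):
    """Normalized correlation of two sequences (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0


def map_to_recipient(fragment_id, extracted, recipient_keys, threshold=0.8):
    """Step 2: return the recipient whose error sequence matches best, if any."""
    best_name, best_score = None, threshold
    for name, key in recipient_keys.items():
        expected = error_sequence(key, fragment_id, len(extracted))
        score = correlation(extracted, expected)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Made-up example: one fragment of 96 readings leaked by recipient "bob"
originals = {"2020-01-01": [100.0 + i % 7 for i in range(96)]}
keys = {"alice": "key-a", "bob": "key-b"}
leaked = [o + e for o, e in zip(originals["2020-01-01"],
                                error_sequence("key-b", "2020-01-01", 96))]
fragment_id, extracted = detect(leaked, originals)
print(map_to_recipient(fragment_id, extracted, keys))
```

Attacks add noise to the extracted error and lower the correlation, so the threshold trades off missed detections against false accusations and would need to be tuned with realistic test cases.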

The above process including various test cases for attacks will be implemented in the next weeks and will soon be available in the Semantic Container base package. Feel free to reach out to us with any questions or comments!

SEMCON Milestone 2/3 – Spring

The project is progressing rapidly, and after 6 months there is a lot to report this spring. Peb Ruswono Aryan from the Vienna University of Technology has joined us as a new team member; his areas of specialization are geoinformation systems and Python.

We also want to share our completed milestones, give an outlook on a few events, and summarize other activities:

  • The current state of the software is available on GitHub (https://github.com/sem-con) and Docker Hub (https://hub.docker.com/r/semcon).
  • The documentation has been updated; you can find the current version of the White Paper and System Design on our homepage in the Resources menu along with other relevant information.
  • Our business plan is developing well, and further reviews with T-Systems, ZAMG and Inits are already being planned.
  • There is still a lively exchange with our project partners ZAMG and EODC. Altogether there were 21 meetings in the last 3 months!
  • To ensure good progress even after the end of the bmvit / FFG call “ICT of the future: Sondierungen für den Datenmarkt 2018”, we submitted a project proposal to the ICT-13-2018-2019 H2020 Call on March 28th as a technology supplier and part of a consortium.

Dates, Dates, Dates … We would be glad if you came by!

  • On May 3rd, Christoph and Peb will give a lecture on Semantic Containers at the PyDays (https://www.pydays.at): “Data becomes more useful when it is shared. In our talk we present our findings and future goals about transferring data in a privacy respecting and traceable way. We want to lay out the technical foundation and demonstrate use cases in a live-coding session by accessing Semantic Containers with Jupyter notebooks.”
  • With the association Diabetes.services, Christoph has submitted a talk for MyData 2019 in Helsinki.

We wish our readers happy Easter holidays – and we are looking forward to sharing the final Semantic Container project updates with you at the end of June.

SEMCON Milestone 1/3

With Semantic Containers, the 7-member project team has set itself the goal of developing a prototype for the simple trading of data. The proof of concept will be demonstrated with a few use cases. Three colleagues from the OwnYourData association and four contributors from the Vienna University of Technology have come together to achieve this goal.

Given the ambitious time frame of 9 months total project duration (bmvit / FFG call ICT of the Future: Exploring the data market 2018), we are happy to announce the first successes already at the end of the first third:

  • The project website (German / English) has been online for two months and summarizes the most important information about Semantic Containers at a glance.
  • Also online and linked to the website is our Semantic Containers White Paper, which presents the concept in more detail and offers a broader overview.
  • A first version of our design document is being finalized and will be online in the next few days.

We have been busy in the past 3 months. Among other things, we presented Semantic Containers at two major events: the Data Market Ignite Night on October 2nd at the Tribe Space and the ICT 2018: Imagine Digital – Connect Europe at the Austria Center on December 5th. Of course, we have met each other much more often than that: up to this point we have had 22 internal project meetings!

We are confident that the next two thirds will proceed at a similar pace. The coming third (until March 2019) is dedicated to the programming work, so that we can fully dedicate our time to use cases in the last three months (until June 2019) as planned. Here, too, there is good news: the data selection has already taken place, so we have not only reached all our interim goals but are already one step ahead of our own schedule.

In this sense, we will enjoy the upcoming holiday season and wish our readers a peaceful Christmas! See you here again in the New Year when we have new Semantic Container updates available for all data-loving people.