Digital Watermarking

A digital watermark is a marker covertly embedded in data; it has also been described as “the practice of imperceptibly altering a work to embed a message about that work”. In a Semantic Container, a digital watermark is a unique digital fingerprint applied to the data the container provides: every data request returns a dataset with insignificant errors that uniquely identify the recipient. If such a dataset is leaked and appears in an unintended location, the party who originally requested and leaked it can be identified. This blog post describes the design of the digital watermarking that will be implemented in the course of the ongoing MyPCH project.

Watermark embedding

To embed a watermark into a dataset the following two steps are performed:

  1. Pre-processing: the available data is split into fragments of a defined size, e.g., all measurements from a single day
  2. Encoding: based on a secret parameter (or key) unique to the requesting party, a sequence of errors with the same length as a data fragment is created and then applied to the original data; for numerical values this simply means adding each error to the corresponding value
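
The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Semantic Container implementation: the function names, the use of SHA-256 to seed a pseudo-random generator from the recipient key, the uniform error distribution and the `scale` parameter are all assumptions made for the example.

```python
import hashlib
import random


def fragment(values, size):
    """Pre-processing: split measurements into consecutive fragments of `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def error_sequence(key, fragment_index, size, scale=0.01):
    """Derive a reproducible error sequence from a recipient key (hypothetical scheme)."""
    digest = hashlib.sha256(f"{key}:{fragment_index}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    # Errors are kept small (at most `scale`) so the data stays usable.
    return [rng.uniform(-scale, scale) for _ in range(size)]


def embed(values, key, size, scale=0.01):
    """Encoding: add the key-specific error to every value, fragment by fragment."""
    marked = []
    for idx, frag in enumerate(fragment(values, size)):
        errors = error_sequence(key, idx, len(frag), scale)
        marked.extend(v + e for v, e in zip(frag, errors))
    return marked
```

Calling `embed(measurements, "recipient-key", 24)` would return a copy of the data in which each value deviates from the original by at most `scale`, and the same key always produces the same deviations.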

Watermark attacks

There are a number of possible attacks against digital watermarking:
  • Distortion Attack: Different kinds of distortion may be applied to a dataset, e.g., rounding to the n-th digit. Rounding at the least significant digit preserves the data’s usability the most but may be detected more easily than rounding digits further up.
  • Deletion Attack: As with distortion attacks, different kinds of deletion may be applied to a dataset to make identifying the original recipient harder.
  • Collusion Attack: A collusion attack combines n copies of the same dataset. For each measurement, the mean of all n copies is calculated to create a new dataset.
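
To make the three attacks concrete, here is a minimal Python sketch of one variant of each. The function names and the specific choices (rounding with `round`, dropping every k-th measurement) are assumptions for illustration; real attacks may distort or delete data in many other ways.

```python
from statistics import mean


def distortion_attack(values, digits):
    """Distort the data by rounding every value to `digits` decimal places."""
    return [round(v, digits) for v in values]


def deletion_attack(values, every):
    """Delete every `every`-th measurement (one of many possible deletion patterns)."""
    return [v for i, v in enumerate(values) if (i + 1) % every != 0]


def collusion_attack(copies):
    """Average n differently watermarked copies of the same dataset, value by value."""
    return [mean(column) for column in zip(*copies)]
```

Averaging in the collusion case shrinks each recipient's individual error by roughly a factor of n, which is why detection has to tolerate a noisy extracted error.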

Watermark Detection

To detect a watermark in a suspicious dataset, the following two steps are performed; both require the original data to be available:

  1. Detection: Through similarity search, the suspicious dataset (already fragmented) is matched against the original data fragments. In case of a match, the difference between the suspicious fragment and the original fragment is the (possibly noisy) unique error
  2. Mapping: Through similarity search, the extracted error is compared with the original errors generated from each recipient's secret parameter (or key). In case of a match, the original recipient of the data is identified.
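
A rough Python sketch of the two detection steps follows. It is a simplified illustration under several assumptions that are not part of the design above: fragments have equal length, Euclidean distance serves as the similarity search in step 1, cosine similarity with a fixed `threshold` serves as the comparison in step 2, and `error_sequence` is a hypothetical key-based error generator (seeded via SHA-256) standing in for whatever the embedding actually uses.

```python
import hashlib
import math
import random


def error_sequence(key, fragment_index, size, scale=0.01):
    """Hypothetical key-based error generator; must match the one used for embedding."""
    digest = hashlib.sha256(f"{key}:{fragment_index}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return [rng.uniform(-scale, scale) for _ in range(size)]


def cosine(a, b):
    """Cosine similarity of two equally long sequences (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect(suspicious, original_fragments):
    """Step 1: find the closest original fragment and extract the error."""
    idx = min(
        range(len(original_fragments)),
        key=lambda i: math.dist(suspicious, original_fragments[i]),
    )
    error = [s - o for s, o in zip(suspicious, original_fragments[idx])]
    return idx, error


def map_recipient(error, fragment_index, keys, threshold=0.8, scale=0.01):
    """Step 2: return the key whose error sequence best matches, or None."""
    best_key, best_score = None, threshold
    for key in keys:
        candidate = error_sequence(key, fragment_index, len(error), scale)
        score = cosine(error, candidate)
        if score > best_score:
            best_key, best_score = key, score
    return best_key
```

Using a similarity score rather than an exact comparison is what lets the mapping step tolerate the noise introduced by distortion or collusion; how high the threshold should be would have to be determined with the planned attack test cases.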

The above process, including various test cases for attacks, will be implemented over the next few weeks and will soon be available in the Semantic Container base package. Feel free to reach out to us with any questions or comments!

SEMCON Milestone 2/3 – Spring

The project is progressing rapidly, and after 6 months there is a lot to report this spring. A new team member from the Vienna University of Technology has joined us: Peb Ruswono Aryan, whose areas of specialization are geoinformation systems and Python.

We also want to share our completed milestones, give an outlook on a few events, and summarize other activities:

  • The current state of the software is available on GitHub and Docker Hub.
  • The documentation has been updated; you can find the current versions of the White Paper and System Design on our homepage in the Resources menu, along with other relevant information.
  • Our business plan is developing well; further reviews with T-Systems, ZAMG and Inits are already being planned.
  • There is still a lively exchange with our project partners ZAMG and EODC. Altogether there were 21 meetings in the last 3 months!
  • To ensure good progress even after the end of the bmvit / FFG call “ICT of the future: Sondierungen für den Datenmarkt 2018”, we submitted a project proposal to the ICT-13-2018-2019 H2020 Call on March 28th, as a technology supplier and part of a consortium.

Dates, Dates, Dates … We would be glad to see you there!

  • On May 3rd, Christoph and Peb will give a talk on Semantic Containers at the PyDays: “Data becomes more useful when it is shared. In our talk we present our findings and future goals about transferring data in a privacy respecting and traceable way. We want to lay out the technical foundation and demonstrate use cases in a live-coding session by accessing Semantic Containers with Jupyter notebooks.”
  • With the association, Christoph has submitted a talk for MyData 2019 in Helsinki.

We wish our readers happy Easter holidays, and we are looking forward to sharing the final Semantic Container project updates with you at the end of June.

SEMCON Milestone 1/3

With Semantic Containers, the 7-member project team has set itself the goal of developing a prototype for the simple trading of data. The proof of concept will be demonstrated with a few use cases. Three colleagues from the Own Your Data association and four contributors from the Vienna University of Technology have come together to achieve this goal.

Within the ambitious time frame of a 9-month project (bmvit / FFG call ICT of the Future: Exploring the data market 2018), we are happy to announce our first successes already at the end of the first third:

  • The project website (German / English) has been online for two months and summarizes the most important information about Semantic Containers at a glance.
  • Also online and linked to the website is our Semantic Containers White Paper, which presents the concept in more detail and offers a broader overview.
  • A first version of our design document is being finalized and will be online in the next few days.

We have been busy over the past 3 months. Among other things, we presented Semantic Containers at two major events: the Data Market Ignite Night on October 2nd at the Tribe Space and the ICT 2018: Imagine Digital – Connect Europe at the Austria Center on December 5th. Of course, we have met among ourselves much more often: up to this point we have had 22 internal project meetings!

We are confident that the next two-thirds will continue at a similar pace. The coming third (until March 2019) is dedicated to the programming work, so that we can fully dedicate our time to use cases in the last three months (until June 2019) as planned. Here, too, there is good news: the data selection has already taken place, so we have not only reached all our interim goals but are already one step ahead of our own schedule.

In this sense, we will enjoy the upcoming holiday season and wish our readers a peaceful Christmas! See you here again in the New Year when we have new Semantic Container updates available for all data-loving people.