Why are data spaces necessary?
Nowadays, there are sectors in which sharing data among the different actors involved is essential. A prototypical example is customs. If neighboring countries share data at their borders, they can not only reduce effort by avoiding duplicate checks but also operate more effectively. For example, the entry country can detect a potential veterinary issue early if it knows in advance, thanks to information provided by the exit country, what load a truck is carrying.
However, this can also introduce new complexities that must be resolved. Some of them are:
- Interoperability. When sharing data, standards must be defined so that both systems or both parties understand the characteristics of the shared data. For example, in the case of customs, it is common to use some of the standards defined by the WCO (World Customs Organization), which, among others, standardizes product or raw material codes.
- Data usage. When sharing data, we lose control over it. This can be a problem if the conditions of use are not explicitly stated. For example, in the case of customs, the exit country might share certain data with the entry country for a specific purpose, yet fear that the entry country will use it for other purposes.
- Security. When sharing data digitally, we are introducing new risks as we open systems to third parties. For example, in a country’s customs system that was only used internally, exposing certain data to third parties can pose a security risk.
- Data protection. When sharing sensitive data (e.g., personal data), it is necessary to ensure that the recipient maintains the privacy levels required for the intended use. In some cases, this involves applying anonymization techniques before sharing the data.
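As a sketch of the anonymization step mentioned above, the following hypothetical example pseudonymizes direct identifiers with a keyed hash before a record is shared. The field names and the key are illustrative placeholders, not part of any real customs schema:

```python
import hashlib
import hmac

# Secret key held only by the data provider; using HMAC rather than a
# plain hash makes dictionary attacks on the pseudonyms harder.
SECRET_KEY = b"provider-private-key"  # illustrative placeholder

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a keyed hash."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def anonymize_record(record: dict) -> dict:
    """Drop or pseudonymize personal fields before sharing the record."""
    shared = dict(record)  # shallow copy; the original record is untouched
    shared["driver_name"] = pseudonymize(record["driver_name"])
    del shared["driver_phone"]  # not needed by the receiving party
    return shared

record = {"truck_id": "TR-4711", "driver_name": "Jane Doe",
          "driver_phone": "+34 600 000 000", "cargo": "frozen fish"}
print(anonymize_record(record))
```

Note that pseudonymization of this kind is reversible by anyone holding the key, so in stricter scenarios full anonymization or aggregation may be required instead.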
What are data spaces?
Data spaces aim to solve the problems that arise when sharing data among different actors. A data space is a way to share data among different actors while ensuring the rights of each participant. The basic idea is to move from data access control to data usage control.
As an example, one of the basic pillars of the European Data Strategy is the creation of common and interoperable data spaces throughout the EU in strategic sectors. The goal is to overcome the barriers (legal, technical, etc.) that currently exist for data exchange, which will undoubtedly enable the implementation of innovative projects on this data and the generation of new businesses and services. To achieve this, it is necessary to implement data infrastructures and governance frameworks to facilitate data pooling and exchange.
KEY TECHNOLOGICAL ELEMENTS IN DATA SPACES
From a technical perspective, a data space can be understood as a collection of technical components that facilitate a dynamic, secure, and continuous flow of data/information between parties and domains. These components can be implemented in many different ways and deployed on different runtime frameworks (e.g., Kubernetes). According to Open DEI, they can be classified into three categories: data interoperability, data sovereignty and trust, and data value creation.
DATA INTEROPERABILITY
The technological building blocks that facilitate data interoperability are:
- Data models and formats: this building block establishes a common format for data model specifications and for representing data in exchange payloads. Combined with the data exchange API building block, it ensures full interoperability between participants.
- Data exchange API: this basic component facilitates the exchange and sharing of data (i.e., data provisioning and consumption/use) between data space participants. An example of a data interoperability building block that provides a common data exchange API is the “Context Broker” of the Connecting Europe Facility (CEF), recommended by the European Commission for sharing data at the right time among various organizations.
- Data provenance and traceability: this building block provides the means to track and trace data throughout the provisioning and consumption/use process. It therefore provides the basis for a number of important functions, from identifying data lineage to logging auditable transactions. It also enables the implementation of a wide range of application-level tracking use cases, such as product tracking or material flow tracking in a supply chain.
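To make the data-model and exchange-API building blocks more concrete, here is a hedged sketch of an NGSI-LD entity payload of the kind the CEF Context Broker accepts. The entity type and attribute names are illustrative assumptions, not an official customs data model:

```python
import json

# Illustrative NGSI-LD entity describing a truck shipment; the
# attribute names below are assumptions, not a standardized model.
shipment = {
    "id": "urn:ngsi-ld:Shipment:001",
    "type": "Shipment",
    "hsCode": {
        "type": "Property",
        "value": "0303.54",  # example WCO Harmonized System subheading
    },
    "origin": {"type": "Property", "value": "PT"},
    "@context": ["https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"],
}

# A provider would POST this JSON to the broker's entity endpoint
# (e.g., POST /ngsi-ld/v1/entities with Content-Type application/ld+json);
# here we only serialize the payload to show its shape.
payload = json.dumps(shipment, indent=2)
print(payload)
```

Sharing a common `@context` is what lets both parties interpret `hsCode` or `origin` the same way, which is exactly the interoperability problem described above.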
DATA SOVEREIGNTY AND TRUST
The technological building blocks that facilitate data trust and sovereignty are:
- Identity management (IM): the IM building block enables the identification, authentication, and authorization of stakeholders operating in a data space. It ensures that organizations, individuals, machines, and other actors receive recognized identities, and that these identities can be authenticated and verified, including the provisioning of additional attributes for authorization mechanisms to use in access and usage control. The IM building block can be implemented on the basis of readily available IM platforms that cover parts of the required functionality. Examples of open-source solutions are the Keycloak infrastructure, the Apache Syncope identity management platform, the Shibboleth Consortium's open-source identity solution, and the FIWARE IM framework. Integration of the IM component with the eID building block of the Connecting Europe Facility (CEF), which supports electronic identification of users across Europe, would be particularly important. The creation of federated and trusted identities in data spaces may be supported by European regulations such as eIDAS.
- Trusted exchange: this building block facilitates reliable data exchange among participants, assuring participants in a data exchange transaction that the other participants are who they say they are and that they comply with the defined rules/agreements. This can be achieved through organizational measures (e.g., certification or verified credentials) or technical measures (e.g., remote attestation).
- Access/use control/policies: this component ensures compliance with data access and use policies defined as part of the terms and conditions established when data resources or services are published (see the “Publication and Services Marketplace” basic component below) or negotiated between providers and consumers. A data provider typically implements data access control mechanisms to prevent misuse of resources, while data use control mechanisms are typically implemented on the data consumer side to prevent misuse of data. In complex data value chains, prosumers combine both mechanisms. Access control and use control are based on identification and authentication.
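A minimal sketch of how an access/usage policy check might look, assuming a deliberately simplified homegrown policy structure (real deployments would typically express policies in a standard such as ODRL and rely on the IM building block for authentication):

```python
from dataclasses import dataclass

@dataclass
class Policy:
    """Simplified usage policy attached to a shared dataset (illustrative)."""
    allowed_consumers: set
    allowed_purposes: set

def is_use_permitted(policy: Policy, consumer_id: str, purpose: str) -> bool:
    """Check both who is asking (access control) and why (usage control)."""
    return (consumer_id in policy.allowed_consumers
            and purpose in policy.allowed_purposes)

# The exit country shares cargo data only with the entry country's
# customs authority, and only for the purpose of border inspection.
policy = Policy(allowed_consumers={"customs-ES"},
                allowed_purposes={"border-inspection"})

print(is_use_permitted(policy, "customs-ES", "border-inspection"))  # True
print(is_use_permitted(policy, "customs-ES", "market-analysis"))    # False
```

The second call illustrates the usage-control idea from the customs example: the same consumer is trusted for one purpose but not for another.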
DATA VALUE CREATION
The technological building blocks that facilitate data value creation are:
- Metadata and discovery protocol: this building block incorporates mechanisms for publishing and discovering data resources and services, making use of common descriptions of resources, services, and participants. Such descriptions can be both domain-independent and domain-specific, and should be built on semantic web technologies and linked data principles.
- Data usage accounting: this building block provides the basis for accounting for data access and/or usage by different users. This, in turn, supports important clearing, payment, and billing functions (including for data exchange transactions that take place outside a data marketplace).
- Publication and services marketplace: to support the offering of data resources and services under defined terms and conditions, marketplaces must be established. This building block supports the publication of such offerings, the management of processes related to the creation and monitoring of smart contracts (which clearly describe the rights and obligations for data and service usage), and access to data and services.
Depending on technical needs, the corresponding backend processes for qualification, clearing, and billing can be executed. This building block therefore facilitates the dynamic expansion of data spaces with more stakeholders, data resources, and data analysis/processing services (such as big data analytics, machine learning, or statistical modeling services for different business functions). It must include capabilities to publish data resources following the widely adopted DCAT (Data Catalogue Vocabulary) standard and to harvest data from existing open data publishing platforms.
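To illustrate the publication side, here is a hedged sketch of a DCAT dataset description in JSON-LD. The `dcat:`/`dct:` terms are standard vocabulary, but the title, description, and keywords are invented placeholders:

```python
import json

# Minimal DCAT dataset description; the dcat:/dct: terms are standard,
# but the concrete titles and values below are invented placeholders.
catalog_entry = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Cross-border shipment declarations",
    "dct:description": "Aggregated customs declarations shared in the data space.",
    "dcat:keyword": ["customs", "logistics"],
    "dcat:distribution": {
        "@type": "dcat:Distribution",
        "dcat:mediaType": "application/json",
    },
}

print(json.dumps(catalog_entry, indent=2))
```

Publishing descriptions in this shared vocabulary is what allows a marketplace, or an existing open data portal, to index and discover offerings from many providers uniformly.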