AI-ready Data Products

Evolving the concept of “data product” to meet the specific demands of AI applications: easing the adoption of existing data-sharing frameworks by AI practitioners, and acting as a catalyst for AI innovation.


Author: Daniel Alonso Román (BDVA Senior Technical Lead Big Data and AI ecosystems)

The term “data product” was originally coined in the seminal 2019 work of Zhamak Dehghani[1], where she presented the data mesh paradigm and, as a key aspect, applied product thinking to datasets to make them easily discoverable, addressable, trustworthy, interoperable and secure.

Since then, the concept of a data product has been embraced by many organisations to streamline data reuse across various use cases, reducing costs and saving time. According to the Gartner Hype Cycle for Data Management in 2024, data products are still on the rise, approaching the peak of expectations. Additionally, “inquiries for the term data product have more than doubled in 2023 compared to 2022”, and “1 in 2 organisations studied have already deployed data products”[2], showing clear industry interest.

While there is no agreed-upon definition of data product, the concept of packaging datasets along with all the relevant elements identified by an organisation to facilitate their discovery, exchange, and consumption by others clearly supports data sharing and transactions. As an example, CEN proposes data product as a key element in Trusted Data Transactions[3], defined therein as “standardised data unit packaging data and relevant conditions into a useable form”. This is why the data product concept has also been adopted by designers and implementers of data spaces, the instruments identified by the European Commission in their “Data Strategy”[4] to break data silos in Europe and foster cross-sector and cross-country data sharing in a trusted and efficient way.

Nowadays, the primary application of data has moved from traditional data analytics to increasingly sophisticated AI workflows. These impose unique demands on data, including specific descriptions, new data quality dimensions and metrics, and tools to facilitate risk assessment in compliance with standards or regulations such as the AI Act. The traditional concept of a data product (focused on packaging and sharing datasets for general use) must therefore evolve to support the specialised requirements of AI, a new paradigm that in BDVA we refer to as “AI-ready Data Products”.

With this new paradigm in mind, BDVA organised a dedicated session during its recent Data Week 2024[5], held in Luxembourg on 10th December 2024. The session brought together several experts to evaluate this approach from various perspectives and discuss which elements of the data product concept should be revisited, what additional features might be required, and how “AI-ready Data Products” can play a pivotal role in fostering AI innovation.

Coen Janssen, Policy Officer at EC DG CNECT, opened the session by framing the discussion within the context of the European Data Union Strategy and the political guidelines established by Ursula von der Leyen for 2024 to 2029, as well as the European Data Act and the related European Commission standardisation request (composed of five requested deliverables: Trusted Data Transaction standard, Data catalogue implementation framework, Semantic assets implementation framework, Data governance standard for data space participants and Maturity model for Common European Data Spaces).

Shane O’Seasnain (Eindhoven University of Technology) stressed the importance of the domain experts in data, AI and platforms behind a data product in facilitating how AI-ready Data Products connect to other applications, such as digital twins.

Jordi Cabot (Luxembourg Institute of Science and Technology) highlighted the impact of biased data on AI outcomes. He showed its effect on textual data and images, and how the outcomes of search engines and language models can be biased depending on the query and the data provided to the algorithms for training or caching. He concluded with the importance of data annotation along multiple dimensions: usage, distribution, composition, provenance and social aspects.

Anastasia Sofou (SEMIC) presented the Machine Learning extension for DCAT-AP (MLDCAT-AP) as the ideal metadata solution to describe “AI-ready Data Products”, and also to address some of the requirements of the AI Act regarding data quality and data governance. The novel solution incorporates the quality of the data, the algorithm class, the ML model, the risks of the model and the dataset used for training, all essential to complying with the AI Act.
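To make the idea of such a metadata package concrete, the sketch below builds a small JSON-LD-style record in Python in the spirit of DCAT-AP with ML-oriented annotations. It is purely illustrative: the `dct:`/`dcat:` terms are standard DCAT-AP vocabulary, but the `ml:` keys and all values are hypothetical placeholders, not the normative MLDCAT-AP terms.

```python
import json

# Illustrative sketch of an "AI-ready Data Product" metadata record.
# dct:/dcat: keys follow the DCAT-AP vocabulary; the ml: keys are
# hypothetical placeholders standing in for MLDCAT-AP properties.
ai_ready_product = {
    "@type": "dcat:Dataset",
    "dct:title": "Road-sign images, annotated",
    "dct:description": "Labelled images packaged as an AI-ready Data Product.",
    "dct:license": "https://creativecommons.org/licenses/by/4.0/",
    # ML-specific annotations (hypothetical keys and values):
    "ml:algorithmClass": "image-classification",
    "ml:trainedModel": "resnet50-roadsigns-v1",
    "ml:dataQuality": {"labelAccuracy": 0.97, "completeness": 0.99},
    "ml:riskAssessment": "low risk per internal AI Act screening",
}

def is_ai_ready(record: dict) -> bool:
    """Check that the minimal AI-oriented metadata fields are present."""
    required = {"ml:algorithmClass", "ml:dataQuality", "ml:riskAssessment"}
    return required.issubset(record)

print(is_ai_ready(ai_ready_product))          # True
print(json.dumps(ai_ready_product, indent=2))  # serialisable for exchange
```

A check of this kind is one way a data space connector could gate whether a data product carries enough AI-oriented metadata before offering it to AI consumers.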

Chandra Challagonda (FIWARE CEO) emphasised data provenance as key at every stage of the data value chain: it ensures transparency, origin tracking and rights management, thereby fostering the adoption of data products, and it is equally key for the accountability, traceability and auditability of AI applications, all essential for compliance with the AI Act.

Finally, Fabrice Tocco (Dawex co-CEO) emphasised the fact that data products are not just data, highlighting the importance of trust as a non-negotiable feature to materialise “AI-ready Data Products”.

The topic fostered a lively discussion with the audience, who brought up additional aspects such as the incorporation of knowledge graphs, the use of LLMs to enable semantic interoperability between data products, how to apply the new concept to data spaces, and the process to assess the readiness of ‘AI-ready Data Products’ for deployment in real-world settings.

After this fruitful session, we are firmly convinced that the ‘AI-ready Data Product’ serves a dual purpose: to ease the adoption of existing data-sharing frameworks, and to act as a catalyst for AI innovation. First, meeting the specific demands of AI-based applications in an easy manner broadens the scope of utilisation of existing data-sharing frameworks, such as data spaces. By embedding those needs into the ‘AI-ready Data Product’ concept, the requirements can be addressed and satisfied across the various building blocks of the data space architecture. Second, it can act as a powerful enabler of AI innovation, offering industry-ready solutions tailored to the complexities and unique requirements of cutting-edge AI applications.

Despite these relevant outcomes and progress, significant challenges must be overcome for ‘AI-ready Data Products’ to become a reality. BDVA is committed to addressing these challenges by engaging more experts and stakeholders from the community to develop a comprehensive data product framework tailored to the unique needs of AI applications. We firmly believe this paradigm will enhance data sharing for AI across SMEs and large industries in a trusted, seamless, and legally compliant manner, aligned with existing and emerging regulations.

[1] https://martinfowler.com/articles/data-monolith-to-mesh.html

[2] https://www.gartner.com/en/documents/5456063

[3] https://www.cencenelec.eu/media/CEN-CENELEC/News/Workshops/2024/2024-01-16%20-%20Data%20Transactions/cwa-draft-part1-0-8_clean.pdf

[4] https://digital-strategy.ec.europa.eu/en/policies/strategy-data

[5] https://data-week.eu/session/ai-ready-data-products/
