Data governance. Data governance is a key aspect of managing data sharing in organisations and digital ecosystems. It has evolved in recent years from a reactive stage towards a more systematic, harmonised, interoperable, and proactive approach. Still, several challenges must be overcome before the full potential of data can be exploited while guaranteeing data sovereignty and fair data management (some are further explained in the points below): data rights, data silos caused by differing approaches to data management, poor data quality, lack of trust in data, lack of data control, and lack of data context.
Lack of trust in AI systems. Technology in general, and AI in particular, is moving very fast, but this speed is not mirrored by the trust of users (organisations and society). Several factors contribute to this:
- Trust is not absolute; over-trust and under-trust can both be harmful, so ‘calibrated trust’ is essential. What can a user reasonably expect from a system, especially when GenAI is used and issues of bias and authenticity of information can arise? It is important that an AI system assesses its own trustworthiness and communicates this to the user.
- Explainability has emerged as a key AI challenge among knowledgeable stakeholders (researchers, developers, domain experts). As AI becomes more widespread, however, society (non-experts) also needs relevant insight into what is ‘going on under the hood’. Transparency is an important issue: how was the AI system designed, in what (algorithmic) business process is it operating, and what are its goals? This relates to how AI fosters or threatens trust between citizens and public (and private) institutions; citizens may become increasingly unwilling to share data with these institutions.
- On a higher level, ethics and human rights need to be assessed in a more systematic and transparent way. However, there is not much experience with this: which of the current multitude of ethical frameworks works best, and do the people involved apply them professionally?
- Specifically important is the risk of malicious use of AI. For instance, the introduction of ChatGPT was quickly followed by FraudGPT, allowing cybercriminals to draft even more realistic phishing messages. Other impacts of GenAI need to be addressed in education and the arts, where the creation of texts and images can be automated in an undetectable way.
- Training data extraction attacks, which involve recovering parts of a model’s training data, also pose a significant risk to ML models: if an attacker can retrieve some of the training data, they may gain access to sensitive or confidential information that was used to train the model.
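The mechanics of such an extraction attack can be sketched in miniature: a model that has memorised its training data assigns a noticeably higher likelihood (lower loss) to memorised sequences than to unseen ones, and an attacker can exploit this gap to test whether a candidate string was in the training set. The bigram “model” and the invented secret below are toy stand-ins for a real LLM and real training data.

```python
# Toy illustration: memorised training text scores a much lower loss than
# unseen text, which signals membership to an attacker. The bigram model
# and the fabricated "secret" are hypothetical stand-ins for a real LLM.
import math
from collections import defaultdict

def train_bigram(corpus):
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    return counts

def avg_neg_log_likelihood(counts, text, vocab_size=128):
    # Add-one smoothing so unseen bigrams get a small non-zero probability.
    nll = 0.0
    for a, b in zip(text, text[1:]):
        total = sum(counts[a].values())
        p = (counts[a][b] + 1) / (total + vocab_size)
        nll -= math.log(p)
    return nll / max(len(text) - 1, 1)

training_data = "alice's account number is 1234-5678 " * 20
model = train_bigram(training_data)

memorised = "account number is 1234-5678"   # present in the training data
unseen = "card number is 9876-0000"         # never seen during training

# The memorised string scores a clearly lower average loss than the unseen
# one, telling the attacker it likely appeared in the training data.
print(avg_neg_log_likelihood(model, memorised) < avg_neg_log_likelihood(model, unseen))  # True
```

Real extraction attacks on LLMs apply the same loss-gap principle at scale, combined with prompting the model to continue likely training-data prefixes.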
Finally, other aspects that affect trust are bias and fairness in AI systems (data quality with respect to AI), the authenticity of information, and the ownership and authorship of content (copyright management and tracking in AI-based scenarios).
Ethics. As society becomes more ‘AI literate’, we can expect a much more intensive public debate around the ethics of AI. An obvious example is the health domain, where the pandemic has shown that widespread public discussion must be conducted carefully and requires effort to correctly interpret voices based on different value systems. The multitude of ethics frameworks is a challenge: which is best for which situation, and are the stakeholders involved well prepared to apply a framework? We can expect the rise of domain-specific frameworks, while the challenge is to keep these frameworks as interoperable as possible and to reduce complexity.
Management of regulations, certifications and compliance. The rapid evolution of laws and regulations makes it difficult for companies in the EU market to implement them: eIDAS 2.0, the Digital Markets Act, the AI Act, the Data Act, etc. Overlaps and contradictions can arise when implementing these regulations, particularly in an international context where member states have adopted different national legislation. Domain-specific regulations are also expected (finance, healthcare, manufacturing, …), which creates challenges at least where these domains overlap. The lack of suitable business incentives for innovations that have value beyond compliance may also become an issue.
This challenge also includes considering compliance at different TRL levels (research -> innovation -> business) and in the transitions between them.
The law impacts technology, but it also works the other way around: new developments in AI impact copyright, IP, and other regulations. The challenge is to explore innovative policy and law development via regulatory sandboxes, living labs, and other mechanisms for anticipatory regulation.
Finally, methodologies for oversight mechanisms (auditing, testing, monitoring) need to be extended to keep up with both technological and regulatory advancements.
Synthetic data generation: ethical and legal challenges. As explained in the BDVA Strategic Theme 2, the generation of synthetic data can address the scarcity of quality data. However, this opportunity also brings several ethical and legal challenges around the quality of such data (poor synthetic data can lead to the collapse of GenAI models trained on it), ownership (who is entitled to the synthetic data, and what are their rights and responsibilities), the provenance of the data, and liability (who is liable if synthetic data leads to incorrect conclusions or decisions).
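The provenance and quality concerns can be made concrete with a small sketch: synthetic values are generated from a distribution fitted to the source data, provenance metadata (generator id, source hash, seed) is attached at generation time, and a crude fidelity check compares the synthetic and real distributions. The metadata fields and the mean-based quality criterion are illustrative assumptions, not a standard.

```python
# Minimal sketch: generate synthetic data with attached provenance metadata
# and run a basic quality check. Field names and the quality criterion are
# illustrative assumptions, not an established scheme.
import hashlib
import json
import random
import statistics

def generate_with_provenance(real_values, n, seed=42):
    rng = random.Random(seed)
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    synthetic = [rng.gauss(mu, sigma) for _ in range(n)]
    provenance = {
        "generator": "gaussian-fit-v0",  # hypothetical generator identifier
        "source_sha256": hashlib.sha256(
            json.dumps(real_values).encode()).hexdigest(),
        "seed": seed,
        "n": n,
    }
    return synthetic, provenance

def quality_ok(real_values, synthetic, tolerance=0.25):
    # Crude fidelity check: synthetic mean within `tolerance` standard
    # deviations of the real mean. Real pipelines use far richer metrics.
    sigma = statistics.stdev(real_values)
    return abs(statistics.mean(synthetic) - statistics.mean(real_values)) < tolerance * sigma

real = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 9.7, 10.0]
synthetic, meta = generate_with_provenance(real, n=1000)
print(quality_ok(real, synthetic), meta["generator"])
```

Carrying such provenance records alongside synthetic datasets is one way to keep ownership and liability questions answerable after the fact: the metadata documents which generator, source, and parameters produced the data.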
Security, privacy and data protection. The cat-and-mouse game between data protection professionals and cybercriminals will continue.
- Adversaries also use AI in their work. As we continue deploying data-driven and AI-supported approaches, the attack surface (and adversaries’ level of interest) only grows. Classical anonymisation procedures no longer work reliably because so much additional data is available to re-identify individuals, and current encryption schemes are vulnerable to attacks based on quantum computers.
- Further progress on PETs is needed regarding privacy guarantees, performance, and usability. This requires both fundamental and applied research and knowledge transfer between researchers and practitioners.
- In the context of data spaces, data protection and privacy are not yet sufficiently elaborated. For instance, the inclusion of PETs is still an open issue. This is also hindered by the lack of standardisation in PETs and the lack of automated procedures to design a specific PET solution for a specific analytics question.
- Even when data protection has been implemented well, effectively managing user consent remains an open challenge. The dynamic nature of consent poses difficulties for the most recent ML procedures. Training AI also needs to address intellectual property rights, preferably in a (semi-)automated way.
- Finally, the federation of data spaces across borders and domains and the cloud/edge/IoT computing continuum introduce technical, organisational and legal complexity for data protection. Value chains become more complex and operate across multiple ICT layers.
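The failure of classical anonymisation mentioned above can be illustrated with a toy linkage attack: quasi-identifiers (postcode, birth year, sex) in an “anonymised” table are joined with a public register, and any unique match re-identifies an individual. All records below are invented.

```python
# Toy linkage attack: joining an "anonymised" table with public data on
# quasi-identifiers re-identifies individuals. All records are invented.
anonymised_medical = [
    {"zip": "1011", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "1013", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]
public_register = [
    {"name": "A. Jansen", "zip": "1011", "birth_year": 1984, "sex": "F"},
    {"name": "B. de Vries", "zip": "1013", "birth_year": 1990, "sex": "M"},
    {"name": "C. Bakker", "zip": "1013", "birth_year": 1975, "sex": "M"},
]

def link(medical, register):
    keys = ("zip", "birth_year", "sex")
    hits = []
    for m in medical:
        matches = [r for r in register if all(r[k] == m[k] for k in keys)]
        if len(matches) == 1:  # unique quasi-identifier combination => re-identified
            hits.append((matches[0]["name"], m["diagnosis"]))
    return hits

print(link(anonymised_medical, public_register))
# [('A. Jansen', 'asthma'), ('B. de Vries', 'diabetes')]
```

Removing names was not enough: the combination of postcode, birth year, and sex is unique for both patients, so their diagnoses are exposed. This is why stronger guarantees such as k-anonymity or differential privacy are needed.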
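As one concrete example of a PET, the sketch below implements a differentially private count using the Laplace mechanism: noise scaled to the query’s sensitivity (1 for a count, since one record changes a count by at most 1) bounds what any single record can reveal. The epsilon value and the data are illustrative choices, not recommendations.

```python
# Minimal differential-privacy sketch: a count query released with Laplace
# noise of scale sensitivity/epsilon. Epsilon and the data are illustrative.
import math
import random

def dp_count(records, predicate, epsilon, rng):
    """Differentially private count: true count plus Laplace(1/epsilon) noise,
    since adding or removing one record changes a count by at most 1."""
    true_count = sum(1 for r in records if predicate(r))
    # Inverse-CDF sampling of the Laplace distribution with scale 1/epsilon.
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 38, 44, 31, 27, 60]
noisy = dp_count(ages, lambda a: a > 30, epsilon=1.0, rng=rng)
print(round(noisy, 2))  # close to the true count of 7, but randomised
```

The released value is useful in aggregate yet gives a provable bound (controlled by epsilon) on how much any individual record influences the output; choosing epsilon, and composing it across repeated queries, is exactly the kind of design question that currently lacks automated procedures.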
Lack of required skills. A considerable challenge is the need for SMEs to acquire more skills and resources for the adoption of AI and associated data protection solutions. This skills gap must be addressed globally, which is difficult in itself but also faces increasingly complex geopolitical issues. Besides skills for workers, a basic level of data and AI literacy among citizens is needed to avoid potentially negative impacts at a personal level. For instance, end users need to understand the effects of algorithms on their information consumption, and they also need to know when (and when not) to trust the output of an AI. Technologically, there is a related issue due to the predominance of the English language in LLMs, which is linked to data availability across linguistic regions.