IV.b Challenges

  • Lack of powerful AI edge devices. Implementing edge AI in IoT devices requires balancing several factors: the type and complexity of the AI application, the HW and SW capabilities of the device, and its integration and communication with the cloud and with other edge devices, all within the required size, weight, power and cost envelope. Together, these requirements make it complex to design and produce powerful AI edge devices (the footprint-estimation sketch after this list makes one such trade-off concrete). Although current edge AI devices address these challenges to varying degrees, ongoing research, innovation and collaboration are essential to further advance the capabilities and deployment of edge AI technologies across diverse applications and industries.
  • Lack of tools for workload balancing and scheduling in the computing continuum, with specific solutions for workloads on hybrid AI/QC/HPC architectures. Scheduling problems (assigning operations to resources over time to optimise a given criterion) have been studied extensively in the literature; an optimal solution is often not computable in practice, so they are typically solved heuristically (see the greedy scheduling sketch after this list). The heterogeneous nature of the computing continuum introduces a further level of complexity.
  • Related to the previous point, there is also the need to orchestrate all the heterogeneous resources that compose the computing continuum in order to fully exploit its potential and ensure their effective and optimised usage. Such orchestration systems would also support the seamless delivery of applications over the infrastructures, providing the required Quality of Service and handling, among other tasks, resource selection, deployment, monitoring and run-time control of resources and applications (the reconciliation-loop sketch after this list illustrates the basic pattern). These requirements pose new challenges, forcing the actors involved to extend existing orchestrators to cope with more complex, heterogeneous and distributed infrastructures, with resources located across different layers of the continuum.
  • Interoperability. Seamless connections between resources in the continuum require different levels of interoperability. Although some standards regulate these connections[1] [2], many gaps remain, and a framework is needed that guarantees interoperability between the different domains.
  • Lack of fit between AI/ML workflows and Big Data analytics requirements. Modern MLOps and DataOps practices are mature and can be applied at large scale, but most guidelines and SW tools in these areas are geared towards cloud-based, containerised deployments rather than bare-metal, dedicated supercomputers (the SLURM bootstrap sketch after this list illustrates the difference). Achieving the full potential of distributed HPC hardware, i.e. maximising training speed while minimising cost and energy consumption, requires deep knowledge of the HPC infrastructure and of the internal workings of the distributed training software, along with effective monitoring and benchmarking. Moreover, innovation in AI is fast, and backward compatibility of SW and specialised HW is often a problem, leading to requirements for very specific versions of drivers, libraries and programs to work correctly. AI and data practitioners must also familiarise themselves with containers and preinstalled libraries.
  • Data protection and privacy are crucial in any data-driven ecosystem. They become far more challenging in the dynamic, heterogeneous and decentralised paradigm of the computing continuum, which combines different technologies, each with its own properties and constraints, and involves multiple stakeholders with different roles and interests. The seamless flow of data along the continuum (enabled by the interoperability mentioned above) must be accompanied by protection measures that guarantee, at each stage of the process, the specified confidentiality, integrity, compliance, accountability and privacy conditions, also considering the technical, organisational and legal complexity of protecting data along value chains that involve cross-organisational, cross-domain and cross-border data exchanges across multiple ICT layers. This challenge also covers attacks specific to the AI/ML workflows deployed on top of the continuum, e.g. adversarial attacks that insert fake data into the training of ML models in order to corrupt them (the data-sanitisation sketch after this list gives a toy example of a countermeasure).
  • Resilience and security of infrastructures. This remains an unsolved issue, as reflected in the recent EC Horizon Europe call on “Secure Computing Continuum (IoT, Edge, Cloud, Dataspaces)”[3], which asks for “advanced, smart and agile protection mechanisms to manage the security and privacy of individual components throughout their lifecycle and of overall systems. The complexity of such interconnected environments underlines the need for the proactive and automated detection, analysis, and mitigation of cybersecurity attacks. Integrating end-to-end security and user-centric privacy in complex distributed platforms requires work to address security threats and vulnerabilities over the entire platform ecosystem”.
  • Real applications of quantum computing. As mentioned in the trends part, the coming years will see the consolidation of quantum computing as a mature technology. So far, however, the quantum community has focused more on demonstrating the speed of quantum computing than on applying it to real-life applications and problems[4]. For quantum computing to become fully applicable, several challenges remain to be solved, including qubit fragility, susceptibility to noise, scalability, cost and accessibility.
  • Lack of specific skills. AI and data practitioners need training on the specificities of HPC infrastructures, including deep knowledge of the HPC infrastructure and of the internal workings of the distributed training software, effective monitoring and benchmarking, and the use of containers and preinstalled libraries.
  • Access to infrastructure. SMEs face several hurdles in accessing HPC resources, including lack of awareness of their use and advantages, lack of knowledge on how to access them, limited availability, budget constraints and data security risks[5]. This extends to the whole computing continuum, which further increases the difficulty of access and usage for small actors. It is also related to the previous point on the lack of specific skills for using HPC infrastructures, particularly among the AI and data communities and SMEs at large.
  • Considering all the above (lack of full interoperability, of standards, of tools to jointly manage resources, of specific skills, and of awareness), the continuum’s scalability is difficult to envisage in the short to medium term[6].
  • The setting up of AI Factories represents a huge step towards connecting AI and HPC, driven also by access to and use of data. Their full operationalisation, however, still presents challenges to be addressed: how to connect AI Factories with one another (and with other ecosystems: EDIHs, TEFs, AIoD,…), how to guarantee seamless access to data, how to facilitate and motivate onboarding and use by industry, and how they can adapt to the whole lifecycle of AI models (generation, training, fine-tuning, distribution and use).
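
To make the size, weight, power and cost trade-off for edge AI concrete, the following minimal Python sketch estimates the raw memory footprint of a model at different numeric precisions against a device budget. The parameter count and the 512 MB budget are illustrative assumptions, not figures from this agenda.

    # Minimal sketch: check whether a model's weights fit an edge device's memory
    # budget at different numeric precisions. All numbers are illustrative.

    BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}  # common storage sizes

    def model_footprint_mb(num_params: int, precision: str) -> float:
        """Raw weight storage in MB (ignores activations and runtime overhead)."""
        return num_params * BYTES_PER_PARAM[precision] / (1024 ** 2)

    device_budget_mb = 512        # hypothetical RAM budget of an edge device
    num_params = 25_000_000       # hypothetical 25M-parameter vision model

    for precision in BYTES_PER_PARAM:
        mb = model_footprint_mb(num_params, precision)
        verdict = "fits" if mb <= device_budget_mb else "does not fit"
        print(f"{precision}: {mb:.1f} MB -> {verdict} in {device_budget_mb} MB budget")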
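
As an illustration of heuristic scheduling over heterogeneous resources, the sketch below applies a classic greedy list-scheduling rule: each task goes to the resource that would finish it earliest. The resource names, speed factors and task costs are hypothetical, and a real continuum scheduler would also have to account for data movement, energy and QoS constraints.

    # Minimal sketch of greedy list scheduling across heterogeneous resources,
    # a common heuristic when an optimal schedule is too costly to compute.
    # Resource names, speed factors and task costs below are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class Resource:
        name: str
        speed: float                 # relative throughput (higher = faster)
        busy_until: float = 0.0      # time at which the resource becomes free
        tasks: list = field(default_factory=list)

    def schedule(task_costs, resources):
        """Assign tasks (abstract compute costs) to resources greedily."""
        order = sorted(enumerate(task_costs), key=lambda t: -t[1])  # longest first
        for idx, cost in order:
            # Pick the resource that would finish this task earliest.
            best = min(resources, key=lambda r: r.busy_until + cost / r.speed)
            best.busy_until += cost / best.speed
            best.tasks.append((idx, cost))
        return max(r.busy_until for r in resources)   # makespan

    pool = [Resource("edge-node", 1.0), Resource("cloud-vm", 4.0),
            Resource("hpc-partition", 16.0)]
    print(f"estimated makespan: {schedule([8, 3, 12, 5, 9, 2], pool):.2f}")
    for r in pool:
        print(r.name, "->", r.tasks)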
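
The orchestration challenge can be pictured as a reconciliation loop that continuously compares the desired placement of application components against the observed state of the continuum and acts on the difference. The sketch below is a deliberately simplified, hypothetical control loop built from stub functions; it is not the API of any existing orchestrator.

    # Minimal sketch of an orchestration reconciliation loop over a multi-layer
    # continuum (edge / cloud / HPC). All functions are hypothetical stubs
    # standing in for real monitoring, placement and deployment subsystems.

    import time

    DESIRED = {"sensor-ingest": "edge", "analytics": "cloud", "training": "hpc"}

    def observe_state():
        """Stub: query monitoring to learn where each component actually runs."""
        return {"sensor-ingest": "edge", "analytics": None, "training": "cloud"}

    def place(component, layer):
        """Stub: deploy (or migrate) a component to the given continuum layer."""
        print(f"deploying {component} to {layer}")

    def reconcile_once():
        observed = observe_state()
        for component, target in DESIRED.items():
            if observed.get(component) != target:  # drift: desired vs. actual
                place(component, target)

    if __name__ == "__main__":
        for _ in range(1):       # a real orchestrator would loop continuously
            reconcile_once()
            time.sleep(0)        # placeholder for the control-loop interval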
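
To illustrate the gap between cloud-style MLOps tooling and bare-metal HPC, the sketch below bootstraps PyTorch distributed training from the environment variables that a SLURM scheduler exposes, instead of relying on a container orchestrator. It assumes a SLURM allocation, a PyTorch build with NCCL support, and a job script that sets MASTER_ADDR and MASTER_PORT.

    # Minimal sketch: bootstrapping PyTorch distributed training on a
    # SLURM-managed HPC system, where rank and world size come from the
    # scheduler's environment rather than from a cloud orchestrator.
    # Assumes MASTER_ADDR/MASTER_PORT are exported by the job script.

    import os
    import torch
    import torch.distributed as dist

    def init_from_slurm():
        rank = int(os.environ["SLURM_PROCID"])        # global rank of this process
        world_size = int(os.environ["SLURM_NTASKS"])  # total processes in the job
        local_rank = int(os.environ["SLURM_LOCALID"]) # rank within this node
        dist.init_process_group(
            backend="nccl",                           # GPU-to-GPU communication
            rank=rank,
            world_size=world_size,
        )
        torch.cuda.set_device(local_rank)             # pin this process to one GPU
        return rank, world_size

    if __name__ == "__main__":
        rank, world_size = init_from_slurm()
        if rank == 0:
            print(f"initialised {world_size} workers")
        dist.destroy_process_group()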
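
As a toy illustration of hardening ML workflows on the continuum against poisoned training data, the sketch below drops samples whose features deviate strongly from the per-feature median. This is a crude, illustrative guard only; real defences against adversarial data are considerably more sophisticated, and the threshold used is an assumption, not a recommended setting.

    # Minimal sketch of a data-sanitisation step before training: drop samples
    # whose features deviate strongly from the per-feature median, a crude
    # guard against injected/poisoned records. The threshold is illustrative.

    import numpy as np

    def filter_outliers(X: np.ndarray, threshold: float = 3.0) -> np.ndarray:
        """Keep rows whose features stay within `threshold` robust deviations."""
        median = np.median(X, axis=0)
        mad = np.median(np.abs(X - median), axis=0) + 1e-9  # robust spread
        scores = np.abs(X - median) / mad                   # per-feature deviation
        keep = (scores < threshold).all(axis=1)             # row passes all features
        return X[keep]

    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 1.0, size=(100, 3))
    poison = np.full((5, 3), 50.0)            # blatantly out-of-distribution rows
    data = np.vstack([clean, poison])
    print("before:", data.shape[0], "after:", filter_outliers(data).shape[0])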

[1] https://www.iso.org/standard/83752.html

[2] https://www.iso.org/standard/66639.html

[3] https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/horizon-cl3-2023-cs-01-01

[4] https://blog.google/technology/research/google-gesda-and-xprize-launch-new-competition-in-quantum-applications/

[5] https://www.excellerat.eu/hpc-for-industry-an-overview-of-the-european-landscape/

[6] https://www.sciencedirect.com/special-issue/10CLZBGL11T
