Key Takeaways
In the global AI race, the focus is often on the U.S. and China, especially with cases like Gemini, DeepSeek, and OpenAI, but Europe is not falling behind.
OpenEuro LLM proves that the continent is serious about shaping the future of artificial intelligence on its own terms. Funded by the Digital Europe Programme, the project develops open-source, multilingual AI models that aim to align with European values and regulatory frameworks.
The model covers all 24 official EU languages and more, ensuring AI remains open, accessible, and culturally inclusive. By prioritizing transparency and linguistic diversity, OpenEuro LLM helps Europe stay competitive while maintaining digital sovereignty.
The European Commission recognized the project’s strategic importance by awarding it the Strategic Technologies for Europe Platform (STEP) Seal—the first Digital Europe Programme initiative to receive this mark of excellence. OpenEuro LLM secured €20.6 million in funding, bringing its total budget to €37.4 million.
But challenges remain. Europe must balance rapid AI development with ethical and transparent standards.
To provide deeper insight, CCN contacted Jan Hajič, OpenEuro LLM project coordinator at the Institute of Formal and Applied Linguistics, Charles University, Prague. He co-leads the project with Peter Sarlin, co-founder of Silo AI, a leading European AI lab specializing in enterprise and research-driven AI solutions. Hajič’s insights help explain OpenEuro LLM’s goals, real-world applications, and the challenges ahead, all covered here.
OpenEuro LLM is a European initiative to develop high-performance, multimodal, open-source large language models (LLMs) for text, speech, and structured data. It serves industry, public services, and research, strengthening Europe’s position in the AI race.
The role of EuroLLM in research is particularly important, providing fully open models that allow academic institutions to experiment, analyze, and innovate without restrictions. According to Jan Hajič, “for academic and research groups, it is important to work with fully open models, which is exactly what OpenEuroLLM is aiming at.”
One of the main goals is to push innovation forward without losing control over data, security, and regulatory compliance.
The OpenEuro LLM project connects universities, research centers, companies, and EuroHPC institutions from across Europe. It aims to support the region’s digital sovereignty and ethical AI development.
OpenEuro LLM’s key features include the following:
EuroLLM’s models are designed to support 35 languages: all 24 official EU languages plus additional languages such as Arabic, Chinese, Hindi, Japanese, Korean, Russian, and Turkish. This broad linguistic reach ensures accessibility across diverse communities.
Handling morphologically rich languages like Czech presents unique challenges in natural language processing (NLP). Unlike English-based models, these languages have complex word structures and grammatical variations, making them more difficult to process.
However, according to Jan Hajič, “while this has been the problem in the past, recent advances in technology, such as proper organization, are able to minimize the loss for such languages. In any case, these languages are very often low-resource languages, so there are still problems stemming from data sparsity.”
EuroLLM addresses these issues by leveraging structured data and multilingual training. This improves model performance across low-resource languages while ensuring more accurate and context-aware outputs.
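The data-sparsity point can be illustrated with a small sketch. It uses toy data, not project data: eight mentions of the Czech noun "hrad" (castle) in different case forms, set against eight mentions of the English noun, which has only two forms. The function name and the sample token lists are hypothetical, and this is not how EuroLLM itself measures or handles sparsity.

```python
from collections import Counter


def surface_form_stats(tokens):
    """Return (distinct forms, average occurrences per form) for a token list."""
    counts = Counter(tokens)
    distinct = len(counts)
    return distinct, len(tokens) / distinct


# Toy data (assumption): eight mentions of the same noun in each language.
english = ["castle", "castles", "castle", "castle",
           "castles", "castle", "castle", "castles"]
czech = ["hrad", "hradu", "hradem", "hrady",
         "hradů", "hradům", "hradech", "hrad"]

en_distinct, en_avg = surface_form_stats(english)
cs_distinct, cs_avg = surface_form_stats(czech)

print(f"English: {en_distinct} forms, {en_avg:.2f} occurrences per form")
print(f"Czech:   {cs_distinct} forms, {cs_avg:.2f} occurrences per form")
```

With the same number of mentions, each Czech form is observed far less often than each English form. A model therefore sees fewer examples per word form, and the effect is stronger when the language is also low-resource.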
The EuroLLM-1.7B model is the base version, trained on 4 trillion tokens for general tasks. It provides fast processing and serves as the foundation for more specialized versions. The EuroLLM-1.7B-Instruct model builds upon this by incorporating EuroBlocks fine-tuning, enhancing its ability to handle machine translation and instruction-following tasks with improved efficiency.
Meanwhile, EuroLLM-9B is the most advanced version so far, featuring 9 billion parameters and trained on 4 trillion tokens from web data, parallel translations, and high-quality datasets. It also has an instruction-tuned variant, EuroLLM-9B-Instruct, which fine-tunes the model on the EuroBlocks dataset to enhance instruction-following and machine translation capabilities.
It is designed for more complex language processing tasks, making it suitable for research, industry, and public services. All models operate under the Apache 2.0 license, ensuring free and open access to developers and researchers.
Below is a table comparing the basic models:
Feature | EuroLLM-1.7B | EuroLLM-9B |
Parameters | 1.7B | 9B |
Training Data | 4T tokens | 4T tokens |
Tuning | Pre-trained | Pre-trained |
Languages | 35 languages | 35 languages |
Processing speed | Fast | High performance |
Best use | General tasks | Advanced AI tasks |
GPUs Used | 256 H100 | 400 H100 |
License | Apache 2.0 | Apache 2.0 |
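One way to read the table is to compare how much training data each model saw relative to its size. The short calculation below is a sketch that uses only the parameter and token counts from the table. The tokens-per-parameter ratio is a common rule-of-thumb comparison, not a metric reported by the project, and the dictionary and function names are hypothetical.

```python
# Figures taken from the comparison table above.
models = {
    "EuroLLM-1.7B": {"params": 1.7e9, "tokens": 4e12},
    "EuroLLM-9B": {"params": 9e9, "tokens": 4e12},
}


def tokens_per_parameter(params, tokens):
    """Training tokens seen per model parameter."""
    return tokens / params


ratios = {
    name: tokens_per_parameter(m["params"], m["tokens"])
    for name, m in models.items()
}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:,.0f} training tokens per parameter")
```

Because both models were trained on the same 4 trillion tokens, the smaller model saw roughly five times more data per parameter than the larger one.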
EuroLLM models, including EuroLLM-1.7B and EuroLLM-9B, provide open-source AI solutions for diverse applications. They offer a transparent and adaptable alternative for research, industry, and public services.
GPT-4 by OpenAI delivers advanced language processing, generating highly coherent and contextually accurate text. Google’s Gemini specializes in conversational AI, leveraging Google’s vast search infrastructure to enhance responses.
DeepSeek aims for efficiency and strong performance despite having fewer resources. However, recent reports indicated an accuracy of 17% “in delivering news and information,” falling behind ChatGPT and Gemini.
Direct benchmark comparisons remain limited, but EuroLLM’s open-source model prioritizes multilingual accessibility, while GPT-4 and Gemini focus on proprietary advancements.
According to Jan Hajič, EuroLLM aims to provide comparable quality across all supported languages, ensuring equal performance across official and future EU languages. “Multilinguality, language equality, and language transparency are really important for Europe to operate as a truly Single Market with no language barriers.”
This focus on linguistic inclusivity sets EuroLLM apart from general-purpose LLMs such as GPT-4 and LLaMA (Meta’s open-weight AI model designed for research and broad applications), which often prioritize higher-resource languages over low-resource European languages.
Developing a high-performing, multilingual AI model like OpenEuro LLM presents several challenges, particularly in competition, data availability, and ethical concerns.
Hajič highlights two key obstacles in relation to this. “One of the main goals of the project is to overcome both: to get enough compute from the EuroHPC centers, and to get enough data especially for the low-resource languages to get equal or close to equal quality and cultural adequacy for those languages.” Addressing data limitations will be essential for achieving linguistic fairness.
He also points to persistent challenges in NLP, stating, “Most of NLP (even though not all) is implemented today through LLMs, so the well-known problems apply: hallucinations, inaccuracies, cultural unawareness, inconsistencies over time, etc.” OpenEuro LLM must work toward reducing these issues to ensure reliable and unbiased AI outputs.
The OpenEuroLLM project will develop the next generation of large language models designed to set new standards in multilingual AI, transparency, and regulatory alignment. The project will create advanced, open-source foundation models that surpass existing capabilities by leveraging cutting-edge research, high-quality datasets, and Europe’s strongest AI expertise.
These models will drive innovation across industry, academia, and public services, ensuring Europe remains at the forefront of AI development while upholding ethical, fair, and privacy-focused AI practices.
OpenEuro LLM puts Europe at the forefront of AI development, offering an open-source, multilingual alternative to proprietary models. By supporting finance, research, industry, and public services, the project aims to keep AI accessible and aligned with European regulations.
The project faces challenges in data availability, computing power, and ethical AI, but its focus on transparency and linguistic diversity gives it a strong foundation.
With continued research, collaboration, and investment, OpenEuro LLM could help Europe build AI that serves its people, strengthens digital sovereignty, and competes globally.
Will OpenEuro LLM be free to use?
As an open-source project, the model will be freely accessible to developers, researchers, and businesses for commercial and public use.
How does OpenEuro LLM promote transparency?
OpenEuro LLM shares tools, training insights, dataset enrichment pipelines, and anonymization frameworks to support ethical AI development.
How does OpenEuro LLM support future AI advancements?
OpenEuro LLM builds a developer and stakeholder community across public and private sectors, fostering innovation and collaboration.