OpenEuro LLM Explained: What You Need to Know

Key Takeaways

OpenEuro LLM creates open-source AI models that aim to follow European values and regulations, ensuring diversity and compliance.
The project is focused on commercial, industrial, and public services, making AI more available across Europe.
Its multilingual and multimodal features let the model process text, speech, and structured data, improving AI applications.
It faces data availability and computing power challenges but works to solve them through EuroHPC and better datasets.

In the global AI race, the focus is often on the U.S. and China, especially with names like Gemini, DeepSeek, and OpenAI in the headlines, but Europe is not falling behind.

OpenEuro LLM proves that the continent is serious about shaping the future of artificial intelligence on its own terms. Funded by the Digital Europe Programme, the project develops open-source, multilingual AI models that aim to align with European values and regulatory frameworks.

The model covers all 24 official EU languages and more, keeping AI open, accessible, and culturally inclusive. By prioritizing transparency and linguistic diversity, OpenEuro LLM helps Europe stay competitive while maintaining digital sovereignty.

The European Commission recognized the project’s strategic importance by awarding it the Strategic Technologies for Europe Platform (STEP) Seal—the first Digital Europe Programme initiative to receive this mark of excellence. OpenEuro LLM secured €20.6 million in funding, bringing its total budget to €37.4 million.

But challenges remain. Europe must balance rapid AI development with ethical and transparent standards.

To provide deeper insight, CCN contacted Jan Hajič, OpenEuro LLM project coordinator at the Institute of Formal and Applied Linguistics, Charles University, Prague. He co-leads the project with Peter Sarlin, co-founder of Silo AI, a leading European AI lab specializing in enterprise and research-driven AI solutions. Hajič’s insights help explain OpenEuro LLM’s goals, real-world applications, and the challenges ahead, all covered here.

What Is OpenEuro LLM?

OpenEuro LLM is a European initiative to develop high-performance, multimodal, open-source large language models (LLMs) for text, speech, and structured data. It serves industry, public services, and research, strengthening Europe’s position in the AI race.

EuroLLM’s role in research is particularly important: its fully open models allow academic institutions to experiment, analyze, and innovate without restrictions. According to Jan Hajič, “for academic and research groups, it is important to work with fully open models, which is exactly what OpenEuroLLM is aiming at.”

One of the main goals is to push innovation forward without losing control over data, security, and regulatory compliance.

The OpenEuro LLM project connects universities, research centers, companies, and EuroHPC institutions from across Europe. It aims to support the region’s digital sovereignty and ethical AI development.

OpenEuro LLM’s key features are:

Open-source model: Enables researchers and developers to fine-tune and adapt the model for different applications.
High-performance AI: Runs on EuroHPC supercomputing infrastructure, optimizing efficiency and large-scale deployment. The model is trained on 4 trillion tokens drawn from web data, parallel translations, and high-quality sources.
Multilingual AI: Supports all official EU languages and more, making AI more accessible across Europe.
Multimodal processing: Processes text, speech, and structured data, allowing AI-driven applications beyond language tasks.
Regulatory compliance: Aligns with the EU AI Act, ensuring transparency, fairness, and strong data privacy protections.

EuroLLM’s models are designed to support 35 languages: all 24 official EU languages plus others, including Arabic, Chinese, Hindi, Japanese, Korean, Russian, and Turkish. This broad linguistic reach widens accessibility across diverse communities.

Handling morphologically rich languages like Czech presents unique challenges in natural language processing (NLP). Unlike English, these languages have complex word structures and many grammatical variations, which makes them more difficult for models to process.

However, according to Jan Hajič, “while this has been the problem in the past, recent advances in technology, such as proper organization, are able to minimize the loss for such languages. In any case, these languages are very often low-resource languages, so there are still problems stemming from data sparsity.”

EuroLLM addresses these issues by leveraging structured data and multilingual training. This improves model performance across low-resource languages while ensuring more accurate and context-aware outputs.
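To make the data-sparsity point concrete, the short Python sketch below counts the distinct surface forms of a single noun in Czech and in English. The word lists are hand-written examples chosen for this illustration, not data from the project, and English possessive forms are left out for simplicity.

```python
# Illustrative only: these word lists are hand-written examples,
# not data from OpenEuro LLM or EuroLLM.

# Case and number forms of the Czech noun "žena" (woman):
# seven cases in the singular, then seven in the plural.
czech_forms = [
    "žena", "ženy", "ženě", "ženu", "ženo", "ženě", "ženou",      # singular
    "ženy", "žen", "ženám", "ženy", "ženy", "ženách", "ženami",   # plural
]

# The English noun "woman" has two forms (possessives ignored here).
english_forms = ["woman", "women"]


def distinct_forms(forms):
    """Return the number of distinct surface forms in a paradigm."""
    return len(set(forms))


czech_count = distinct_forms(czech_forms)
english_count = distinct_forms(english_forms)

print(f"Czech: {czech_count} distinct forms")
print(f"English: {english_count} distinct forms")
```

Because the same meaning is spread over more word forms, a model sees each individual Czech form less often in a corpus of the same size, which is one reason limited data tends to hurt such languages more.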

EuroLLM’s Language Support and Model Advancements

The EuroLLM-1.7B model is the base version, trained on 4 trillion tokens for general tasks. It provides fast processing and serves as the foundation for more specialized versions. The EuroLLM-1.7B-Instruct model builds upon this by incorporating EuroBlocks fine-tuning, enhancing its ability to handle machine translation and instruction-following tasks with improved efficiency.

Meanwhile, EuroLLM-9B is the most advanced version so far, featuring 9 billion parameters and trained on 4 trillion tokens from web data, parallel translations, and high-quality datasets. It also has an instruction-tuned variant, EuroLLM-9B-Instruct, which fine-tunes the model on the EuroBlocks dataset to enhance instruction-following and machine translation capabilities.

It is designed for more complex language processing tasks, making it suitable for research, industry, and public services. All models operate under the Apache 2.0 license, ensuring free and open access to developers and researchers.

Below is a table comparing the basic models:

Feature	EuroLLM-1.7B	EuroLLM-9B
Parameters	1.7B	9B
Training Data	4T tokens	4T tokens
Tuning	Pre-trained	Pre-trained
Languages	35 languages	35 languages
Processing speed	Fast	High performance
Best use	General tasks	Advanced AI tasks
GPUs Used	256 H100	400 H100
License	Apache 2.0	Apache 2.0
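The table's figures allow a quick back-of-the-envelope comparison. The Python sketch below derives the number of training tokens per parameter for each model and a rough training-compute estimate. The parameter and token counts come from the table; the 6 × parameters × tokens rule is a widely used approximation for dense transformer training and is an assumption here, not a figure reported by the project.

```python
# Parameter and token counts are taken from the comparison table.
# The 6 * N * D compute rule is a common rough approximation, assumed
# here for illustration; it is not a number published by the project.
TOKENS = 4e12  # 4T training tokens for both models

models = {
    "EuroLLM-1.7B": 1.7e9,
    "EuroLLM-9B": 9e9,
}


def tokens_per_parameter(params, tokens=TOKENS):
    """Training tokens seen per model parameter."""
    return tokens / params


def approx_training_flops(params, tokens=TOKENS):
    """Rough training compute: about 6 FLOPs per parameter per token."""
    return 6 * params * tokens


for name, params in models.items():
    ratio = tokens_per_parameter(params)
    flops = approx_training_flops(params)
    print(f"{name}: ~{ratio:,.0f} tokens per parameter, ~{flops:.2e} FLOPs")
```

Under these assumptions, the smaller model sees roughly five times more tokens per parameter than the larger one, while the larger one needs roughly five times more training compute for the same data.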

How Do EuroLLM models compare to Gemini, DeepSeek, and ChatGPT?

EuroLLM models, including EuroLLM-1.7B and EuroLLM-9B, provide open-source AI solutions for diverse applications. They offer a transparent and adaptable alternative for research, industry, and public services.

GPT-4 by OpenAI delivers advanced language processing, generating highly coherent and contextually accurate text. Google’s Gemini specializes in conversational AI, leveraging Google’s vast search infrastructure to enhance responses.

DeepSeek aims for efficiency and strong performance despite having fewer resources. However, recent reports indicated an accuracy of 17% “in delivering news and information,” falling behind ChatGPT and Gemini.

Direct benchmark comparisons remain limited, but EuroLLM’s open-source model prioritizes multilingual accessibility, while GPT-4 and Gemini focus on proprietary advancements.

According to Jan Hajič, EuroLLM aims to provide comparable quality across all supported languages, ensuring equal performance across official and future EU languages. “Multilinguality, language equality, and language transparency are really important for Europe to operate as a truly Single Market with no language barriers.”

This focus on linguistic inclusivity sets EuroLLM apart from general-purpose LLMs such as GPT-4 and LLaMA, Meta’s open-weight model designed for research and broad applications. Such models often prioritize higher-resource languages over low-resource European ones.

Challenges and Considerations

Developing a high-performing, multilingual AI model like OpenEuro LLM presents several challenges, particularly in competition, data availability, and ethical concerns.

Competition: OpenEuro LLM faces pressure from well-established U.S. and Chinese AI companies already deploying large-scale proprietary models. While some Chinese projects, such as DeepSeek, were developed at a lower cost, major US players like OpenAI and Google have secured billions in funding, strengthening their dominance. OpenEuro LLM’s challenge is offering a competitive open-source alternative aligned with European interests and values.
Data availability: Training a multilingual AI model requires high-quality, diverse datasets, particularly for low-resource European languages. Ensuring access to representative data will be crucial for achieving balanced linguistic representation.
Ethical concerns: Like all AI models, OpenEuro LLM must address bias, fairness, and potential misuse. A key issue in NLP is ensuring that models do not reinforce biases in training data or produce misleading or culturally insensitive content.

Hajič highlights two key obstacles here: computing power and data. “One of the main goals of the project is to overcome both: to get enough compute from the EuroHPC centers, and to get enough data especially for the low-resource languages to get equal or close to equal quality and cultural adequacy for those languages.” Addressing data limitations will be essential for achieving linguistic fairness.

He also points to persistent challenges in NLP, stating, “Most of NLP (even though not all) is implemented today through LLMs, so the well-known problems apply: hallucinations, inaccuracies, cultural unawareness, inconsistencies over time, etc.” OpenEuro LLM must work toward reducing these issues to ensure reliable and unbiased AI outputs.
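The data-availability challenge described above is often tackled in multilingual training by re-weighting how frequently each language is sampled. The Python sketch below shows one common technique, exponent-smoothed sampling, where a language's sampling probability is proportional to its corpus size raised to a power alpha. This is a general illustration, not a confirmed description of how OpenEuro LLM or EuroLLM balances its data, and the corpus sizes are invented.

```python
# Hypothetical corpus sizes (in tokens); these numbers are invented for
# illustration and do not describe any real training set.
corpus_tokens = {
    "English": 1_000_000,
    "Czech": 100_000,
    "Maltese": 10_000,
}


def sampling_probabilities(sizes, alpha):
    """Exponent-smoothed sampling: p_i is proportional to n_i ** alpha.

    alpha = 1.0 keeps the natural proportions; smaller values shift
    probability toward low-resource languages.
    """
    weights = {lang: n ** alpha for lang, n in sizes.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


for alpha in (1.0, 0.5):
    probs = sampling_probabilities(corpus_tokens, alpha)
    summary = ", ".join(f"{lang}: {p:.3f}" for lang, p in probs.items())
    print(f"alpha={alpha}: {summary}")
```

Lowering alpha gives the smallest language a larger share of each training batch, at the cost of repeating its limited data more often, which is why collecting more data for low-resource languages still matters.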

The Future of OpenEuro LLM

The OpenEuroLLM project will develop the next generation of large language models designed to set new standards in multilingual AI, transparency, and regulatory alignment. The project will create advanced, open-source foundation models that surpass existing capabilities by leveraging cutting-edge research, high-quality datasets, and Europe’s strongest AI expertise.

These models will drive innovation across industry, academia, and public services, ensuring Europe remains at the forefront of AI development while upholding ethical, fair, and privacy-focused AI practices.

Conclusion

OpenEuro LLM puts Europe at the forefront of AI development, offering an open-source, multilingual alternative to proprietary models. By supporting finance, research, industry, and public services, it aims to keep AI accessible and aligned with European regulations.

The project faces challenges in data availability, computing power, and ethical AI, but its focus on transparency and linguistic diversity gives it a strong foundation.

With continued research, collaboration, and investment, OpenEuro LLM could help Europe build AI that serves its people, strengthens digital sovereignty, and competes globally.

FAQs

Will businesses be able to fine-tune OpenEuro LLM for specific needs?

Yes, the fully open-source model allows businesses to fine-tune and adapt it for industry-specific applications.

Will OpenEuro LLM be free to use?

Yes. Because the project is open source, the model will be freely accessible to developers, researchers, and businesses for commercial and public use.

How does OpenEuro LLM promote transparency?

OpenEuro LLM shares tools, training insights, dataset enrichment pipelines, and anonymization frameworks to support ethical AI development.

How does OpenEuro LLM support future AI advancements?

OpenEuro LLM builds a developer and stakeholder community across public and private sectors, fostering innovation and collaboration.

About the Author

Dr. Lorena Nessi

Dr. Lorena Nessi is an award-winning journalist and media technology expert with 15 years of experience in digital culture and communication. Based in Oxfordshire, UK, she combines academic insight with hands-on media practice.

She holds a PhD in Communication, Sociology, and Digital Cultures, and an MA in Globalization, Identity, and Technology.

Lorena has taught at Fairleigh Dickinson University, Nottingham Trent University, and the University of Oxford. She is a former producer for the BBC in London, with additional experience creating television content in Mexico and Japan.

Her research focuses on digital cultures, social media, technology, capitalism, and the societal impact of blockchain innovation.

She has written extensively on digital media and emerging technologies, with her work featured in both academic and media platforms. Her Web3 expertise explores how blockchain technologies shape culture, economics, and decentralized systems.

Outside of work, Lorena enjoys reading science fiction, playing strategic board games, traveling, and chasing adventures that get her heart racing. A perfect day ends with a relaxing spa and a good family meal.
