Language Models

Modern language models enable secure, real-time AI on local devices in industries such as automotive, robotics, and healthcare. The goal of DSgenAI is to build a complete pipeline, spanning specialized data curation, base model pre-training, and task-oriented optimization, that delivers lean, domain-specific models for industry-relevant tasks and demonstrates their value in deployable prototypes.

For further information, contact Sebastian Scharrer at Fraunhofer IIS.

Publications

Joint Workshop on Legal and Ethical Issues in Human Language Technologies and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (LEGAL2026 and CALD-pseudo 2026) @ LREC 2026
A Taxonomy of Safety: Harmonizing LLM Benchmarks in a Fragmented Landscape
From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models
Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy

Latest Posts

An Agentic System to Solve your Data Science Tasks

ELMOD-2.7B: Bringing a German-first LLM to your smartphone

Make the Most of Your Knowledge Graph with RAGONITE

For Retrieval Augmented Generation, Context Is all You Need