Hi, I'm Lukas! 👋
I am a Researcher at the ScaDS.AI Center for Scalable Data Science and Artificial Intelligence of Leipzig University. I am passionate about all things in Text Mining, Data Science, and Information Retrieval. I work on generative models for search, and search for generative models.
Professional Experience
- Researcher
Research on Generative Models for Search and Search for Generative Models.
Deep Semantic Learning Group, Kassel University
- Researcher
Research on Generative Models for Search and Search for Generative Models.
ScaDS.AI Center for Scalable Data Science & Artificial Intelligence, Leipzig
- Researcher
Research on Web Search, Crowdsourcing & Evaluation, and Plagiarism Detection
Text Mining & Retrieval Group, Leipzig University
- Student Assistant
Research Infrastructure, Technical Support, Experiment Assistance
Institute for Sociology, Leipzig University
- Student Assistant
Programming, Typesetting, Research Assistance
Institute for Translatology, Leipzig University
Teaching Experience
I have given seminars and lab sessions at both bachelor's and master's level, covering topics in ML, NLP, and IR:
- Scalable Language Technologies
- Foundations of Machine Learning
- Big Data & Language Technologies
- Advanced Information Retrieval
- Information Retrieval
Education
- M.Sc. Data Science
Leipzig University
- M.Sc. Digital Humanities
Leipzig University
- B.Sc. Digital Humanities
Leipzig University
- B.A. Linguistics
Leipzig University
- High School
Gymnasium Carolinum Bernburg
Publications
- Gienapp, L., Schröder, C., Schweter, S., Akiki, C., Schlatt, F., Zimmermann, A., Genêt, P. & Potthast, M. (2025). The German Commons – 154 Billion Tokens of Openly Licensed Text for German Language Models. CoRR, abs/2510.13996. Data
- Gienapp, L., Potthast, M., Scells, H. & Yang, E. (2025). Topic-Specific Classifiers are Better Relevance Judges than Prompted LLMs. CoRR, abs/2510.04633. Methods Evaluation
- Gienapp, L., Deckers, N., Potthast, M. & Scells, H. (2025). Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins. 275–285. Methods
- Gienapp, L., Hagen, T., Fröbe, M., Hagen, M., Stein, B., Potthast, M. & Scells, H. (2025). The Viability of Crowdsourcing for RAG Evaluation. ACM. Data Evaluation
- Peters, J., Neumann, A., Jaeger, M., Gienapp, L. & Umlauft, J. (2025). ml4xcube: Machine Learning Toolkits for Earth System Data Cubes. 28302–28311. Methods
- Fröbe, M., Scells, H., Elstner, T., Akiki, C., Gienapp, L., Reimer, J., MacAvaney, S., Stein, B., Hagen, M. & Potthast, M. (2024). Resources for Combining Teaching and Research in Information Retrieval Courses. ACM. Teaching
- Gienapp, L., Scells, H., Deckers, N., Bevendorff, J., Wang, S., Kiesel, J., Syed, S., Fröbe, M., Zuccon, G., Stein, B., Hagen, M. & Potthast, M. (2024). Evaluating Generative Ad Hoc Information Retrieval. ACM. Evaluation
- Elstner, T., Loebe, F., Ajjour, Y., Akiki, C., Bondarenko, A., Fröbe, M., Gienapp, L., Kolyada, N., Mohr, J., Sandfuchs, S., Wiegmann, M., Frochte, J., Ferro, N., Hofmann, S., Stein, B., Hagen, M. & Potthast, M. (2023). Shared Tasks as Tutorials: A Methodical Approach. AAAI Press. Teaching
- Reimer, J., Schmidt, S., Fröbe, M., Gienapp, L., Scells, H., Stein, B., Hagen, M. & Potthast, M. (2023). The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives. ACM. Data
- Bevendorff, J., Sauer, P., Gienapp, L., Kircheis, W., Körner, E., Stein, B. & Potthast, M. (2023). SMAuC - The Scientific Multi-Authorship Corpus. IEEE. Data
- Fröbe, M., Gienapp, L., Potthast, M. & Hagen, M. (2023). Bootstrapped nDCG Estimation in the Presence of Unjudged Documents. Springer. Evaluation
- Gienapp, L., Kircheis, W., Sievers, B., Stein, B. & Potthast, M. (2023). A large dataset of scientific text reuse in Open-Access publications. Scientific Data, 10(1). Data
- Bondarenko, A., Fröbe, M., Gienapp, L., Pugachev, A., Reimer, J., Schlatt, F., Artemova, E., Potthast, M., Stein, B., Braslavski, P. & Hagen, M. (2022). Webis at TREC 2022: Deep Learning and Health Misinformation. National Institute of Standards and Technology (NIST). Methods
- Gienapp, L., Fröbe, M., Hagen, M. & Potthast, M. (2022). Sparse Pairwise Re-ranking with Pre-trained Transformers. ACM. Methods
- Akiki, C., Gienapp, L. & Potthast, M. (2022). Tracking Discourse Influence in Darknet Forums. CoRR, abs/2202.02081. Analyses
- Fröbe, M., Hagen, M., Bevendorff, J., Völske, M., Stein, B., Schröder, C., Wagner, R., Gienapp, L. & Potthast, M. (2021). The Impact of Main Content Extraction on Near-Duplicate Detection. International Open Search Symposium. Analyses
- Bondarenko, A., Gienapp, L., Fröbe, M., Beloucif, M., Ajjour, Y., Panchenko, A., Biemann, C., Stein, B., Wachsmuth, H., Potthast, M. & Hagen, M. (2021). Overview of Touché 2021: Argument Retrieval. Springer. Methods
- Fröbe, M., Bevendorff, J., Gienapp, L., Völske, M., Stein, B., Potthast, M. & Hagen, M. (2021). CopyCat: Near-Duplicates Within and Between the ClueWeb and the Common Crawl. ACM. Data
- Gienapp, L., Stein, B., Hagen, M. & Potthast, M. (2020). Estimating Topic Difficulty Using Normalized Discounted Cumulated Gain. ACM. Evaluation
- Gienapp, L., Fröbe, M., Hagen, M. & Potthast, M. (2020). The Impact of Negative Relevance Judgments on NDCG. ACM. Evaluation
- Bondarenko, A., Fröbe, M., Beloucif, M., Gienapp, L., Ajjour, Y., Panchenko, A., Biemann, C., Stein, B., Wachsmuth, H., Potthast, M. & Hagen, M. (2020). Overview of Touché 2020: Argument Retrieval. Springer. Methods
- Gienapp, L., Stein, B., Hagen, M. & Potthast, M. (2020). Efficient Pairwise Annotation of Argument Quality. Association for Computational Linguistics. Evaluation Data Methods
- Potthast, M., Gienapp, L., Euchner, F., Heilenkötter, N., Weidmann, N., Wachsmuth, H., Stein, B. & Hagen, M. (2019). Argument Search: Assessing Argument Relevance. ACM. Evaluation
Awards & Grants
- Best Paper Honourable Mention Award at the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2025) for the paper The Viability of Crowdsourcing for RAG Evaluation (Gienapp, Hagen et al., 2025).
- SIGIR Student Travel Grant at the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020) for the paper Estimating Topic Difficulty Using Normalized Discounted Cumulated Gain (Gienapp, Stein et al., 2020).
- SIGIR Student Travel Grant at the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) for the paper Argument Search: Assessing Argument Relevance (Potthast, Gienapp et al., 2019).