gien.app

Hi, I'm Lukas! 👋

Lukas Gienapp

I am a Researcher at the ScaDS.AI Center for Scalable Data Science and Artificial Intelligence of Leipzig University. I am passionate about all things Text Mining, Data Science, and Information Retrieval. I work on generative models for search, and search for generative models.

Professional Experience

  • Researcher

    Research on Generative Models for Search and Search for Generative Models.

    Deep Semantic Learning Group, Kassel University

  • Researcher

    Research on Generative Models for Search and Search for Generative Models.

    ScaDS.AI Centre for Scalable Data Science & Artificial Intelligence, Leipzig

  • Researcher

    Research on Web Search, Crowdsourcing & Evaluation, and Plagiarism Detection

    Text Mining & Retrieval Group, Leipzig University

  • Student Assistant

    Research Infrastructure, Technical Support, Experiment Assistance

    Institute for Sociology, Leipzig University

  • Student Assistant

    Programming, Typesetting, Research Assistance

    Institute for Translatology, Leipzig University

Teaching Experience

I have given seminars and lab sessions at both bachelor's and master's level, covering topics in ML, NLP, and IR.

Education

  • M.Sc. Data Science

    Leipzig University

  • M.Sc. Digital Humanities

    Leipzig University

  • B.Sc. Digital Humanities

    Leipzig University

  • B.A. Linguistics

    Leipzig University

  • High School

    Gymnasium Carolinum Bernburg

Publications

Publications are tagged by research type: Data, Methods, Evaluation, Analyses, Teaching.

  • The German Commons – 154 Billion Tokens of Openly Licensed Text for German Language Models. CoRR, abs/2510.13996. [Data]
  • Topic-Specific Classifiers are Better Relevance Judges than Prompted LLMs. CoRR, abs/2510.04633. [Methods, Evaluation]
  • Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins. 275–285. [Methods]
  • The Viability of Crowdsourcing for RAG Evaluation. ACM SIGIR 2025. [Data, Evaluation]
  • ml4xcube: Machine Learning Toolkits for Earth System Data Cubes. 28302–28311. [Methods]
  • Resources for Combining Teaching and Research in Information Retrieval Courses. ACM. [Teaching]
  • Evaluating Generative Ad Hoc Information Retrieval. ACM. [Evaluation]
  • Shared Tasks as Tutorials: A Methodical Approach. AAAI Press. [Teaching]
  • The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives. ACM. [Data]
  • SMAuC - The Scientific Multi-Authorship Corpus. IEEE. [Data]
  • Bootstrapped nDCG Estimation in the Presence of Unjudged Documents. Springer. [Evaluation]
  • A large dataset of scientific text reuse in Open-Access publications. Scientific Data, 10(1). [Data]
  • Webis at TREC 2022: Deep Learning and Health Misinformation. National Institute of Standards and Technology (NIST). [Methods]
  • Sparse Pairwise Re-ranking with Pre-trained Transformers. ACM. [Methods]
  • Tracking Discourse Influence in Darknet Forums. CoRR, abs/2202.02081. [Analyses]
  • The Impact of Main Content Extraction on Near-Duplicate Detection. International Open Search Symposium. [Analyses]
  • Overview of Touché 2021: Argument Retrieval. Springer. [Methods]
  • CopyCat: Near-Duplicates Within and Between the ClueWeb and the Common Crawl. ACM. [Data]
  • Estimating Topic Difficulty Using Normalized Discounted Cumulated Gain. ACM CIKM 2020. [Evaluation]
  • The Impact of Negative Relevance Judgments on NDCG. ACM. [Evaluation]
  • Overview of Touché 2020: Argument Retrieval. Springer. [Methods]
  • Efficient Pairwise Annotation of Argument Quality. Association for Computational Linguistics. [Evaluation, Data, Methods]
  • Argument Search: Assessing Argument Relevance. ACM SIGIR 2019. [Evaluation]

Awards & Grants

  • Best Paper Honourable Mention Award at the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2025) for the paper The Viability of Crowdsourcing for RAG Evaluation.
  • SIGIR Student Travel Grant at the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020) for the paper Estimating Topic Difficulty Using Normalized Discounted Cumulated Gain.
  • SIGIR Student Travel Grant at the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) for the paper Argument Search: Assessing Argument Relevance.