My CV
Contact Details
Feel free to email me (address below) if you need contact details beyond my email address.
Education
Post-Doc, 2024–
LMU Munich
- Research topics: Character-level NLP, Low-Resource NLP
- Advisor: Prof. Alexander Fraser
Ph.D. Computational Linguistics, 2019–2023
University of Groningen
- Thesis: The Little Data That Could: Making the Most of Low-Resource Natural Language Processing
- Advisors: Dr. Antonio Toral, Prof. Gertjan van Noord
M.Sc. Artificial Intelligence, 2017–2019
University of Groningen
- Master’s thesis on unsupervised neural machine translation under advisors Dr. Antonio Toral and Dr. Jennifer Spenader
B.Sc. Computer Science, 2016–2017
Indiana University
- Graduated with Highest Honors with a 3.97/4.0 GPA
Computer Science Major, 2012–2016
University of Puget Sound
- Bachelor’s project on automatic reference-finding in the Latin Vulgate
Publications
- Lukas Edman, Lisa Bylinina, Faeze Ghorbanpour, and Alexander Fraser. Are BabyLMs Second Language Learners?, Proceedings of the BabyLM Challenge at the 28th Conference on Computational Natural Language Learning. 2024.
- Lukas Edman, Helmut Schmid, and Alexander Fraser. CUTE: Measuring LLMs’ Understanding of Their Tokens, Proceedings of the Association for Computational Linguistics: EMNLP 2024. 2024.
- Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, and Arianna Bisazza. Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation, Transactions of the Association for Computational Linguistics: TACL 2024. 2024.
- Lukas Edman and Lisa Bylinina. Too Much Information: Keeping Training Simple for BabyLMs, Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning. 2023.
- Konstantin Chernyshev, Ekaterina Garanina, Duygu Bayram, Qiankun Zheng, and Lukas Edman. LCT-1 at SemEval-2023 Task 10: Pre-training and Multi-task Learning for Sexism Detection and Classification, Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023). 2023.
- Lukas Edman, Antonio Toral, and Gertjan van Noord. Subword-Delimited Downsampling for Better Character-Level Translation, Findings of the Association for Computational Linguistics: EMNLP 2022. 2022.
- Lukas Edman, Antonio Toral, and Gertjan van Noord. Patching Leaks in the Charformer for Efficient Character-Level Generation, arXiv preprint arXiv:2205.14086. 2022.
- Lukas Edman, Antonio Toral, and Gertjan van Noord. The Importance of Context in Very Low Resource Language Modeling, 18th International Conference on Natural Language Processing (ICON2021). 2021.
- Lukas Edman, Ahmet Üstün, Antonio Toral, and Gertjan van Noord. Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language, EMNLP 2021 Sixth Conference on Machine Translation (WMT21). 2021.
- Achieved first place for Lower Sorbian→German translation
- Lukas Edman, Antonio Toral, and Gertjan van Noord. Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution, The 22nd Annual Conference of the European Association for Machine Translation (EAMT2020). 2020.
- Lukas Edman, Antonio Toral, and Gertjan van Noord. Data Selection for Unsupervised Translation of German–Upper Sorbian, EMNLP 2020 Fifth Conference on Machine Translation (WMT20). 2020.
- Christian Roest, Lukas Edman, Gosse Minnema, Kevin Kelly, Jennifer Spenader, and Antonio Toral. Machine Translation for English–Inuktitut with Segmentation, Data Acquisition and Pre-Training, EMNLP 2020 Fifth Conference on Machine Translation (WMT20). 2020.
- Antonio Toral, Lukas Edman, Galiya Yeshmagambetova, and Jennifer Spenader. Neural Machine Translation for English–Kazakh with Morphological Segmentation and Synthetic Data, ACL 2019 Fourth Conference on Machine Translation (WMT19). 2019.
- Achieved first place for English→Kazakh translation
Work Experience
Instructor, 2017–2023
University of Groningen
- Gave various lectures on Machine Learning, NLP, Computer Vision, and Audio Processing
- Supervised students on projects for SemEval shared tasks
- Led tutorial and computer lab sessions
- Wrote and graded coursework
- Invigilated and graded exams
- Courses taught (Master):
- Shared Task Information Science, 2022–23
- Courses taught (Bachelor):
- Introduction to Machine Learning, 2022–23
- Machine Learning Project, 2021–22
- Courses assisted (Master):
- Shared Task Information Science, 2020–22
- Language Technology Project, 2020–22
- Natural Language Processing, 2018–19
- Pattern Recognition, 2018–19
- Courses assisted (Bachelor):
- Machine Learning Project, 2020–21
- Advanced Algorithms and Data Structures, 2018–19
- Artificial Intelligence I, 2017–19
Thesis Supervisor, 2020–2023
University of Groningen
- Supervised Master’s students on their thesis projects.
- Project on low-resource
- Project on translation of Dutch–Gronings and other Lower Saxon dialects.
- Project on unsupervised NMT for English–Chinese.
Reviewer, 2019–
LMU Munich, University of Groningen
- Regular reviewer for *ACL conferences and affiliated workshops.
Technical Skills
- Programming Languages
- Currently using: Python, Shell
- Prior experience with: Java, JavaScript, C
- Software and Libraries
- Currently using: PyTorch, HuggingFace
- Prior experience with: TensorFlow