
I am currently a post-doc at LMU, researching topics including character-level NLP, low-resource pretraining, and machine translation. My most recent work has been on benchmarking LLMs’ understanding of the characters inside their tokens, which turns out to be surprisingly poor. Although BPE is ubiquitous and quite powerful, I believe it is not the way forward in the long run.

I am also quite interested in efficiency methods that make training large models feasible for everyone, not just big tech companies. On that front, I participated in the BabyLM Challenge in 2023 and am participating again in 2024. Humans clearly learn more efficiently from the “training data” we encounter, but on the other hand it takes us years to develop fluency in a language, whereas we can train LLMs in a matter of months. Is there a best of both worlds? That’s what I’d like to find out.

Generally speaking, I am interested in the similarities and differences between machine learning and human learning. I believe there is much we can learn from our own brains and incorporate into ML models. For instance, as we sleep, we consolidate information, reinforce what we learned that day, find ways to recycle neural pathways for other purposes, and more. All of this happens without direct access to any input, whereas ML models currently learn only from real-world input data. How to incorporate more of the brain’s learning strategies into ML models is something I hope to discover.