About

I am a Ph.D. candidate in Artificial Intelligence at KAIST, advised by Se-Young Yun in the Optimization and Statistical Inference (OSI) Lab, and I work on efficient long-context language models.

A model pays for its entire context at every step, even though most of what it reads never changes the answer. My research goes after that waste. SpotAttention plugs a learned selector into pretrained models, Attention-Span has the model declare where it will attend as it reasons, and Content-Aware Sparsity prunes context by what it says rather than where it sits.

Before my focus settled on long context, I pretrained models from scratch on multi-node TPU pods, with static graphs and GSPMD sharding, and landed parts of that work upstream in PyTorch/XLA and Hugging Face Transformers. Along the way I’ve also worked on speculative decoding and mentored undergraduate teams building benchmarks that test how well text and image models represent cultures far from their training data.

Systems

I build and look after the systems my research depends on. The largest is a shared GPU cluster with one login everywhere, storage that follows each user between nodes, and a queue that stays fair in the week before a deadline. The same habit followed me home, where a smaller rack hosts everything I’d otherwise rent from the cloud.

Offline

For someone whose hobbies include servers, I don’t spend much free time in front of them. Most of it goes outdoors, to a motorbike, to calisthenics, and to a camera for whatever the day turns up. The quieter evenings belong to books.

Contact

Email: Email
GitHub: @huzama
LinkedIn: huzama