About
I'm a Ph.D. candidate in Artificial Intelligence at KAIST, advised by Se-Young Yun in the Optimization and Statistical Inference (OSI) Lab. My research is in efficient language modeling: the architectures, training, and infrastructure that let us run large models at scale without paying the full quadratic cost of attention. Lately that means long-context inference: sparse attention, KV-cache compression, speculative decoding, and getting models to manage their own attention span.
Alongside the research, I build and run the lab's HPC infrastructure, a multi-node GPU cluster. That work is where most of my interest in distributed training and TPU/XLA performance comes from.
Research interests
- Long-context inference — sparse attention, KV-cache compression, speculative decoding
- Large-scale model training (DDP / FSDP / SPMD)
- TPU / XLA performance engineering
- Cultural and multilingual evaluation of LLMs and diffusion models
- Explanatory reasoning in LLMs