Under development — I'm actively building this site.

About

I'm a Ph.D. candidate in Artificial Intelligence at KAIST, advised by Se-Young Yun in the Optimization and Statistical Inference (OSI) Lab. My research is in efficient language modeling: the architectures, training, and infrastructure that let us run large models at scale without paying the full quadratic bill of attention. Lately that means long-context inference, including sparse attention, KV-cache compression, speculative decoding, and getting models to manage their own attention span.

Alongside the research, I build and run the lab's HPC infrastructure, a multi-node GPU cluster, which is where most of my interest in distributed training and TPU/XLA performance comes from.

Research interests

  • Long-context inference — sparse attention, KV-cache compression, speculative decoding
  • Large-scale model training (DDP / FSDP / SPMD)
  • TPU / XLA performance engineering
  • Cultural and multilingual evaluation of LLMs and diffusion models
  • Explanatory reasoning in LLMs

Contact
