About Me

Hello! 👋

My name’s Nathan. I live in the greater Seattle, Washington, area and love anything and everything computational.

During the week I’m a software engineer at Microsoft, where I spend most of my time training small language models, working on LLM evaluations, and developing tooling to support our evaluation and data generation pipelines. Outside of work, I enjoy learning about multilinguality, low-resource NLP, knowledge distillation, and deep learning in general.

I graduated from Clemson University in August 2025 with a Master’s in Computer Science focused on Data Science & Informatics, after receiving my Bachelor’s in Computer Science with a minor in cybersecurity in May 2025.

I have over a decade of self-led experience and have worked with companies such as Microsoft, Giant Oak (now part of Fidelity’s Saifr), and Ally Financial, gaining professional experience across software development, artificial intelligence research, cybersecurity, and full-stack development.

Over the past couple of years I have shifted my focus toward artificial intelligence and machine learning research, with my first publication, on knowledge distillation, accepted to EMNLP 2023. More recently, I’ve published work on training the first large language models for the Setswana language, and I’ve worked on smaller curiosity projects such as improving LLMs’ ability to count letters, basic model pruning, and training African-centric multilingual models. Back when I worked with IBM Watson in the Watt CI at Clemson, I also helped develop systems for classifying FAST ultrasound exams and analyzing text in COVID hospital reports. Even further back, I assisted in virtual reality research under Clemson VRFE and, in high school, developed a novel method for estimating the circumference of an ellipse (I’m working on revisiting this for a proper writeup).

My current focus is Finch, a heavily filtered corpus designed for data-efficient LLM instruction tuning. Finch aggregates popular open-source English SFT instruction sequences and rigorously filters them for quality and n-gram diversity. The process uses a custom ModernBERT classifier, trained on LLM quality scores, followed by thorough deduplication and diversity thresholds to yield a high signal-to-noise corpus well suited to training and distillation tasks.
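To give a flavor of what that kind of filtering looks like, here is a minimal, purely illustrative sketch of quality thresholding plus n-gram-overlap near-deduplication. All function names, thresholds, and logic here are hypothetical toy stand-ins, not Finch’s actual implementation:

```python
def trigrams(text, n=3):
    """Return the set of word-level n-grams in a text (toy tokenizer)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def filter_corpus(samples, quality_scores, quality_threshold=0.8, max_overlap=0.5):
    """Keep samples whose quality score clears a threshold and whose
    trigrams don't overlap too heavily with anything already kept."""
    kept, seen_grams = [], []
    for text, score in zip(samples, quality_scores):
        if score < quality_threshold:
            continue  # fails the (hypothetical) quality classifier cutoff
        grams = trigrams(text)
        # Near-duplicate check: fraction of this sample's trigrams already seen
        if grams and any(
            len(grams & prev) / len(grams) > max_overlap for prev in seen_grams
        ):
            continue
        kept.append(text)
        seen_grams.append(grams)
    return kept

samples = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over a lazy dog",   # near-duplicate of above
    "completely different instruction about cooking pasta well",
]
scores = [0.9, 0.95, 0.85]  # stand-ins for classifier quality scores
print(filter_corpus(samples, scores))  # drops the near-duplicate
```

A real pipeline would use a trained classifier for the scores and scalable similarity structures (e.g. MinHash) rather than pairwise set intersection, but the keep/drop logic follows the same shape.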

I’m very interested in pursuing research and am always open to collaborations. If you’d like to work together, please reach out.

I’ve also served as co-director of CUhackit, Clemson University’s official student hackathon organization, the only hackathon organization in South Carolina recognized by MLH and home to the largest hackathon in the state! We do a lot of interesting work and help many future engineers get their first start, so if you or your organization are interested in working with us, don’t hesitate to reach out and I’ll put you in contact with the team.

On the side, I love skiing, going on camping and hiking trips, and playing retro games on my homemade arcade cabinet. 👾

Feel free to reach out with any inquiries at nbrown9@clemson.edu. Cheers!