About Me

I'm an AI Engineer and Software Developer who loves the thrill of redefining what is possible. I have an associate's degree from Tusculum University, and I am currently enrolled there working toward my bachelor's. I work as an AI Engineer, building and training large language models and other types of AI models, with work ranging from next-token prediction to span-corruption denoising to embedding and classification. I also do app development, web development, and other things from time to time – like maintenance on this website (which I initially built at 15) – though not nearly as much as I used to.

I've been programming for over 7 years – since I was 10. I officially published my first project on Replit when I was 11. Since then, I've tried building all sorts of things, from search engines to compilers to image formats to websites to apps to games, and so much more. Many of those past projects are listed here on this website.

Now that I've been working professionally for a few years, I haven't put out as many publicly published projects – but that doesn't mean I haven't been building. AI is my passion now, and I love the experience of building, curating, and processing training data for the models I build. I also love to design new architectures and implement things I find on arXiv. Here you can find a tiny portion of my past projects. If you want information on my experience and education, I encourage you to check out my LinkedIn.

Recent Projects

Expressionizer

A Python library for symbolic math expression building, simplification, and step-by-step evaluation. Renders explanations in plain text and LaTeX, supports derivatives, integrals, and multivariate calculus, and ships with localization packs for 7 languages. I primarily use it to procedurally generate math training data with full solution traces.

AOMTS

Aurora Optimized Multi-Token Superposition — a 9-checkpoint ablation series at ~100M parameters and 3,000 steps screening whether Token Superposition Training and Multi-Token Prediction improve language model quality on Wikipedia Markdown. Best run: TST (bag s=6) + MTP=1 at 2.205 nats val loss, ~0.083 nats better than the no-TST/no-MTP baseline. Nine models on Hugging Face, equal step budget, Aurora + AdamW.

Chinchilla 300M: ConvSwiGLU vs SwiGLU

A controlled 300M-parameter Chinchilla-optimal pretraining comparison on English Wikipedia Markdown. Same decoder-only GQA transformer, TST (bag s=6), MTP depth 1, Aurora optimizer, and NVFP4 training — only the FFN block differs. ConvSwiGLU hit 2.515 nats final val loss vs 2.664 for SwiGLU (−5.6%, +165k params). Trained on an RTX 5090 with ~6B tokens per variant.

See the full archive →