Occasionally, in my free time, I work on and write about small side projects where I apply statistics, Python, machine learning, or miscellaneous data science topics to explore something that I find personally interesting or amusing.
Rating systems are usually compared by how well they predict future results, but these single-dimensional evaluations fail to capture how the behavior and performance of rating systems change over time. Fortuitously, methods from the field of rating systems research are well suited to exactly that problem, so I was able to develop some systems for rating rating systems with rating systems. Published in SIGBOVIK 2024. [paper] [code]
In the field of machine learning, first-order optimizers are so dominant that second-order methods are considered "higher order". I think this is absurd, so I set out to implement ACtually Higher Order Optimizers using JAX and was able to get up to the 11th order. Published in SIGBOVIK 2023. [paper] [code]
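To make "beyond second order" concrete, here is a minimal sketch of a third-order update on a one-dimensional toy problem. It is not the method from the paper: it swaps in Halley's method applied to the stationarity condition f'(x) = 0, uses plain Python rather than JAX, and the objective f(x) = exp(x) - 2x and all function names are assumptions chosen so that the derivatives can be written by hand.

```python
import math

# Hypothetical toy objective: f(x) = exp(x) - 2x, minimized at x = ln 2.
# Its derivatives are written out by hand instead of using autodiff.
def d1(x):
    return math.exp(x) - 2.0  # f'(x)

def d2(x):
    return math.exp(x)        # f''(x)

def d3(x):
    return math.exp(x)        # f'''(x)

def newton_step(x):
    """Second-order update: Newton's method applied to f'(x) = 0."""
    return x - d1(x) / d2(x)

def halley_step(x):
    """Third-order update: Halley's method applied to f'(x) = 0."""
    g, h, t = d1(x), d2(x), d3(x)
    return x - (2.0 * g * h) / (2.0 * h * h - g * t)

def minimize(step, x0, n_steps):
    """Apply the given update rule n_steps times starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = step(x)
    return x

target = math.log(2.0)
for name, step in (("newton", newton_step), ("halley", halley_step)):
    x = minimize(step, 0.0, 2)
    print(name, "error after 2 steps:", abs(x - target))
```

On this toy problem the third-order update is closer to the minimizer after two steps than the second-order one, at the cost of needing one more derivative per step.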
Deep Deterministic Policy Gradients (DDPG) and Gradient Boosted Decision Trees (GBDT) have very little in common, other than that both involve machine learning and both have "gradient" in their names. So naturally I worked to combine them into DDPGBDT, a truly "worst of both worlds" abomination which I somehow managed to get to solve the easiest RL environment with one random seed after weeks of hyperparameter tuning. I submitted the paper to the SIGBOVIK parody conference and won the Most Frighteningly Like Real Research Award. [paper] [code]
The standard k-means clustering algorithm is one of the simplest in machine learning, but the k-means loss function can also be minimized by gradient descent, and the standard algorithm turns out to be closely related to Newton's method of optimization. In this project I derived, implemented, and experimented with three different ways of minimizing the k-means loss function. [writeup] [code]
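The link to Newton's method can be seen in a few lines. This is a minimal sketch, not the code from the writeup: it assumes the loss is half the sum of squared distances from each point to its assigned center, holds the assignments fixed during each step, and uses illustrative function names (`assign`, `gd_step`, `newton_update`). With assignments fixed, the gradient for center k is n_k times the center minus the sum of its points, and the Hessian is n_k times the identity, so a Newton step lands exactly on the cluster mean, which is the standard k-means update.

```python
def sq_dist(x, c):
    """Squared Euclidean distance between a point and a center."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
            for x in points]

def kmeans_loss(points, centers):
    """Half the sum of squared distances to the nearest center."""
    labels = assign(points, centers)
    return 0.5 * sum(sq_dist(x, centers[k]) for x, k in zip(points, labels))

def gradient(points, centers, labels):
    """Gradient of the loss w.r.t. each center, with assignments held fixed."""
    grads = [[0.0] * len(c) for c in centers]
    for x, k in zip(points, labels):
        for d in range(len(x)):
            grads[k][d] += centers[k][d] - x[d]
    return grads

def gd_step(points, centers, lr):
    """One gradient descent step on the centers."""
    grads = gradient(points, centers, assign(points, centers))
    return [[c - lr * g for c, g in zip(ck, gk)]
            for ck, gk in zip(centers, grads)]

def newton_update(points, centers):
    """One Newton step: divide each center's gradient by its Hessian n_k."""
    labels = assign(points, centers)
    grads = gradient(points, centers, labels)
    counts = [labels.count(k) for k in range(len(centers))]
    return [[c - g / n for c, g in zip(ck, gk)] if n else list(ck)
            for ck, gk, n in zip(centers, grads, counts)]

# Tiny hand-made data set: two points near x = 0, two near x = 10.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [[1.0, 1.0], [8.0, 1.0]]
print("newton:", newton_update(points, centers))
print("gd    :", gd_step(points, centers, lr=0.1))
```

On this data the Newton step moves the centers straight to the two cluster means, (0, 1) and (10, 1), while the gradient step with a small learning rate only moves part of the way there.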
I learned in a stats class that particular geometric properties are built into the Z and Wald statistical hypothesis tests. This project exploits that geometry to estimate the value of pi to a high degree of precision by Monte Carlo sampling data sets, fitting OLS regressions on them, performing hypothesis tests on the coefficients, and comparing the ratio of test acceptances. [writeup] [code]
I read on Reddit that LeBron James had averaged 27 points, 7 rebounds, and 7 assists per game for 15 seasons but had never finished a single game with exactly those statistics. I attempted to model his statistics and calculate the probability of that never happening. [writeup] [code]
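As a rough illustration of the kind of calculation involved, here is a minimal sketch under strong simplifying assumptions that are mine and not necessarily those of the writeup: points, rebounds, and assists are treated as independent Poisson counts with means 27, 7, and 7, and the number of games is a hypothetical round figure of 1,000 rather than his actual total.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def prob_exact_line(means, line):
    """Probability of hitting every stat exactly, assuming independence."""
    p = 1.0
    for lam, k in zip(means, line):
        p *= poisson_pmf(k, lam)
    return p

def prob_never(p_single, n_games):
    """Probability the exact line never occurs in n independent games."""
    return (1.0 - p_single) ** n_games

means = (27.0, 7.0, 7.0)  # per-game averages quoted above
line = (27, 7, 7)         # the exact stat line of interest
n_games = 1000            # hypothetical round number, not his real game count

p_single = prob_exact_line(means, line)
print("P(exact 27-7-7 in one game):", p_single)
print("P(never in", n_games, "games):", prob_never(p_single, n_games))
```

Real box scores are neither Poisson nor independent (a big scoring night tends to change rebound and assist counts too), so this only shows the structure of the calculation, not a credible estimate.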