Data Analyst, aspiring Data Scientist, programming enthusiast.
MS in Business Analytics
LinkedIn Profile
Sadly, the tournament didn’t happen this year, so I’ll have to wait until next year to officially compete on Kaggle. On the training data provided by Kaggle, by including external data and season summary stats, I was able to achieve a log-loss in the 0.3-0.5 range for each of the 2014-2019 seasons, and better accuracy than always picking the higher-ranked team. Some of the work that went into the project is below.
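For readers unfamiliar with the two metrics mentioned above, here is a minimal sketch of how log-loss and the "always pick the higher-ranked team" baseline can be computed. It is not the project's actual code: the function names, the use of tournament seeds as the ranking, and the toy numbers are all assumptions made for illustration.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy.

    y_true: 0/1 outcomes (1 = team A won).
    y_prob: predicted probability that team A wins.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of games where the predicted winner matches the actual winner."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def higher_seed_baseline(seed_a, seed_b):
    """Pick team A whenever its seed number is lower or equal (lower = better).

    Ties go to team A, which is an arbitrary choice for this sketch.
    """
    return [1.0 if a <= b else 0.0 for a, b in zip(seed_a, seed_b)]

# Toy, made-up games (not real tournament results).
outcomes = [1, 0, 1, 1]
model_probs = [0.8, 0.3, 0.6, 0.4]
seeds_a = [1, 5, 3, 8]
seeds_b = [16, 4, 6, 9]

print("model log-loss:", round(log_loss(outcomes, model_probs), 3))
print("model accuracy:", accuracy(outcomes, model_probs))
print("baseline accuracy:", accuracy(outcomes, higher_seed_baseline(seeds_a, seeds_b)))
```

Log-loss rewards well-calibrated probabilities rather than just correct picks, which is why it is reported separately from accuracy here.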
Looking through data.gov, I saw quite a bit of data provided by the Fairfax County tax administration office. The data consisted of assessed values for the land and buildings on each parcel, both residential and commercial, so I filtered for residential parcels with only one building to approximate single-family homes. I was originally looking for home prices similar to what might be on Redfin or Zillow, but since the county had made so much data available, it seemed a shame not to use it.
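The filtering step described above could look roughly like the sketch below. The field names (`parcel_id`, `land_use`, `building_value`) and the one-row-per-building layout are hypothetical; the actual Fairfax County files may be structured differently.

```python
from collections import Counter

def single_building_residential(rows):
    """Keep residential rows whose parcel has exactly one building record.

    Assumes one row per building, with hypothetical keys
    'parcel_id' and 'land_use'.
    """
    buildings_per_parcel = Counter(row["parcel_id"] for row in rows)
    return [
        row
        for row in rows
        if row["land_use"] == "residential"
        and buildings_per_parcel[row["parcel_id"]] == 1
    ]

# Toy, made-up records for illustration only.
rows = [
    {"parcel_id": "A1", "land_use": "residential", "building_value": 310000},
    {"parcel_id": "B2", "land_use": "residential", "building_value": 150000},
    {"parcel_id": "B2", "land_use": "residential", "building_value": 40000},
    {"parcel_id": "C3", "land_use": "commercial", "building_value": 900000},
]

homes = single_building_residential(rows)
print([row["parcel_id"] for row in homes])
```

Counting buildings per parcel before filtering drops parcels like `B2`, which would otherwise contribute two rows and blur the single-family comparison.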
Page template forked from evanca