Wilson Holland
8 min read · Feb 5, 2020


Hacking the NBA, Maximizing DFS Lineups with Machine Learning

This is part 1 of a series I am writing about using math to try to win at daily fantasy basketball. Each part will cover a different strategy I am using, why I tried it, and how it is performing. I am doing this because I like basketball, fantasy sports, math, and gambling. Also, daily fantasy sports are games of skill, so it should be possible to turn that skill into algorithms.

I am a Nuggets fan, but my models are team agnostic, which just means that my emotions will not affect the outcomes. I hope you enjoy the summary as much as I have enjoyed doing the work.

Let’s get to how I am doing it.

I have been using Python, specifically PuLP and pandas, to prepare the data and build lineups. My inputs are the FanDuel lineup CSV exports for the daily contest, and I use the Python package ‘basketball-reference-web-scraper’ to pull information into my own datasets. For the following analysis, the only data that needs to be pulled is the FanDuel daily lineup CSV. Below is a summary of my MAX model, which I have been running in FanDuel contests for about two months.

Daily Fantasy Basketball

For those of you who do not know how fantasy basketball works, here is a quick breakdown. Players gain points for scoring baskets, getting rebounds, steals, and blocks. They lose points for committing turnovers.

In daily fantasy basketball, you create a new lineup for each slate of games. The slate consists of whichever NBA games are being played that night. As a result, you may have access to LeBron James one night but not the next. It all depends on when his team is playing. Each player has a salary that you pay to use him in your lineup. You are limited by a total salary cap, and every position must be filled. In FanDuel, where the following data comes from, the total salary cap is $60000, and the positions you are required to fill are:

  • 2 Point Guards (PG)
  • 2 Shooting Guards (SG)
  • 2 Small Forwards (SF)
  • 2 Power Forwards (PF)
  • 1 Center (C)

To give you an example of a player that can be included in a lineup, LeBron James is a SF and costs $11200 to put in your lineup. Prices are set by FanDuel and adjusted based on its internal algorithms.

Contests in FanDuel are organized into tournaments of different sizes with different buy-in and payout structures. I have been testing all my models in the daily NBA Piggy Bank Shot. It is a tournament of around 280,000 entries with a 5 cent buy-in, and the top 30% get paid out. People can enter up to 150 lineups, so it is important to note that 280,000 unique people are not playing at once.

The goal of all my models is to consistently predict a fantasy lineup that scores in that top 30 percent in a way that I can make money. This can be done:

  1. Ideally by scoring in the top 30% every night
  2. By scoring in a higher percentile, where the payouts are bigger, every once in a while. A model to achieve this might play boom/bust players night after night and wait for the stars to align.
  3. By picking when to bet more money. This would work if I could determine risk and size my bets accordingly.

The MAX Model

I call this model the MAX. It is very simple: I maximize the lineup’s total fantasy points per game (FPPG) subject to the salary cap, using PuLP with its LpMaximize objective.

A very high-level explanation: you have multiple linear constraints. When these constraints are graphed, they bound a feasible region that contains every valid solution. The point within that region where the objective is largest becomes the final solution. An example of a constraint we would graph in this case is PG1 + PG2 + SG1 + SG2 + SF1 + SF2 + PF1 + PF2 + C1 ≤ $60000, where each term is the salary of the player in that slot. This is the salary cap mentioned earlier.


For information on how to model this yourself in Python, I highly suggest this article. It does an excellent job of explaining how linear programming models work and gives great starter code for creating your own modeling engine.
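To make the idea concrete, here is a minimal sketch of a lineup maximizer in PuLP. It is not my production code: the player pool, names, salaries, and FPPG values are made up for illustration, and the helper name build_lineup is hypothetical. Each player gets a binary pick variable, the objective is total FPPG, and the constraints are the salary cap and the position counts described above.

```python
import pulp

# Hypothetical pool: (name, position, salary, season FPPG). All values are made up.
POOL = [
    ("PG_A", "PG", 9000, 45.0), ("PG_B", "PG", 6500, 33.0), ("PG_C", "PG", 4000, 20.0),
    ("SG_A", "SG", 8000, 40.0), ("SG_B", "SG", 6000, 31.0), ("SG_C", "SG", 3800, 19.0),
    ("SF_A", "SF", 11000, 52.0), ("SF_B", "SF", 5000, 27.0), ("SF_C", "SF", 4200, 21.0),
    ("PF_A", "PF", 9500, 46.0), ("PF_B", "PF", 5500, 28.0), ("PF_C", "PF", 4500, 24.0),
    ("C_A", "C", 10000, 50.0), ("C_B", "C", 5000, 26.0),
]
SLOTS = {"PG": 2, "SG": 2, "SF": 2, "PF": 2, "C": 1}  # FanDuel roster shape
SALARY_CAP = 60000


def build_lineup(pool, slots=SLOTS, cap=SALARY_CAP):
    """Pick the lineup with the highest total FPPG that fits the cap and slots."""
    prob = pulp.LpProblem("max_fppg", pulp.LpMaximize)

    # One binary decision variable per player: 1 = in the lineup, 0 = out.
    pick = {name: pulp.LpVariable(f"pick_{name}", cat="Binary") for name, *_ in pool}

    # Objective: total season-average fantasy points of the chosen players.
    prob += pulp.lpSum(fppg * pick[name] for name, _, _, fppg in pool)

    # Constraint: total salary must not exceed the cap.
    prob += pulp.lpSum(salary * pick[name] for name, _, salary, _ in pool) <= cap

    # Constraint: exactly the required number of players at each position.
    for pos, count in slots.items():
        prob += pulp.lpSum(pick[name] for name, p, _, _ in pool if p == pos) == count

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    if pulp.LpStatus[prob.status] != "Optimal":
        return []
    return [row for row in pool if pick[row[0]].value() > 0.5]


lineup = build_lineup(POOL)
for name, pos, salary, fppg in lineup:
    print(f"{pos:2} {name:5} ${salary:>6} {fppg:5.1f}")
```

Because the pick variables are binary, this is strictly an integer program rather than a pure linear program, but the graphing intuition above still applies.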

I make some assumptions and adjustments to the dataset provided by FanDuel before running the maximize function.

  1. Team defense does not matter. By this I mean that the model assumes a player averages the same number of fantasy points regardless of the opponent. This assumption is likely not true, but I will have to create a different model to test it.
  2. A player’s average is treated as unknown until he has played 10 games. I chose 10 games for no particular reason. I should be able to determine a better cutoff by choosing an acceptable standard error for a player’s average and looking at how many games it takes to reach it. The idea is that the more games a player has played, the smaller the error in his average. This will work in theory, but I have a hunch that I will either have to accept a large error or require too many games before a player can be included. The downside of my 10 game cutoff is that my model misses out on star players who have been injured.
  3. Any injury designation removes a player from my dataset. This includes players who are a game time decision (GTD). This is purely for risk reduction. I would rather lose a couple of points than take a zero, because I cannot reliably check my lineups right before the game.
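Adjustments 2 and 3 can be sketched as a pandas filter. The column names "Played" and "Injury Indicator" are my assumption about the FanDuel export and may not match your file exactly. The sample rows are made up, and filter_pool is a hypothetical helper name.

```python
import pandas as pd

MIN_GAMES = 10  # arbitrary cutoff, as discussed above


def filter_pool(df, min_games=MIN_GAMES):
    """Keep players with no injury tag and at least `min_games` games played."""
    tag = df["Injury Indicator"]
    # Treat a missing or blank injury tag as healthy; any tag (GTD, O, ...) excludes.
    healthy = tag.isna() | (tag.astype(str).str.strip() == "")
    enough_games = df["Played"] >= min_games
    return df[healthy & enough_games].reset_index(drop=True)


# Made-up rows standing in for the daily CSV (normally loaded with pd.read_csv).
sample = pd.DataFrame({
    "Nickname": ["Player A", "Player B", "Player C", "Player D"],
    "Position": ["PG", "SG", "PF", "C"],
    "FPPG": [41.2, 35.0, 48.9, 22.3],
    "Played": [45, 6, 40, 30],
    "Salary": [8800, 7000, 10500, 4500],
    "Injury Indicator": [None, None, "GTD", "O"],
})

pool = filter_pool(sample)
print(pool[["Nickname", "Played", "Injury Indicator"]])
```

In this sample only Player A survives: Player B has played too few games, and Players C and D carry injury tags.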

Analysis

This first plot shows the total score of each lineup, alongside the expected average (EA) and the in-the-money (ITM) score. The EA is what the lineup would score if my model were perfect, meaning every player hit his average. The ITM is the lowest score that won money in that contest.

MAX — Yellow, ITM — Blue, EA — Red

As you can see, the MAX is all over the place, and there is also a slight upward trend in the EA as the season progresses. I should note that players who haven’t missed games are nearing 50 games played this season.

This next plot shows MAX and ITM against the EA. It is another way to look at the performance vs. expected average.

MAX — Yellow, ITM — Blue, EA — Red

As you can see again, sometimes I am in the money using this strategy, but more often than not the lineup falls short of its expected average. Overall, the average scores are:

  • MAX: 291.69
  • ITM: 311.96
  • EA: 311.32

The above plot and averages also show that even when a lineup hits the EA, it is not a given that I will be ITM.

The next graph concerns sample size. Because a different number of teams plays each night, the player pool grows or shrinks. The graph below shows average points based on the number of teams that played that night.

MAX — Yellow, ITM — Blue, EA — Red, Note: 4, 16, 26, 28 have a sample size of 1 (N=1)

This information is helpful because it can inform how much to bet based on the number of teams playing. What is heartening for my model is that when 18 teams are playing, I am averaging close to ITM. If I look back at my winnings on nights when 18 teams played, I have been ITM more often than not. Typically you double your money when you are ITM, so a large bet placed at the correct time could pay good dividends.
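A breakdown like this can be produced with a pandas groupby over a nightly results log. The sketch below assumes a hypothetical log with one row per night and columns teams, max_score, and itm_score. The numbers are placeholders, not my real results.

```python
import pandas as pd

# Placeholder nightly log; values are illustrative only, not actual contest results.
log = pd.DataFrame({
    "teams": [18, 18, 12, 12, 18],
    "max_score": [320.0, 305.0, 280.0, 300.0, 290.0],
    "itm_score": [310.0, 312.0, 305.0, 295.0, 300.0],
})


def summarize_by_slate_size(log):
    """Average scores and in-the-money rate, grouped by number of teams playing."""
    # A night is in the money when the lineup scored at least the ITM cutoff.
    flagged = log.assign(in_money=log["max_score"] >= log["itm_score"])
    return flagged.groupby("teams").agg(
        nights=("max_score", "count"),
        avg_max=("max_score", "mean"),
        avg_itm=("itm_score", "mean"),
        itm_rate=("in_money", "mean"),
    )


summary = summarize_by_slate_size(log)
print(summary)
```

Keeping the nights column next to the averages makes it obvious which slate sizes have too few samples to trust, such as the N=1 cases noted in the caption above.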

Finally, I want to talk a little about the players my model has been choosing and a couple of trends I’ve noticed. I would provide data here, but I found a bug in my logging code so the information is wonky. So instead I will talk qualitatively, using specific players as examples.

Good players get priced out

Earlier in the season, Luka Doncic, the PG for the Mavericks, and Andre Drummond, the C for the Pistons, were reasonably priced for the fantasy points they produced, and both were included almost every night in my models. As they had good game after good game, their salaries rose to the point where they are only occasionally included. I do not think this influences my total lineup score. It just means there are now better value-for-money players.

Players that don’t deserve to be in

FanDuel doesn’t assign player value based solely on season average, but my maximize algorithm uses only that. As a result, you get players such as Dario Saric, PF for the Suns. He averages less than 10 FPPG for long stretches and then goes on a streak of a couple of games where he scores far above his season average of 24. Because of this he is very cheap, so the model picks him every time. This certainly has an effect on the final output, because another similarly priced player might sacrifice a couple of points for a whole lot less risk.

Injuries stop certain players from getting in

Players like Anthony Davis, PF for the Lakers, or Kyrie Irving, PG for the Nets, are rarely featured. They consistently score very well, but just as consistently they carry a game time decision tag. Because of the GTD tag, they don’t get included in the maximizing. This has an effect on the model: many times these players end up playing, and I miss out on their big games. I could remove the tag before I run the model, but again, if for some reason they don’t play, my lineup takes a zero in that player spot.

Cheaper Utility Players

These players show up every game, average about the same points every game, and cost basically the same every game. An example is Harrison Barnes, SF for the Kings. He scores 25–30 FPPG and costs around $4500-$5000. This doesn’t have an effect on the model. However, if you are trying to reach the very high scores where the tournament payouts are larger, then he is not the player you want, because his ceiling is limited.

Conclusion and Next Steps

There is some promise in maximizing season FPPG from a profit perspective. I say this because:

  1. The EA average score and ITM average score are very close together. As more games are played, I would expect the MAX average to converge to the EA.
  2. With certain player pools, there seems to be a better chance of winning.

However, I think some improvements could be made to mitigate some of the apparent issues.

  1. Handle injuries differently so that the player pool is larger and is a more complete picture of the current contest.
  2. Incorporate some sort of recent form calculation to determine which players had a good run in the early season but are now benched.
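One possible shape for the recent form calculation in improvement 2 is the ratio of a player’s average over his last few games to his season average. This is only a sketch of an idea I have not tested. The function name, the 5 game window, and the game log are all hypothetical.

```python
def recent_form_ratio(game_scores, window=5):
    """Mean of the last `window` games divided by the full-season mean.

    Returns None when there are too few games or the season mean is zero.
    A ratio well below 1.0 suggests the player has cooled off or lost minutes.
    """
    if len(game_scores) < window:
        return None
    season_avg = sum(game_scores) / len(game_scores)
    if season_avg == 0:
        return None
    recent_avg = sum(game_scores[-window:]) / window
    return recent_avg / season_avg


# Made-up game log: a strong start to the season followed by a slump.
hypothetical_log = [30, 32, 28, 31, 29, 12, 8, 10, 9, 11]
ratio = recent_form_ratio(hypothetical_log)
print(f"recent form ratio: {ratio:.2f}")
```

A ratio like this could either scale the FPPG fed into the maximizer or act as a filter that drops players below some threshold. Which works better is something I would have to test.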

Finally, I think the MAX model is a good starting point, but there are much better models that can be written.

Thank you for reading. If you have any ideas or questions, comment below. In my next article, I will cover another algorithm for fantasy basketball.


Wilson Holland

I am a software engineer by trade, and a sport and data enthusiast on the side. I write about using data to make decisions.