How to Scrape Historical DraftKings Data in Under 20 Minutes

June 8, 2017

Whenever I look for a new data source on the NFL or daily fantasy betting, I find a bunch of sad souls who don’t realize how easy it is to scrape data. People always want Excel workbooks that already have all the data they need modeled up and ready to go, but unfortunately they have to pay for that. In this post I hope to get you sold on web scraping and give you the tools and examples you need to start creating your own sports data “trading” firm.

Scrape Code for DraftKings or FanDuel

That’s it. A whopping 13 lines to scrape 20,546 DraftKings player salaries across 51 pages, and it took me 20 minutes to do it start to finish (see the video tutorial). From here you have a few options:

Learn how to download any data you could ever want by watching the video and using my GitHub link (4-10 hours to acquire the basic skills).
Cheat and download the dataset (2 seconds).
Use my sports data service, Sports Data Direct (immediate access; 34.95/month).
Or you could just continue being a scavenger like homes over here.

Whatever you choose is fine, but if you do choose the Han Solo route, buy me a beer.


15 thoughts on “How to Scrape Historical DraftKings Data in Under 20 Minutes”

Lee

November 29, 2017 at 3:51 PM

Hi, I’m looking to scrape MLB player salaries from 2008 until the present into MySQL. I did discover the Lahman database with MLB player salaries, but it’s incomplete. Will the DraftKings code above be able to do this?

Thank you in advance.
1. Person
  
  December 1, 2017 at 11:19 AM
  
  Hi Lee. I need some clarification on your question. DraftKings salaries are not the actual salaries that players make but pretend salaries for daily fantasy sports. The Lahman database shows MLB salaries. I’d recommend you start with this already-created salary dataset: http://roadsidephotos.sabr.org/baseball/salaries.zip
  
  MLB is generally easier than other sports because of all the stats junkies who are interested in baseball. Here’s more discussion: https://www.baseball-reference.com/about/salary.shtml
  1. Lee
    
    December 16, 2017 at 12:38 AM
    
    Hi Person,
    Ah, thanks for clarifying the nature of DraftKings salaries and for the link to the already-created salary dataset. This is useful. However, I’m looking for salary information for MLB players from 2008 until the present from a source other than the Lahman database. While I was able to scrape salaries from the Lahman database for those years, the list is incomplete: it has no salary information for many of the players I have in my database.
    
    I did locate the baseball reference website with salaries as well, but I believe that is the source that Lahman used to get salary information.
    
    I’m also looking for biodata for MLB players from 2008 until the present. I have located the “b-height” and “birth_year” columns in the gameday database, but not weight.
    
    Thank you in advance.
    
    Regards,
    Lee
  2. Person
    
    December 26, 2017 at 6:13 PM
    
    Besides Lahman, there is Cot’s Contracts, if you haven’t seen it yet: http://legacy.baseballprospectus.com/compensation/cots/ (though it doesn’t have everything you need).
    
    It will be a while before I get around to baseball. If I ever do find a better source, I’ll let you know!
    
    Best of luck
Brian Doucet

September 18, 2018 at 1:43 PM

Good afternoon,

Love this post as I stumbled upon RotoGuru1 a few weeks ago before I started playing DraftKings.

I’m trying to recreate your code in Python (using 3.6), and I’ve never used io or utils, so I’m trying it with requests and BeautifulSoup.

I can scrape the page using the “pre” tags and get the output I want, but BeautifulSoup is returning a ResultSet instead of a list. Any idea what I could do?

I would love to save this as a list and convert it to a DataFrame for analysis and data cleaning later on.
1. Person
  
  September 18, 2018 at 2:33 PM
  
  Hi Brian
  
  Thanks for reading!
  
  You need to use
  soup.find("pre").text
  find returns a single tag rather than a ResultSet (that is what find_all returns), and .text gives you the tag’s contents as a string. io is a core library included with Python 3.6, but you can also parse the text by writing your own CSV parser without io and pandas. Just split on the newline “\n” and then on the delimiter “;”.
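The split-by-hand approach suggested in this reply can be sketched roughly as follows. This is a minimal illustration using only the standard library. The sample text and its column names are made up for the example, and the real page may contain extra lines or a different header.

```python
def parse_semicolon_block(text):
    """Parse a semicolon-delimited text block into a list of dicts."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    if not lines:
        return []
    header = lines[0].split(";")
    rows = []
    for line in lines[1:]:
        values = line.split(";")
        # Skip malformed rows whose field count does not match the header.
        if len(values) != len(header):
            continue
        rows.append(dict(zip(header, values)))
    return rows


# Made-up sample in the shape of a semicolon-delimited salary listing
# (hypothetical column names, not taken from the real page).
sample = """Week;Year;Name;Pos;DK salary
1;2017;Player A;QB;7000
1;2017;Player B;RB;6500
"""

rows = parse_semicolon_block(sample)
print(rows[0]["Name"], rows[0]["DK salary"])
```

Every value comes back as a string here, so numeric columns would still need converting before any analysis.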
Anonymous

September 18, 2018 at 3:45 PM

That worked perfectly. I imported pandas and used the trick of converting the text block into a file-like object with io.StringIO and saving it as a variable. I passed that variable to pd.read_csv and it worked.
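For readers following along, the io.StringIO approach described here might look something like the sketch below. The sample text and column names are made up, and it assumes the text has already been pulled out of the page's pre tag.

```python
import io

import pandas as pd

# Made-up sample standing in for the text pulled out of the <pre> tag
# (hypothetical column names).
pre_text = """Week;Year;Name;Pos;DK salary
1;2017;Player A;QB;7000
1;2017;Player B;RB;6500
"""

# io.StringIO wraps the string in a file-like object that read_csv can consume.
buffer = io.StringIO(pre_text)
df = pd.read_csv(buffer, sep=";")

print(df.shape)
print(df["DK salary"].sum())
```

Unlike splitting by hand, read_csv infers numeric column types, so the salary column can be summed or filtered right away.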

I’m honestly shocked that more people haven’t seen your video. Most of the videos I’ve seen on YouTube that discuss creating lineup optimizers and getting DFS data show people copying and pasting it from different sites… that is absolutely nuts. I’m going to spend some time now practicing how to clean this and add new fields, like a playerID column.

I really appreciate the work you posted and your response. I’m going to be reading more of your content.
1. Person
  
  September 18, 2018 at 8:04 PM
  
  Thanks! I agree. I’m considering making a course to show people how to do a lot of the data munging and scraping needed for sports data.
  
  Also, take a look at my other blog, https://www.blog.sportsdatadirect.com/, if you haven’t seen it yet. I write daily fantasy sports recap articles with a ton of analysis, and I also sell sports data.
Anonymous

September 7, 2019 at 1:42 PM

Hi, pretty impressive stuff. I have a quick question. I am using Python 3.7, and every time I try to use the utils function I get the following error:

module 'utils' has no attribute 'soup'

The code I’m trying to write is soup=utils.soup(BASE_URL.replace("WEEK",wk).replace("YEAR",yr))

Any help would be much appreciated. Thanks.
1. Person
  
  September 8, 2019 at 9:42 AM
  
  Sorry about that; utils is not a standard library. In the video I used some of my own private libraries. You can use the notebook here: https://github.com/rogerfitz/tutorials/blob/master/draft-kings-history-scrape/roto-guru.ipynb. Make sure you copy utils.py into the same folder as the notebook (here is the git repo: https://github.com/rogerfitz/tutorials/tree/master/draft-kings-history-scrape).
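If copying utils.py is not an option, a stand-in for the soup helper could look roughly like the sketch below. This is an assumption about what such a helper does (download a page and hand back a parsed BeautifulSoup object), not the contents of the real utils.py, which may differ. The BASE_URL value is a hypothetical placeholder, and build_url and pre_text are names invented for this example.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Hypothetical URL template; the real BASE_URL comes from the notebook.
BASE_URL = "https://example.com/salaries?week=WEEK&year=YEAR"


def build_url(week, year, template=BASE_URL):
    """Fill the WEEK and YEAR placeholders, like the replace() chain in the question."""
    return template.replace("WEEK", str(week)).replace("YEAR", str(year))


def soup(url):
    """Assumed stand-in for the private utils.soup helper: download and parse a page."""
    with urlopen(url, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return BeautifulSoup(html, "html.parser")


def pre_text(parsed):
    """Return the text of the first <pre> tag, or an empty string if there is none."""
    tag = parsed.find("pre")
    return tag.text if tag is not None else ""


# Offline check with a made-up page, so nothing is downloaded here.
page = BeautifulSoup(
    "<html><body><pre>Name;Pos\nPlayer A;QB</pre></body></html>", "html.parser"
)
print(build_url(1, 2017))
print(pre_text(page))
```

With a real template in place, a call such as pre_text(soup(build_url(wk, yr))) inside a loop over weeks and years would follow the same pattern as the line quoted in the question.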
  1. Anonymous
    
    September 11, 2019 at 8:50 AM
    
    Thanks! I’ll give that a try.
Anonymous

September 24, 2019 at 6:23 AM

Hi,

Awesome post! I’m just wondering how often you update the weekly data. Do you publish the salary data (without points) prior to the upcoming week?
1. Person
  
  October 6, 2019 at 3:40 PM
  
  Hi, sorry for the late reply. I don’t check this blog often, but I’m much quicker if you email support@sportsdatadirect.com.
  
  Yes, I publish the salary data ahead of time. It is loaded by Wednesday afternoon each week during the NFL regular season.
Anonymous

October 26, 2020 at 3:36 AM

I tried to go to http://sportsdatadirect.com and it says it isn’t secure. Both Edge and Chrome send me to other websites to download extensions, etc.
1. Person
  
  October 26, 2020 at 1:39 PM
  
  Hi, thanks for the comment. I need to take down those links. I’ve since closed down sportsdatadirect.com, and someone else has purchased the domain.

Ergo Sum

Thoughts from a person
