The 2019-2020 NBA season recently came to a close, as the Los Angeles Lakers defeated the Miami Heat 4-2 in the NBA Finals. This past season was a wild ride, with the global COVID-19 pandemic leading to a several-month gap in the middle of the season. After months without NBA basketball and much planning on how to continue safely, the NBA and the National Basketball Players Association decided to host the rest of the season in the NBA "bubble" at Walt Disney World in Orlando, Florida.
The bubble was home to many ridiculous and unpredictable performances, including the Phoenix Suns going undefeated in their remaining 8 regular season games, Jamal Murray and Donovan Mitchell scoring 50+ points against each other in the playoff matchup between the Utah Jazz and Denver Nuggets, and TJ Warren having stat lines reminiscent of prime Michael Jordan. Possibly most notable, however, was the rise of Miami Heat guard Tyler Herro, whose performance throughout the Heat's improbable Finals run sent shockwaves throughout the NBA world. Herro was even featured in the music video for the song titled "Tyler Herro" by recent rap sensation Jack Harlow a few months after his memorable playoff run.
While Herro was arguably the rookie with the biggest impact in the NBA bubble, he was drafted just 13th overall by the Miami Heat. Herro looks like a great draft pick by the Heat, especially when less impactful players like RJ Barrett, the third overall pick by the New York Knicks, were drafted before him. The Miami Heat's success story with Tyler Herro, and even Bam Adebayo (14th overall pick in the 2017 NBA draft), raises the question: which NBA teams have drafted best in recent years?
Data Gathering
To begin we imported data from a csv file that contained season data for every NBA player from the 1996 season to the 2019 season. Our data contained many important stats about the player (name, team, age, draft year, draft round, draft pick number) and also important stats about their performance for each individual season like points per game, rebounds per game, assists per game, etc. Once we had this data we knew that we could begin our data tidying and exploratory data analysis to figure out which teams have been successful at drafting talented college hoopers.
import pandas as pd
import numpy as np
import re
import statsmodels.api as sm
import matplotlib as plt
import matplotlib.pyplot as pyplt
import requests
!pip install folium
import folium
from folium import IFrame
import base64
import seaborn as sns
from bs4 import BeautifulSoup
#sys has to be imported before it is used in the pip commands below
import sys
!{sys.executable} -m pip install html5lib
!{sys.executable} -m pip install lxml
sns.set(rc={'figure.figsize':(12,10)})
file = 'all_seasons.csv'
all_data = pd.read_csv(file)
all_data.sample(10)
Here we’re just displaying the columns of our data.
all_data.columns
Data Tidying
Once we had our data we were set to begin our data tidying. The first thing we decided to do was filter our data to only include players drafted in 2005 or later. Many basketball fans and critics have pointed to how the NBA game has changed drastically over the years. They claim the league has become "softer" and more offense-oriented. So to avoid skewing our data we only included players from the last 15 years of drafts. To figure out which teams have made the best use of their first round picks we also filtered our data to only include first-round selections.
# filtering for only first round selections
is_players_drafted_in_first_round = all_data['draft_round'] == '1'
players_drafted_in_first_round = all_data[is_players_drafted_in_first_round]
players_drafted_in_first_round = players_drafted_in_first_round.astype({'draft_year': 'int32'})
#filtering for draft years 2005 and later
is_players_drafted_after_2005 = players_drafted_in_first_round['draft_year'] >= 2005
players_drafted_round_one_and_after_2005 = players_drafted_in_first_round[is_players_drafted_after_2005]
players_drafted_round_one_and_after_2005
The second step of our data tidying process was to only include seasons where the player played over 20 games. This lets us keep only the significant seasons and avoid giving weight to seasons where the player may have been injured. We also anticipated that the small sample size of a season of 20 games or fewer could skew the data. For example, if a player suffered a season-ending injury in their first game and only scored two points, we don't want to weigh that season equally to a season where the player played a full 82-game regular season.
#drop seasons under 20 games
is_over_20_games = players_drafted_round_one_and_after_2005['gp'] > 20
data_to_be_used = players_drafted_round_one_and_after_2005[is_over_20_games]
data_to_be_used
Now the dataframe we made has every season for a first round draft pick where the player played over 20 games starting in 2005. From here we simply aggregated over player names to determine each player's career averages for points, rebounds, and assists, the three most widely looked at statistics for basketball.
#aggregating points, rebounds, and assists for players
players = data_to_be_used.groupby('player_name', as_index = False).agg({'pts': 'mean', 'reb': 'mean', 'ast': 'mean'})
players
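To make the reasoning behind the 20-game cutoff concrete, here is a minimal sketch on toy data. The player name and numbers are invented for illustration and do not come from all_seasons.csv; only the column names `player_name`, `gp`, and `pts` mirror the ones used above.

```python
import pandas as pd

# Toy data (hypothetical player and numbers, not from all_seasons.csv)
seasons = pd.DataFrame({
    'player_name': ['Player A', 'Player A', 'Player A'],
    'gp':  [1, 82, 75],          # games played in each season
    'pts': [2.0, 20.0, 22.0],    # points per game in each season
})

# Career average over every season, including the 1-game season
raw_avg = seasons.groupby('player_name')['pts'].mean()['Player A']

# Career average after dropping seasons with 20 or fewer games
kept = seasons[seasons['gp'] > 20]
filtered_avg = kept.groupby('player_name')['pts'].mean()['Player A']

print(raw_avg, filtered_avg)
```

In this toy case the single injury-shortened season pulls the unfiltered average well below the filtered one, which is the kind of skew the cutoff is meant to avoid.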
In the NBA, players are constantly switching teams through free agency. To match each player with their draft team, we needed to create a dataframe that contains only a player's name and their draft information. This would allow us to later group players by their draft team and figure out which teams have been the most successful at drafting.
To start this process we had to do some more data tidying. The first step was to adjust the season column to only include the start year of the season by using some regular expressions.
#adjusting the season to only have the starting year using regex
#(raw string so that \d is not treated as a string escape)
regex = r"(?P<start_year>\d+)-\d\d"
for idx,row in players_drafted_round_one_and_after_2005.iterrows():
    reg = re.search(regex, row['season'])
    players_drafted_round_one_and_after_2005.at[idx, 'season'] = reg.group('start_year')
players_drafted_round_one_and_after_2005
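The same start-year extraction can also be written without an explicit loop. The following is a sketch of one possible alternative using pandas' `Series.str.extract`, run on a toy frame; it assumes the `season` column holds strings such as '2005-06', which is what the regular expression above implies.

```python
import pandas as pd

# Toy frame; the season strings are assumed to look like 'YYYY-YY'
toy = pd.DataFrame({'season': ['2005-06', '2019-20']})

# str.extract returns one column per named group; keep the start year as an int
start_year = toy['season'].str.extract(r'(?P<start_year>\d+)-\d\d')['start_year']
toy['season'] = start_year.astype('int32')

print(toy)
```

Converting to an integer here would also make the separate `astype({'season': 'int32'})` step unnecessary.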
Next we filtered our dataframe to only include rows where the draft year is equal to the season's start year. By isolating players' rookie seasons, we are able to see exactly which teams drafted them and at what pick they were selected, which is exactly what we need.
#now adjusting the dataframe to only have seasons with draft year = start year
players_drafted_round_one_and_after_2005 = players_drafted_round_one_and_after_2005.astype({'season': 'int32'})
players_drafted_round_one_and_after_2005 = players_drafted_round_one_and_after_2005.astype({'draft_year': 'int32'})
filter_for_draft_year = players_drafted_round_one_and_after_2005['draft_year'] == players_drafted_round_one_and_after_2005['season']
players_rookies_years = players_drafted_round_one_and_after_2005[filter_for_draft_year]
players_rookies_years
Now we select only the columns that correspond to a player's draft info. We already have players' career averages and aren't really interested in how they performed in their rookie year, so we drop any columns that don't correspond to a player's draft year or position.
#making data frame with players' draft info
players_and_draft_teams = players_rookies_years[['player_name', 'team_abbreviation', 'draft_round', 'draft_number', 'draft_year']]
players_and_draft_teams
By merging with the earlier dataframe we created a dataframe that contains a player's draft info and their career averages. From here we'll be able to tell which players were successful draft picks and which teams were especially good at selecting them.
#merging data frames to include draft info and players averages
players_career_data = pd.merge(players_and_draft_teams, players, on = 'player_name')
players_career_data
We needed a methodology for determining successful NBA players. What we decided to do was find the sum of the players’ average points, rebounds, and assists. Since these are the most common statistics, looking at the sum is a good measure of how productive a player has been. It essentially is a measure of how much players contribute to a game and fill out the stat sheet.
players_with_scores = players_career_data
#establishing scores for players by iterating through rows and adding points, rebounds, assists
for idx,row in players_with_scores.iterrows():
    players_with_scores.at[idx, 'score'] = row['pts'] + row['reb'] + row['ast']
players_with_scores
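As a sketch of an equivalent, loop-free way to build the score, the three stat columns can be summed row-wise. The two rows below are invented examples, not real players.

```python
import pandas as pd

# Toy career averages (hypothetical players)
toy = pd.DataFrame({
    'player_name': ['Guard', 'Big'],
    'pts': [12.0, 10.0],
    'reb': [2.0, 6.0],
    'ast': [3.5, 1.5],
})

# score = points + rebounds + assists, computed for every row at once
toy['score'] = toy[['pts', 'reb', 'ast']].sum(axis=1)

print(toy)
```

Both toy players end up with the same score even though they get there differently, which shows that the metric rewards overall production rather than any single stat.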
Now we were ready to aggregate over the team name and find the average score (reminder: the sum of points, rebounds, and assists) and draft position for each team. From here we planned to plot the data to see exactly which teams have made good use of their picks based on the position they typically draft at. We did have to do some more data tidying here, as well. Since many franchises have changed names or relocated, we updated our data to be consistent with the current team names (for example changing the New Jersey Nets to the Brooklyn Nets).
#tidy data to reflect current team names
for idx,row in players_with_scores.iterrows():
    if row['team_abbreviation'] == 'SEA':
        players_with_scores.at[idx, 'team_abbreviation'] = 'OKC'
    elif row['team_abbreviation'] == 'NJN':
        players_with_scores.at[idx, 'team_abbreviation'] = 'BKN'
    elif row['team_abbreviation'] == 'NOH' or row['team_abbreviation'] == 'NOK':
        players_with_scores.at[idx, 'team_abbreviation'] = 'NOP'
#note: casting score to int32 drops the fractional part of each player's score
players_with_scores = players_with_scores.astype({'score': 'int32'})
players_with_scores = players_with_scores.astype({'draft_number': 'int32'})
#grouping by team names to find the averages by team to plot
teams_avg_draft_spot_and_avg_score = players_with_scores.groupby('team_abbreviation', as_index = False).agg({'score': 'mean', 'draft_number': 'mean'})
teams_avg_draft_spot_and_avg_score
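The renaming and aggregation steps can be sketched compactly with a lookup dictionary. The `relocations` name and the toy rows are ours, introduced only for illustration; the abbreviation pairs are the ones handled in the loop above.

```python
import pandas as pd

# Old abbreviation -> current abbreviation (same pairs as in the loop above)
relocations = {'SEA': 'OKC', 'NJN': 'BKN', 'NOH': 'NOP', 'NOK': 'NOP'}

# Toy rows (hypothetical scores and picks)
toy = pd.DataFrame({
    'team_abbreviation': ['SEA', 'OKC', 'NJN', 'NOH', 'NOK', 'MIA'],
    'score': [30.0, 20.0, 12.0, 18.0, 14.0, 16.0],
    'draft_number': [2, 4, 10, 4, 8, 13],
})

# Abbreviations that are not in the dictionary are left unchanged
toy['team_abbreviation'] = toy['team_abbreviation'].replace(relocations)

# Average score and average draft position per current franchise
by_team = toy.groupby('team_abbreviation', as_index=False).agg(
    {'score': 'mean', 'draft_number': 'mean'})

print(by_team)
```

After the replacement, the Seattle pick is averaged together with the Oklahoma City pick, so each current franchise appears exactly once.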
EDA and Visualization
Now we plotted the data on a scatter plot with each team's average draft position on the x-axis and the team's average score on the y-axis. We anticipated a downward-sloping trend: higher draft picks are expected to be more talented, so teams that pick earlier should on average draft players with higher scores. This was exactly the trend we saw in our plot. One point that stood out was the Los Angeles Lakers. For a team whose average draft position was about average, their score was significantly higher than any other team's. By this measure, the Lakers have been the best team at finding talented prospects in the first round. This makes sense, since in the summer of 2019 the Lakers traded a plethora of young talent to the Pelicans to acquire Anthony Davis, a pivotal component of their championship nucleus. Without drafting well in recent years, they would not have been able to put together a good enough trade package for a superstar like Anthony Davis. We also plotted each team's point in its team color and labeled it with the team abbreviation to clearly show where each team stands.
We also included a linear regression model on our plot. This model allows us to tell which teams have done well relative to the other teams in the league. If teams are above the model (positive residual) then they have drafted better than we would anticipate with the model. On the other hand, teams below our model (negative residual) would have drafted worse than we would anticipate.
plot = teams_avg_draft_spot_and_avg_score.plot.scatter(x = 'draft_number', y = 'score', figsize = (12,10))
plot
#creating regression line for our plot
line = np.polyfit(teams_avg_draft_spot_and_avg_score['draft_number'],
teams_avg_draft_spot_and_avg_score['score'], 1)
x = np.poly1d(line)
teams_avg_draft_spot_and_avg_score.insert(3, 'regression',
x(teams_avg_draft_spot_and_avg_score['draft_number']))
teams_avg_draft_spot_and_avg_score.plot(x = 'draft_number', y = 'regression', color = 'Red', ax = plot)
print("score = " + str(line[0]) + " * draft_position" + "+ " + str(line[1]))
#making lists of points to be plotting along with team abbreviation and team color
x = teams_avg_draft_spot_and_avg_score['draft_number']
y = teams_avg_draft_spot_and_avg_score['score']
names = teams_avg_draft_spot_and_avg_score['team_abbreviation']
teams_colors_in_hex = ['#E03A3E', '#000000', '#007A33', '#1D1160', '#CE1141',
'#860038', '#00538C', '#0E2240', '#C8102E', '#1D428A',
'#CE1141', '#002D62', '#C8102E', '#552583', '#5D76A9',
'#DB3EB1', '#00471B', '#0C2340', '#85714D', '#F58426',
'#007AC1', '#C4CED4', '#006BB6', '#1D1160', '#E03A3E',
'#5A2D81', '#C4CED4', '#CE1141', '#002B5C', '#002B5C']
for i, txt in enumerate(names):
    #adding label to points
    plot.annotate(txt, (x[i] + .15, y[i] + .01))
    #plotting point in teams color
    plot.scatter(x[i], y[i], color = teams_colors_in_hex[i])
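To illustrate what a positive or negative residual means, here is a small sketch that fits a line with `np.polyfit` and subtracts the prediction from the observed value. The four data points are made up for the example and are not team averages from our data.

```python
import numpy as np

# Toy data: average draft position (x) and average score (y) for four made-up teams
avg_pick = np.array([5.0, 10.0, 15.0, 20.0])
avg_score = np.array([20.0, 16.0, 15.0, 9.0])

# Degree-1 fit returns the slope first, then the intercept
slope, intercept = np.polyfit(avg_pick, avg_score, 1)

# residual = observed score - score predicted by the line
predicted = slope * avg_pick + intercept
residuals = avg_score - predicted

print(slope, intercept)
print(residuals)
```

The third toy team sits above the fitted line (positive residual), so it drafted better than its average pick would suggest, while the others fall below it.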
More Visualization and Machine Learning
To get a better understanding of where teams stand compared to each other, we decided to plot the teams' residuals on a map of the US, at each team's location. If a team performed below what our model would predict we would plot a shade of red, while if they performed better than what our model would predict, we would plot a shade of green. The bolder the shade, the better or worse a team performed.
To find where teams are located we scraped data from the web. This gave us the longitude and latitude of each team's arena so we could plot it on a map.
#scraping data for team locations
r = requests.get('http://lionstobakersfield.blogspot.com/2019/01/nba-arenas-with-latitude-and-longitude.html',
headers = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0"})
root = BeautifulSoup(r.content, 'html')
table = root.find("table").prettify()
team_stadium_loc_frame = pd.read_html(table)[0]
We had to do some data tidying so we were able to join the longitude and latitude columns with the residuals from our model. We also had to drop the formatting of the longitude and latitude to just keep them as floats.
columns = ['team_abbreviation', 'arena_name', 'latitude', 'longitude']
team_stadium_loc_frame.columns = columns
abbrevs = ['ATL', 'BOS', 'BKN', 'CHA', 'CHI', 'CLE', 'DAL', 'DEN', 'DET', 'GSW', 'HOU', 'IND', 'LAC', 'LAL', 'MEM',
'MIA', 'MIL', 'MIN', 'NOP', 'NYK', 'OKC', 'ORL', 'PHI', 'PHX', 'POR', 'SAC', 'SAS', 'TOR', 'UTA', 'WAS']
team_stadium_loc_frame['team_abbreviation'] = abbrevs
#tidying longitude and latitude from website
#(raw string, with the decimal point escaped so it only matches a literal '.')
regex = r'\d+\.\d+'
for idx, row in team_stadium_loc_frame.iterrows():
    lat_reg = re.search(regex, row['latitude'])
    team_stadium_loc_frame.at[idx, 'latitude'] = lat_reg.group(0)
    long_reg = re.search(regex, row['longitude'])
    #All longitudes are west, so change to negative
    team_stadium_loc_frame.at[idx, 'longitude'] = '-' + long_reg.group(0)
team_stadium_loc_frame
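The coordinate clean-up can be sketched as a small helper that goes straight to floats. The function name `parse_coordinate` and the sample strings are assumptions made for illustration; the exact text format on the scraped page may differ.

```python
import re

def parse_coordinate(text, west=False):
    """Pull the first decimal number out of a string and return it as a float.

    Hypothetical helper: pass west=True for western longitudes so the
    result comes back negative.
    """
    match = re.search(r'\d+\.\d+', text)
    if match is None:
        raise ValueError('no decimal number found in: ' + repr(text))
    value = float(match.group(0))
    return -value if west else value

# Sample strings in an assumed format, not copied from the scraped page
latitude = parse_coordinate('33.7573° N')
longitude = parse_coordinate('84.3963° W', west=True)

print(latitude, longitude)
```

Raising an error when no number is found makes a malformed cell fail loudly instead of surfacing later as a confusing attribute error.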
Now that we were able to get our dataframe with the stadium locations, we had to get a column with the residuals from our model. We used statsmodels OLS to get the residuals along with the team abbreviation and stadium location.
#Using OLS to make model of teams draft position and score
X = teams_avg_draft_spot_and_avg_score['draft_number']
Y = teams_avg_draft_spot_and_avg_score['score']
X = sm.add_constant(X)
model = sm.OLS(Y,X)
results = model.fit()
resids = results.resid
#Adding the residuals from our model to the dataframe to see how well teams performed
#compared to what our model predicted.
teams_avg_draft_spot_and_avg_score['residual'] = resids
teams_residual_and_location = pd.merge(teams_avg_draft_spot_and_avg_score,team_stadium_loc_frame, on = 'team_abbreviation' )
teams_residual_and_location
We were now set to plot all our data on a map using folium. For teams that drafted well above what our model predicted (a high positive residual), we made their points stronger shades of green. Teams that drafted worse than our model predicted (negative residual) are plotted in red, with the point getting bolder as the residual becomes more negative. We also wanted to show each team's exact residual and logo in a popup. If you click one of the points on the map below, you'll see a popup with the team's residual from our model and their logo.
#now want to add icon row for each
for i, row in teams_residual_and_location.iterrows():
    teams_residual_and_location.at[i, 'logo'] = "nba_logos/" + str(row['team_abbreviation']) + "logo.png"
teams_residual_and_location = teams_residual_and_location.astype({'latitude': float, 'longitude': float})
team_map = folium.Map(location=[39.8283, -98.5795], zoom_start=4)
#Adjusting for the Clippers since they share a stadium with the Lakers and want to
#be able to see both points
teams_residual_and_location.at[12, 'longitude'] = teams_residual_and_location.at[12, 'longitude'] + 0.4
for i, row in teams_residual_and_location.iterrows():
    #encoding image to be used in popup
    with open(row['logo'], 'rb') as logo_file:
        encoded = base64.b64encode(logo_file.read())
    team_name = row['team_abbreviation']
    val = float('%.3f'%(row['residual']))
    html_string = "<p>" + str(team_name) + ": " + str(val) + "</p> <img src='data:image/png;base64,{}'>"
    #making popup to include team name and residual, along with picture of logo
    html = html_string.format(encoded.decode())
    iframe = IFrame(html, width=80, height=115)
    popup = folium.Popup(iframe, max_width=1000)
    #Here we establish our gradient for the point to be plotted on map to either be a shade
    #of green or a shade of red
    if row['residual'] >= 2:
        folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=5,
                            color="#2EB62C", fill=True, fill_color="#2EB62C", popup = popup).add_to(team_map)
    elif row['residual'] >= 1.5:
        folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=5,
                            color="#57C84D", fill=True, fill_color="#57C84D", popup = popup).add_to(team_map)
    elif row['residual'] >= 1:
        folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=5,
                            color="#83D475", fill=True, fill_color="#83D475", popup = popup).add_to(team_map)
    elif row['residual'] >= .5:
        folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=5,
                            color="#ABE098", fill=True, fill_color="#ABE098", popup = popup).add_to(team_map)
    elif row['residual'] >= 0:
        folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=5,
                            color="#C5E8B7", fill=True, fill_color="#C5E8B7", popup = popup).add_to(team_map)
    elif row['residual'] <= -2:
        folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=5,
                            color="#DC1C13", fill=True, fill_color="#DC1C13", popup = popup).add_to(team_map)
    elif row['residual'] <= -1.5:
        folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=5,
                            color="#EA4C46", fill=True, fill_color="#EA4C46", popup = popup).add_to(team_map)
    elif row['residual'] <= -1:
        folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=5,
                            color="#F07470", fill=True, fill_color="#F07470", popup = popup).add_to(team_map)
    elif row['residual'] <= -.5:
        folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=5,
                            color="#F1959B", fill=True, fill_color="#F1959B", popup = popup).add_to(team_map)
    elif row['residual'] <= 0:
        folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=5,
                            color="#F6BDC0", fill=True, fill_color="#F6BDC0", popup = popup).add_to(team_map)
team_map
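The color gradient used for the map markers can be summarized as a single lookup. The sketch below is one possible way to express it; `residual_color` is a hypothetical helper name, while the thresholds and hex colors are the ones from the map code above.

```python
def residual_color(residual):
    """Return the marker color for a residual, using the map's thresholds."""
    # (lower bound, color) pairs, checked from the strongest green down
    greens = [(2, '#2EB62C'), (1.5, '#57C84D'), (1, '#83D475'),
              (0.5, '#ABE098'), (0, '#C5E8B7')]
    # (upper bound, color) pairs, checked from the strongest red up
    reds = [(-2, '#DC1C13'), (-1.5, '#EA4C46'), (-1, '#F07470'),
            (-0.5, '#F1959B')]
    for bound, color in greens:
        if residual >= bound:
            return color
    for bound, color in reds:
        if residual <= bound:
            return color
    # Anything left is a small negative residual (between -0.5 and 0)
    return '#F6BDC0'

print(residual_color(2.3), residual_color(-0.2), residual_color(-1.7))
```

With a helper like this, a single `folium.CircleMarker` call per team would be enough, since only the color changes from branch to branch. One difference from the branches above is that a missing (NaN) residual would fall through to the lightest red rather than producing no marker.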
Looking at First Round Draft Picks Since 2005
After our visualization, we became curious about whether any particular draft classes had a substantial number of talented basketball players. So we made a violin plot of players' scores by draft year. What we found was that most draft years produce around the same number of talented players. The shape of the violin plots didn't change much from year to year, although some years did produce some extremely talented prospects, like 2018, where we see a high peak.
sns.violinplot(data = players_with_scores, x = "draft_year", y = "score")
Looking at Second Round Draft Picks Since 2005
What we have just looked at up until this point was first round picks and which teams have had the most success at consistently drafting productive players with their first round picks. However, the NBA draft does not just have one single round. The NBA draft has 2 rounds, and, as long as a team does not trade their picks, every team has one first round pick and one second round pick to help bolster their roster.
We can look at the same sort of violin plot as we just saw with first round picks, but now with second round picks since 2005 and compare the results of the two. However, first, we have to do the same data cleansing that we did with the first round picks with the second round picks to acquire the data necessary for the violin plot.
non_first_rounders = all_data[all_data['draft_round'] == '2']
non_first_rounders = non_first_rounders.astype({'draft_year': 'int32'})
non_first_rounders_since_2005 = non_first_rounders[non_first_rounders['draft_year'] >= 2005]
# drop seasons under 20 games
over_20_games = non_first_rounders_since_2005['gp'] > 20
second_round_draft_picks_seasons_over_20_games = non_first_rounders_since_2005[over_20_games]
second_round_draft_picks_seasons_over_20_games
#aggregating for points, rebounds, assists
players = second_round_draft_picks_seasons_over_20_games.groupby(['player_name', 'draft_year'], as_index = False).agg({'pts': 'mean', 'reb': 'mean', 'ast': 'mean'})
players
#summing points, rebounds, and assists for total score
players_with_scores_2nd_round = players
for idx,row in players_with_scores_2nd_round.iterrows():
    players_with_scores_2nd_round.at[idx, 'score'] = row['pts'] + row['reb'] + row['ast']
players_with_scores_2nd_round
Just as the regression line above showed the average productivity of first round picks diminishing as a team's average draft position gets later, we can see the same type of trend when comparing the violin plot of first round picks by year with that of second round picks by year. The average "score" (the sum of a player's career averages of points, rebounds, and assists) for second round picks tends to be about 8 - 10 depending on the draft class, while it is about 10 - 15 for first round picks. It makes sense that on average first round picks are more productive than second round picks, but one thing stands out in this violin plot of second round picks: the outliers. There are many extremely high peaks, some even higher than those of the first round picks (see 2014). While most of each draft class sits at the low end (most second round picks are not very productive), there are very clearly certain second round picks who have turned out to be very productive and successful over the course of their NBA careers. Now, we want to take a look at who these players are and which teams have had the most success since 2005 in finding these "hidden gems" in the second round.
sns.violinplot(data = players_with_scores_2nd_round, x = "draft_year", y = "score")
Hidden Gems
#adjusting the season to only have the starting year
#(raw string so that \d is not treated as a string escape)
regex = r"(?P<start_year>\d+)-\d\d"
for idx, row in non_first_rounders_since_2005.iterrows():
    reg = re.search(regex, row['season'])
    non_first_rounders_since_2005.at[idx, 'season'] = reg.group('start_year')
non_first_rounders_since_2005
Once again, we have to take into account the fact that players can change teams during their career (either via trade or free agency). To account for this, we have to look at the rookie season for each second round pick in the data frame and see which team each player played for during their rookie season. After doing this, we can join that data frame with the data frame used in the violin plot to be able to see the career stats of each player (only taking into account seasons where they played over 20 games to avoid skew), along with the team that drafted them.
#now adjusting the dataframe to only have seasons with draft year = start year
non_first_rounders_since_2005 = non_first_rounders_since_2005.astype({'season': 'int32'})
non_first_rounders_since_2005 = non_first_rounders_since_2005.astype({'draft_year': 'int32'})
filter_for_draft_year = non_first_rounders_since_2005['draft_year'] == non_first_rounders_since_2005['season']
rookie_seasons_2nd_round = non_first_rounders_since_2005[filter_for_draft_year]
rookie_seasons_2nd_round
players_and_draft_teams_2nd_round = rookie_seasons_2nd_round[['player_name', 'team_abbreviation', 'draft_round', 'draft_number', 'draft_year']]
players_and_draft_teams_2nd_round
#merging dataframe to have draft info with players scores
players_career_data_2nd_round = pd.merge(players_and_draft_teams_2nd_round, players_with_scores_2nd_round, on = 'player_name')
players_career_data_2nd_round
Next, we had to determine what "score" marks a productive NBA player, and therefore someone who would be a very successful draft pick for a team in the second round. We reasoned that good career averages for a successful NBA guard would be about 12 points, 3 to 4 assists, and a rebound or two. For a big man, a successful career would be about 10 points, 6 rebounds, and maybe an assist or two. Therefore, it seemed logical to deem a "hidden gem" someone who averaged a total of more than 17 points, rebounds, and assists throughout their career. Using this metric to filter out all players below the threshold, we came up with a list of 20 players drafted since 2005 who can be categorized as a "hidden gem." Many of these names are very notable and would definitely be considered hidden gems for a second round pick, including former All-Stars Paul Millsap, Khris Middleton, and Draymond Green, and three-time Sixth Man of the Year winner Lou Williams.
#filtering dataframe to only look at "hidden gems"
hidden_gems = players_career_data_2nd_round[players_career_data_2nd_round['score'] > 17]
hidden_gems
Looking at the years in which these hidden gems were drafted, we can make some connections to the previous violin plot. For example, we can see that 2009 and 2013 do not have a notable outlier with a high “score” in the violin plot, and we can also see that neither of these years had a hidden gem come out of them.
Now, we can organize the hidden gems by putting them into a new data frame with every NBA team along with a list of the hidden gems (if any) that they have found since 2005. We can see that many franchises that have been successful in recent years have been able to find one or more hidden gems, including the Milwaukee Bucks, Miami Heat, Houston Rockets, and Golden State Warriors.
#concatenating hidden gems onto teams that drafted them
hidden_gems_by_team = pd.DataFrame(columns = ['team_abbreviation'], data=abbrevs)
hidden_gems_by_team['hidden_gems_found'] = [''] * 30
hidden_gems_by_team = hidden_gems_by_team.set_index('team_abbreviation')
for idx, row in hidden_gems.iterrows():
    if hidden_gems_by_team.at[row['team_abbreviation'], 'hidden_gems_found'] == '':
        hidden_gems_by_team.at[row['team_abbreviation'], 'hidden_gems_found'] = row['player_name']
    else:
        to_concat = ', ' + row['player_name']
        hidden_gems_by_team.at[row['team_abbreviation'], 'hidden_gems_found'] += to_concat
hidden_gems_by_team
Lastly, using this same threshold of a career average of 17 total points, rebounds, and assists, we can compare the percentage of first round picks that have been successful to the percentage of second round picks that have been successful.
#finding ratios of successful first round picks
all_first_rounders = players_drafted_round_one_and_after_2005.groupby('player_name', as_index = False).size()
num_first_rounders = len(all_first_rounders.index)
hits_on_first_rounders = players_with_scores[players_with_scores['score'] > 17]
num_hits_on_first_rounders = len(hits_on_first_rounders.index)
#finding ratios of successful second round picks
all_second_rounders = non_first_rounders_since_2005.groupby('player_name', as_index = False).size()
num_second_rounders = len(all_second_rounders.index)
hits_on_second_rounders = hidden_gems
num_hits_on_second_rounders = len(hits_on_second_rounders)
#plotting ratios above in pie chart
y = np.array([num_hits_on_first_rounders, num_first_rounders - num_hits_on_first_rounders])
mylabels = ['Hits on First Rounders', 'Misses on First Rounders']
mycolors = ['#FFD700', '#808080']
myexplode = [0.1, 0]
pyplt.pie(y, labels = mylabels, colors = mycolors, explode = myexplode, shadow = True, autopct='%.1f%%', radius = 1)
pyplt.show()
#plotting second round ratios in pie chart
y = np.array([num_hits_on_second_rounders, num_second_rounders - num_hits_on_second_rounders])
mylabels = ['Hits on Second Rounders', 'Misses on Second Rounders']
mycolors = ['#FFD700', '#808080']
myexplode = [0.1, 0]
pyplt.pie(y, labels = mylabels, colors = mycolors, explode = myexplode, shadow = True, autopct='%.1f%%', radius = 1)
pyplt.show()
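The percentages shown in the pie charts come down to a simple ratio of hits to drafted players. As a rough sketch, with a hypothetical helper name `hit_rate` and made-up scores and counts:

```python
import pandas as pd

def hit_rate(scores, num_drafted, threshold=17):
    """Percentage of drafted players whose career score is above the threshold.

    Hypothetical helper: scores holds the players who have a career score,
    num_drafted is the total number of players drafted in that round.
    """
    num_hits = int((scores > threshold).sum())
    return 100.0 * num_hits / num_drafted

# Toy scores; a score of exactly 17.0 does not count as a hit with a strict '>'
toy_scores = pd.Series([25.0, 18.2, 17.0, 9.5])
rate = hit_rate(toy_scores, num_drafted=5)

print(rate)
```

Passing the number of drafted players separately mirrors the code above, where the denominator counts every drafted player, including those who never had a season of more than 20 games.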
We can clearly see that first rounders are much more likely to end up being productive NBA players than second rounders, at a rate of 27.8% compared to 5.9%. However, 27.8% still is not a very high probability, so it is very important for teams to scout prospects well and make good decisions on draft day. The wrong choice could end up with a team selecting a player like Greg Oden over a player like Kevin Durant, a choice that sets the direction of a franchise, for better or for worse, for many years to come.
Conclusion
The draft is the most consistent way to improve an NBA team, as each team has a first round pick and a second round pick every year. While free agency is another way to alter the direction of a franchise, not every team can be as fortunate as the Los Angeles Lakers and sign LeBron James. There are certain franchises that struggle mightily to attract talented free agents. Therefore, drafting well, at whatever pick in the draft, is extremely important. While the Lakers did sign LeBron James in 2018, we have also seen the great amount of success they have had in drafting first round talent since 2005. These successful draft picks allowed them to land superstar Anthony Davis in a blockbuster trade, paving the way for their 2020 NBA championship run. However, finding hidden gems in the second round is just as important. Although the Warriors have not had great success at drafting consistently well in the first round, as seen with their negative residual on the regression line, they have found the most hidden gems of all teams since 2005, helping pave the way for their 5 NBA championship appearances and 3 NBA titles in the past 6 years.