Scraping Japanese Professional Baseball Data the Easy Way
Background
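On a whim, I wanted to try predicting the professional baseball (NPB) standings.
If I was going to do it anyway, I figured I might as well base the prediction on an analysis of player stats.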
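Let's get straight to the code
Pointing Python at the URLs built in the code below is all it takes: pandas' read_html fetches each page and parses its HTML tables into DataFrames (it needs an HTML parser such as lxml installed).
We live in convenient times.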
For this post, I scrape Pacific League pitching stats, batting stats, and team stats.
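The per-team stats pages on baseball-data.com are addressed by a fixed URL pattern: a season path segment, then `stats/`, then `<player type>-<team code>/`. A minimal sketch of enumerating every page this post requests, assuming that pattern; the helper `stats_url` is hypothetical and not part of any library.

```python
from itertools import product

BASE_URL = "https://baseball-data.com/"

def stats_url(year: int, player: str, team: str) -> str:
    """Build one stats page URL (hypothetical helper following this post's pattern)."""
    return f"{BASE_URL}{year}/stats/{player}-{team}/"

years = range(10, 20)                    # the ten season codes used in this post (10..19)
players = ["pitcher", "hitter"]
teams = ["l", "h", "e", "m", "f", "bs"]  # the six Pacific League team codes

# Same nesting order as the scraping loop: player type, then season, then team.
urls = [stats_url(y, p, t) for p, y, t in product(players, years, teams)]

print(len(urls))  # 2 player types x 10 seasons x 6 teams = 120
print(urls[0])    # https://baseball-data.com/10/stats/pitcher-l/
```

Listing the URLs up front makes it easy to check how many requests a run will send before actually hitting the site.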
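```python
import pandas as pd

base_url = "https://baseball-data.com/"
# Season codes as they appear in the site's URL paths (two digits, 10 through 19)
years = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
players = ['pitcher', 'hitter']
# months = [3, 4, 5, 6, 7, 8, 9, 10, 11]
# Pacific League team codes: Lions, Hawks, Eagles, Marines, Fighters, Buffaloes
teams = ['l', 'h', 'e', 'm', 'f', 'bs']
```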
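```python
for player in players:
    df = None  # reset per player type so hitter.csv does not also contain pitcher rows
    for year in years:
        for team in teams:
            # e.g. https://baseball-data.com/10/stats/pitcher-l/
            url = base_url + str(year) + '/stats/' + player + '-' + team + '/'
            print(url)
            try:
                dfs = pd.read_html(url)  # one DataFrame per <table> on the page
                df_temp = dfs[0]         # use the first table on the page
                df_temp['team'] = team
                df_temp['year'] = year
            except Exception as e:
                print(e)
                continue
            if df is None:
                df = df_temp
            else:
                df = pd.concat([df, df_temp], ignore_index=True)
    if df is not None:
        df.to_csv(player + '.csv')
```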
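Calling pd.concat inside the loop builds a new, ever larger DataFrame on every iteration. A common alternative is to append each table to a list and concatenate once at the end. A minimal sketch, with small made-up frames standing in for the pd.read_html results so it runs without network access; the helper `tag` and the column names are hypothetical.

```python
import pandas as pd

def tag(table: pd.DataFrame, team: str, year: int) -> pd.DataFrame:
    """Return a copy of one scraped table with team/year columns added (hypothetical helper)."""
    return table.assign(team=team, year=year)

# Stand-ins for dfs[0]; the columns and values are invented placeholders.
fake_tables = {
    ("l", 10): pd.DataFrame({"name": ["A", "B"], "ERA": [2.5, 3.1]}),
    ("h", 10): pd.DataFrame({"name": ["C"], "ERA": [2.9]}),
}

frames = []
for (team, year), table in fake_tables.items():
    frames.append(tag(table, team, year))

# One concat at the end; ignore_index gives a clean 0..n-1 index instead of repeated 0, 1, ...
df = pd.concat(frames, ignore_index=True)
print(df)
```

If no page could be read at all, the list stays empty and pd.concat raises ValueError ("No objects to concatenate"), so it is worth checking `if frames:` before concatenating.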
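```python
df = None
data = ['standings']
for d in data:
    for year in years:
        # pa.html: the Pacific League team stats page for that season
        url = base_url + str(year) + '/team/pa.html'
        print(url)
        try:
            dfs = pd.read_html(url)  # public API; pd.io.html.read_html is the internal path
            df_temp = dfs[0]
            df_temp['year'] = year
        except Exception as e:
            print(e)
            continue
        if df is None:
            df = df_temp
        else:
            df = pd.concat([df, df_temp], ignore_index=True)
    if df is not None:
        df.to_csv(d + '-team.csv')
```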
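By default, to_csv writes the DataFrame index as an unnamed first column. When loading the saved CSVs back for analysis, one option is to pass index_col=0 so that column becomes the index again (or to save with index=False in the first place). A minimal sketch, using a temporary file and a small made-up frame rather than the real standings-team.csv.

```python
import os
import tempfile

import pandas as pd

# Small made-up frame standing in for the scraped standings (values are invented).
df = pd.DataFrame({"team": ["X", "Y"], "wins": [80, 75], "year": [19, 19]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "standings-team.csv")
    df.to_csv(path)                            # index written as the first column
    naive = pd.read_csv(path)                  # index comes back as an "Unnamed: 0" column
    restored = pd.read_csv(path, index_col=0)  # first column used as the index again

print(list(naive.columns))
print(list(restored.columns))
```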
Next time, I plan to factor in recent team results along with each team's overall pitching and batting strength.