Test Cricket Statistics

Test Cricket Statistics: Data Preparation

Test cricket is the long format of cricket. A Test match is scheduled over five days, and each team gets two innings. A match therefore has at most four innings, although some matches finish in three. In this project, the data is loaded from three different sources:

YAML files:

The YAML files are extracted from a zip file. Each YAML file represents one match and holds data for up to four innings, including information about every ball. Along with the ball-by-ball statistics, each file contains additional data.
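The extraction step could look something like the sketch below. It is only an illustration using Python's standard `zipfile` module; the function name `extract_yaml_files` and the archive and directory paths are hypothetical and not taken from the project.

```python
import zipfile
from pathlib import Path


def extract_yaml_files(zip_path, out_dir):
    """Extract only the .yaml members of an archive and return their paths.

    Hypothetical helper: the name and arguments are assumptions for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip anything that is not a match file (for example a README).
            if name.lower().endswith(".yaml"):
                archive.extract(name, out_dir)
                extracted.append(out_dir / name)
    return sorted(extracted)
```

Each extracted file (one match) could then be parsed with a YAML library, for example PyYAML's `yaml.safe_load`, before the per-ball data is processed.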

Data preparation steps

Website data:

Match statistics are downloaded from http://www.howstat.com/ by web scraping. Each match is looked up on the website using the start date, team1 and team2 from the YAML file. The data contains statistics for each player in that match, innings totals, etc. Because player names differ slightly between the YAML files and the website, they are matched using fuzzy matching.
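As an illustration of the name-matching idea, here is a minimal sketch that uses `difflib.get_close_matches` from the Python standard library. This is a stand-in technique, not necessarily the fuzzy-matching library the project uses; the function name `match_player_names`, the `cutoff` value and the sample names are assumptions.

```python
from difflib import get_close_matches


def match_player_names(yaml_names, site_names, cutoff=0.6):
    """Map each YAML name to its closest website name, or None if nothing is close.

    Hypothetical helper: the cutoff of 0.6 is an assumed threshold, not a tuned one.
    """
    # Compare in lower case, but return the website's original spelling.
    lowered = {name.lower(): name for name in site_names}
    mapping = {}
    for name in yaml_names:
        hits = get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
        mapping[name] = lowered[hits[0]] if hits else None
    return mapping


# Illustrative names only: initials in one source, full first names in the other.
mapping = match_player_names(
    ["SR Tendulkar", "R Dravid"],
    ["Sachin Tendulkar", "Rahul Dravid", "VVS Laxman"],
)
```

Names that map to `None` would need manual review, since a low cutoff risks pairing two different players.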

#### Data preparation steps

API:

Player career statistics are downloaded from http://cricapi.com through its API. Player profiles are looked up in the following order:

  1. Search for the player by exact name match
  2. If that fails, search by last name
  3. If the player is still not found, use fuzzy matching on the name
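The lookup order above can be sketched as a simple fallback chain. This is a hedged illustration only: an in-memory list of profile names stands in for the API search, `find_player` is a hypothetical name, accepting a last-name match only when it is unambiguous is an added assumption, and `difflib` stands in for whichever fuzzy-matching library the project uses.

```python
from difflib import get_close_matches


def find_player(name, profiles):
    """Return (matched_name, method) using exact, last-name, then fuzzy lookup.

    `profiles` is a plain list of names standing in for API search results.
    """
    key = name.lower().strip()
    if not key:
        return None, "not-found"

    # Step 1: exact match (case-insensitive).
    by_lower = {p.lower(): p for p in profiles}
    if key in by_lower:
        return by_lower[key], "exact"

    # Step 2: last-name match, accepted only when unambiguous (an assumption).
    last = key.split()[-1]
    same_last = [p for p in profiles if p.lower().split()[-1:] == [last]]
    if len(same_last) == 1:
        return same_last[0], "last-name"

    # Step 3: fuzzy match on the full name, with an assumed cutoff of 0.6.
    hits = get_close_matches(key, list(by_lower), n=1, cutoff=0.6)
    if hits:
        return by_lower[hits[0]], "fuzzy"
    return None, "not-found"
```

Returning the method alongside the match makes it easy to review the less reliable last-name and fuzzy hits separately from exact ones.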

Data preparation steps

Stats and Plots

References:

http://www.howstat.com/

http://cricapi.com