Okay, so today I decided to mess around with data from the South Africa vs. Zimbabwe cricket match. I’ve been wanting to do some data analysis on cricket for a while, and this seemed like a good opportunity.

First, I went looking for a place to get the data. I needed ball-by-ball information, you know, like who bowled, who batted, how many runs, wickets, all that stuff. I spent a good hour just searching for reliable sources. Finally I found some, and I managed to scrape the data I needed.
Wrangling the Data
Then came the fun part – NOT! Actually, it was kind of a headache. I had to clean up the data because it was all over the place: different formats, missing entries, you name it. I used my old friend Python, along with a package, to get it all sorted.
- Import the packages I need.
- Read and write the data.
- Clean the data.
I wrote a bunch of scripts to organize everything. I must admit it took me a while to fix up the data I got, check for missing entries, and figure out how to analyze it.
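To give a feel for what that cleaning step can look like, here is a minimal sketch. It is not the actual script: the column names (`over`, `batter`, `bowler`, `runs`) and the sample rows are made up for illustration, and it sticks to Python's standard library rather than any particular package.

```python
import csv
import io

# Hypothetical column names; real scraped data may well use different ones.
REQUIRED = ("over", "batter", "bowler", "runs")

def clean_deliveries(raw_rows):
    """Return (clean_rows, skipped) from an iterable of raw dict rows."""
    clean, skipped = [], 0
    for raw in raw_rows:
        # Normalise headers and values: trim whitespace, lower-case the headers.
        row = {k.strip().lower(): (v or "").strip()
               for k, v in raw.items() if k is not None}
        # Rows missing a required field are counted, not guessed at.
        if any(not row.get(col) for col in REQUIRED):
            skipped += 1
            continue
        try:
            over = int(float(row["over"]))  # accepts "2" as well as "2.0"
            runs = int(row["runs"])
        except ValueError:
            # Values that are not numbers at all are skipped too.
            skipped += 1
            continue
        clean.append({"over": over, "batter": row["batter"],
                      "bowler": row["bowler"], "runs": runs})
    return clean, skipped

# Tiny made-up sample, only to show the kinds of mess being handled.
SAMPLE = """Over, Batter ,Bowler,Runs
1, batter a ,bowler x,4
1,batter a,bowler x,
2.0,batter b,bowler y,1
two,batter b,bowler y,0
"""

rows, skipped = clean_deliveries(csv.DictReader(io.StringIO(SAMPLE)))
print(len(rows), "clean rows,", skipped, "skipped")
```

Counting the skipped rows instead of silently dropping them makes it easier to judge how much of the match is actually missing before trusting any numbers.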
Getting to the Analysis
After I finally got the data cleaned up, I could actually start analyzing it. I wanted to see things like:
- Run rates in different phases of the game.
- How different batsmen performed against different bowlers.
- Key moments that might have shifted the momentum.
I played around with different calculations. I calculated the run rates for the powerplay, middle overs, and death overs for both teams. I also looked at partnerships, individual scores, and how frequently boundaries were scored.
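The phase run rates boil down to runs scored times six, divided by legal balls bowled in that phase. Here is a rough sketch of that calculation. The phase boundaries assume a 20-over game (overs 1-6, 7-15, 16-20), which is my assumption for the example, and the `(over, runs, is_legal)` tuples and sample numbers are invented, not taken from the match.

```python
from collections import defaultdict

# Assumed T20-style phases (1-based overs); a 50-over game needs other cut-offs.
PHASES = (("powerplay", 1, 6), ("middle", 7, 15), ("death", 16, 20))

def phase_of(over):
    """Map an over number to the name of its phase."""
    for name, first, last in PHASES:
        if first <= over <= last:
            return name
    raise ValueError(f"over {over} is outside the assumed 20-over innings")

def run_rates(deliveries):
    """deliveries: iterable of (over, runs, is_legal) tuples for one innings."""
    runs = defaultdict(int)
    balls = defaultdict(int)
    for over, scored, is_legal in deliveries:
        phase = phase_of(over)
        runs[phase] += scored
        # Wides and no-balls add runs but do not count as balls bowled.
        if is_legal:
            balls[phase] += 1
    # Runs per over = runs * 6 / legal balls.
    return {p: runs[p] * 6 / balls[p] for p in runs if balls[p]}

# Made-up innings fragment: one over from each phase, plus one wide.
sample = (
    [(1, r, True) for r in (4, 0, 1, 0, 6, 1)] + [(1, 1, False)]
    + [(8, r, True) for r in (1, 1, 0, 2, 0, 1)]
    + [(18, r, True) for r in (6, 4, 1, 0, 6, 1)]
)
rates = run_rates(sample)
print(rates)
```

Running this once per team gives the two sets of phase run rates to compare side by side.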

I didn’t get through everything I planned, so there’s still some work left. But from what I’ve analyzed so far, it’s pretty interesting to see the actual numbers confirm some of the things we usually just assume when watching a match. For example, run rates really are lower during the middle overs. Some players also really struggle against certain types of bowling. And those momentum-shifting moments are real, too.
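One simple way to check the "struggles against certain types of bowling" idea is a strike rate (runs per 100 balls faced) split by bowling type. The sketch below assumes each record is a legal ball faced, tagged with a bowling type such as "pace" or "spin". That tagging, the player name, and the numbers are all hypothetical.

```python
from collections import defaultdict

def strike_rates(balls_faced):
    """balls_faced: iterable of (batter, bowling_type, runs_off_bat) tuples,
    one per legal ball faced. Returns {(batter, bowling_type): strike rate}."""
    totals = defaultdict(lambda: [0, 0])  # (batter, type) -> [runs, balls]
    for batter, bowling_type, runs in balls_faced:
        entry = totals[(batter, bowling_type)]
        entry[0] += runs
        entry[1] += 1
    # Strike rate = 100 * runs / balls faced.
    return {key: 100 * r / b for key, (r, b) in totals.items()}

# Made-up balls for one hypothetical batter.
sample_balls = (
    [("Batter A", "pace", r) for r in (4, 1, 6, 1)]
    + [("Batter A", "spin", r) for r in (0, 1, 0, 0)]
)
matchups = strike_rates(sample_balls)
print(matchups)
```

With only a handful of balls per matchup the numbers swing wildly, so a gap like this only means something over a decent sample.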
It’s still a work in progress, but I’m pretty happy with how it’s turning out. It’s one thing to watch the game, and another to dig into the numbers behind it. I hope I can get a clearer view next time!