Data Analysis And Visualization -IPL

Oct 24, 2020

In this article, we will learn to explore data using Python, which will help us understand the data better.

Photo by Alessandro Bogliari on Unsplash

Data Analysis: Data analysis is a process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making.

For this analysis, I have taken the IPL (Indian Premier League) data set from Kaggle. The data set covers matches played from 2008 to 2019. Let’s see what we can find in it.

Data Preparation and Cleaning

The Python libraries I used for data preparation and cleaning are NumPy and pandas.

Reading the data:
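The loading step might look like the sketch below. The file name matches.csv and the snake_case column names are assumptions, and a tiny made-up CSV held in memory stands in for the Kaggle file so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up two-row sample standing in for the Kaggle file (assumed name: matches.csv).
sample_csv = io.StringIO(
    "id,season,city,date,team1,team2,winner\n"
    "1,2017,Hyderabad,2017-04-05,Sunrisers Hyderabad,Royal Challengers Bangalore,Sunrisers Hyderabad\n"
    "2,2017,Pune,2017-04-06,Mumbai Indians,Rising Pune Supergiant,Rising Pune Supergiant\n"
)

# With the real file this would be: df = pd.read_csv("matches.csv")
df = pd.read_csv(sample_csv)
print(df.head())
```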

Now that we have loaded the data into a data frame, let’s take an overview:

.shape: Returns a tuple representing the dimensions of the dataframe. Here our dataset consists of 756 rows and 17 columns.

.info(): This method prints a short summary of the dataframe, including each column’s data type and non-null count.

.columns: This attribute returns the column labels, and these are the 17 columns in our dataset.

As we can see, this dataset contains 17 columns: Id, Season, City, Date, Team1, Team2, Toss Winner, Toss Decision, Result, DL Applied, Winner, Winner by Runs, Winner by Wickets, Player of Match, Venue, Umpire1 and Umpire2.
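The three overview calls can be tried on any data frame. Here is a minimal sketch on a made-up three-row frame; the lowercase column names are assumptions about how the CSV labels them.

```python
import pandas as pd

# Made-up stand-in for the matches data; column names are assumed.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "season": [2017, 2017, 2018],
    "city": ["Hyderabad", "Pune", "Mumbai"],
})

print(df.shape)          # (rows, columns) as a tuple
df.info()                # prints dtypes, non-null counts and memory usage
print(list(df.columns))  # column labels
```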

Looking at the info of the dataframe, we can see that the data type of date is listed as object. So let’s change that.
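A conversion along these lines would do it. This is a sketch on made-up dates; the column name date and the ISO date format are assumptions about the real file.

```python
import pandas as pd

# Dates stored as plain strings, as they are after reading a CSV.
df = pd.DataFrame({"date": ["2017-04-05", "2017-04-06", "2018-04-07"]})
print(df["date"].dtype)

# pd.to_datetime parses the strings into proper datetime values.
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)
```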

Now we have changed the data type of date to datetime, using the pandas to_datetime() function.

Let’s now see how we can split the date into separate columns for day, month, year and weekday.
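One way to do the split is through the .dt accessor, sketched here on two made-up dates. The new column names, and storing the weekday as a name rather than a number, are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2017-04-05", "2018-05-27"])})

# The .dt accessor exposes the individual components of a datetime column.
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year
df["weekday"] = df["date"].dt.day_name()
print(df)
```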

We can observe that four more columns have been added to our dataset.

Dropping columns in the dataset

.drop(): This method removes rows or columns, either by specifying label names with the corresponding axis or by specifying index or column names directly.
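As an illustration, the sketch below drops two columns from a made-up frame. Which columns to drop is a hypothetical choice made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "winner": ["Mumbai Indians", "Chennai Super Kings"],
    "umpire1": ["A", "B"],
    "umpire2": ["C", "D"],
})

# Drop by column name; drop() returns a new frame unless inplace=True is passed.
df = df.drop(columns=["umpire1", "umpire2"])
print(df.shape)
```

Passing columns=[...] is equivalent to passing the labels with axis=1.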

Now, let’s look at the dimensions of the dataset.

We can see that the dimensions have changed; earlier we had only 17 columns.

Now, let’s check the teams that played.

.unique(): This method returns the unique values in the specified column.
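A small sketch of the idea on made-up rows follows. The column names team1 and team2 are assumptions, and combining both columns is an optional extra that catches spellings appearing in only one of them.

```python
import pandas as pd

df = pd.DataFrame({
    "team1": ["Delhi Daredevils", "Rising Pune Supergiant", "Delhi Capitals"],
    "team2": ["Mumbai Indians", "Delhi Daredevils", "Rising Pune Supergiants"],
})

# Unique values of a single column.
print(df["team1"].unique())

# Unique values across both team columns.
all_teams = pd.unique(df[["team1", "team2"]].to_numpy().ravel())
print(sorted(all_teams))
```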

Here in the team columns we can see that Pune’s team appears under three names and Delhi’s team under two. So we will replace those with proper names.

.replace(): This method replaces the given values, here the old team names, with new ones.
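The replacement can be written as a dictionary from old name to new name. The mapping below is hypothetical and only shows the pattern; which spelling to keep is a choice for the analyst.

```python
import pandas as pd

df = pd.DataFrame({
    "team1": ["Delhi Daredevils", "Rising Pune Supergiants"],
    "winner": ["Delhi Daredevils", "Rising Pune Supergiants"],
})

# Hypothetical mapping from an old spelling to the name to keep.
name_map = {
    "Delhi Daredevils": "Delhi Capitals",
    "Rising Pune Supergiants": "Rising Pune Supergiant",
}

# DataFrame.replace applies the mapping in every column that contains those values.
df = df.replace(name_map)
print(df)
```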

So now we have replaced the team names with proper names.

Let’s Visualize Our Data Set

First, we will import the necessary libraries for visualization.

Now let’s see:

Which team has won the most matches from 2008 to 2019?
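Counting wins per team comes down to counting how often each name appears in the winner column. Here is a minimal sketch with made-up rows; the column name winner is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "winner": [
        "Mumbai Indians", "Chennai Super Kings", "Mumbai Indians",
        "Kolkata Knight Riders", "Mumbai Indians", "Chennai Super Kings",
    ]
})

# value_counts returns the counts sorted from most to least frequent.
wins = df["winner"].value_counts()
print(wins)
```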

Plotting these values
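A bar chart of such counts could be drawn as sketched below. The use of matplotlib through the pandas plotting interface is an assumption, and the win counts are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up win counts, in the shape value_counts() would produce.
wins = pd.Series({
    "Mumbai Indians": 3,
    "Chennai Super Kings": 2,
    "Kolkata Knight Riders": 1,
})

fig, ax = plt.subplots(figsize=(8, 4))
wins.plot(kind="bar", ax=ax)
ax.set_xlabel("Team")
ax.set_ylabel("Matches won")
ax.set_title("Matches won per team (sample data)")
fig.tight_layout()
# In a script, call plt.show() here; a notebook renders the figure automatically.
```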

From the graph it’s clear that Mumbai Indians have won the most matches.

Now let’s see how it looks in a pie chart.
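The same counts can be passed to a pie chart. This sketch assumes matplotlib and uses made-up numbers; autopct prints each slice’s percentage.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up win counts per team.
wins = pd.Series({
    "Mumbai Indians": 3,
    "Chennai Super Kings": 2,
    "Kolkata Knight Riders": 1,
})

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(wins.values, labels=list(wins.index), autopct="%1.1f%%")
ax.set_title("Share of matches won (sample data)")
# In a script, call plt.show() here; a notebook renders the figure automatically.
```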

From the pie chart we can easily find which team has won the most number of times.

Number of matches played in each IPL season
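Since the data has one row per match, matches per season can be obtained by grouping on the season. This is a sketch on made-up rows, with the column names id and season assumed.

```python
import pandas as pd

# Made-up sample; each row represents one match.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "season": [2009, 2008, 2009, 2008, 2009],
})

# Count rows per season; groupby sorts the seasons in ascending order by default.
matches_per_season = df.groupby("season")["id"].count()
print(matches_per_season)
```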

Toss decisions
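The share of each toss decision can be computed with normalized counts. The sketch below uses made-up rows, so its percentages say nothing about the real data; the column name toss_decision and the values field and bat are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"toss_decision": ["field", "bat", "field", "field", "bat"]})

# normalize=True returns fractions instead of raw counts; multiply by 100 for percentages.
toss_share = df["toss_decision"].value_counts(normalize=True) * 100
print(toss_share.round(1))
```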

The observation tells us that almost 60% of the time, the toss winner chooses to field first.

Now let’s explore more.

Number of matches played in each city

In the bar graph we can observe that the most matches have been played in Mumbai.

Who has been awarded Player of the Match the most times?
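A sketch of how the award table and the number of distinct winners could be computed follows. The rows are made up and the column name player_of_match is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "player_of_match": [
        "CH Gayle", "AB de Villiers", "CH Gayle",
        "MS Dhoni", "CH Gayle", "AB de Villiers",
    ]
})

# Awards per player, most frequent first.
awards = df["player_of_match"].value_counts()

# Number of distinct players who received the award at least once.
n_players = df["player_of_match"].nunique()

# The first five rows of the sorted counts are the top five players.
top_five = awards.head(5)
print(n_players)
print(top_five)
```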

The table shows us that 226 players have received the Player of the Match award.

So let’s plot the top five players.

From the above plot we can see that CH Gayle has been Player of the Match the most times.

How many matches ended with a normal result and how many in a tie?

The above results show how many matches ended normally and how many were tied; there are also matches that ended with no result.

Which venue has hosted the most matches?
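Besides reading the answer off a plot, the busiest venue can be picked out directly. This is a minimal sketch on made-up rows, with the column name venue assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "venue": [
        "Eden Gardens", "Wankhede Stadium", "Eden Gardens",
        "M Chinnaswamy Stadium", "Eden Gardens", "Wankhede Stadium",
    ]
})

# Matches per venue, largest first.
venue_counts = df.groupby("venue").size().sort_values(ascending=False)

# idxmax returns the label (venue name) with the highest count.
top_venue = venue_counts.idxmax()
print(venue_counts)
print("Most matches:", top_venue)
```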

So the graph shows that the most matches have been played at Eden Gardens.

With that, I have come to the end of this data analysis and visualization of the IPL dataset. I hope you found this blog interesting and that it gives you an idea of what data analysis involves.

Written by Sona Ale Wilson
