Predicting First Innings Score of an IPL Match by Statistical Learning Method

Abstract

The Indian Premier League (IPL) is a franchise-based cricket tournament. Most IPL matches feature a constant duel between batsmen and bowlers. The team batting first must score as many runs as it can in order to stay ahead of the opposition, so a formidable first innings total is of utmost importance for winning an IPL match. This study attempts to predict the final first innings score of a team at different stages of a match using a statistical learning technique. Its primary focus is a novel weighting system for wickets lost, introduced so that the prediction can be made with minimum error.

Introduction

As a modern-day cricket tournament, the IPL has always been a breeding ground for data science researchers and for people from various other technical backgrounds. The objective of this paper is to build predictive models for the final first innings score of any IPL match, based on the score and the wickets lost at a particular stage of the match. A linear regression technique has been used to build the models. For every team, data from IPL seasons 11 and 12, played in 2018 and 2019 respectively, serve as the training data. IPL season 13, played in 2020, serves as the test data on which the models are validated. This work introduces a new concept, wicket weights, and uses it in the machine learning algorithm to make the models more accurate.

Development of Model and Wicket-Weights

Statistical measurements have always been a key part of any analysis. Graphical and mathematical measurements are used together to gain insight into the raw data. In this work the harmonic mean, a different but popular measure of average, has been used. To compute a wicket weight, two metrics are calculated from the runs scored and the balls faced by the batsmen in the partnership for a particular wicket. Many factors can influence the scoring rate of a team, such as the number of wickets in hand and the required run rate, so the runs scored depend on a number of additional factors. In statistical terms, Runs Scored is the dependent feature, and it depends on independent features such as Fall of Wickets and Required Run Rate. One of the most useful tools for representing such a dependency in a mathematical structure and gaining meaningful insight from it is Multiple Linear Regression.
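The summary does not give the exact formula for the wicket weights, so the following is only a minimal sketch of how a harmonic mean could combine two partnership metrics. The choice of metrics (each wicket's share of total runs and share of total balls), the function name wicket_weights, and the sample numbers are assumptions made for illustration, not the authors' method.

```python
from statistics import harmonic_mean


def wicket_weights(partnerships):
    """Illustrative wicket weights (assumed scheme, not the paper's formula).

    partnerships maps a wicket number to (runs, balls) for the partnership
    that ended with that wicket. Each weight is the harmonic mean of the
    partnership's share of total runs and its share of total balls.
    """
    total_runs = sum(runs for runs, _ in partnerships.values())
    total_balls = sum(balls for _, balls in partnerships.values())
    weights = {}
    for wicket, (runs, balls) in partnerships.items():
        run_share = runs / total_runs
        ball_share = balls / total_balls
        weights[wicket] = harmonic_mean([run_share, ball_share])
    return weights


# Made-up partnership data: wicket number -> (runs, balls)
example = {1: (40, 30), 2: (30, 30), 3: (30, 40)}
weights = wicket_weights(example)
```

The harmonic mean is pulled towards the smaller of its two inputs, so under this assumed scheme a partnership earns a high weight only when it contributes on both metrics.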

While building the models, sanity testing was carried out on a small portion of the training data, which was tested with linear regression, linear regression with a Box-Cox transformation, tree-based methods, and other non-linear approaches. In this sanity testing, linear regression performed very well, whereas the non-linear methods performed poorly. Linear regression with the Box-Cox transformation also produced some unstable predictions. Linear regression was therefore chosen as the predictive model for this study.
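As a rough illustration of the chosen model class, the sketch below fits a multiple linear regression by ordinary least squares using only the Python standard library. The two features (current score and a weighted count of wickets lost), the synthetic data, and the function names are assumptions for the example; the actual feature set and fitted coefficients are not given in this summary.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of feature rows; an intercept column is added automatically.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, features):
    """Predicted final score for one feature row."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], features))


# Synthetic rows: (current score, weighted wickets lost), generated from a
# made-up linear rule so that the fit can be checked by eye.
X = [(40, 0.2), (55, 0.5), (35, 0.9), (60, 0.1), (45, 0.6)]
y = [60 + 1.5 * s - 20 * w for s, w in X]

beta = fit_linear_regression(X, y)
estimate = predict(beta, (50, 0.3))
```

Because the synthetic targets are exactly linear in the features, the fit recovers the generating coefficients; real match data would leave residual error.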

The computed wicket weights for the different positions are shown in the table below.

Validation and Outcomes

In any statistical measurement, the computed results depart from the actual results. The difference between an actual and a computed result is a measure of efficacy. Squaring the differences gives a non-negative measure of the error between the observed and the computed quantities, regardless of sign. The mean of these squared errors summarizes them in a single value termed the Mean Squared Error (MSE). A lower MSE signifies a better procedure, so minimizing the MSE is of utmost interest in any study.
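The definition above can be written as a short function. The scores in the example are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up final scores and predictions for three innings
actual_scores = [160, 175, 150]
predicted_scores = [150, 180, 155]
mse = mean_squared_error(actual_scores, predicted_scores)  # (100 + 25 + 25) / 3 = 50.0
```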

The plots below show the average Mean Squared Error (MSE) between the actual and the predicted runs scored, based on the models built, for every over and for all teams in IPL season 13. For each IPL team, predictive models for the final score are first built from IPL seasons 11 and 12. The final scores for IPL season 13 are then predicted from these models at every over.
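One way such a per-over error curve could be computed is sketched below. The record layout (over number, actual final score, predicted final score) and the sample values are assumptions for illustration; the paper's own data format is not described here.

```python
from collections import defaultdict


def mse_by_over(records):
    """Average squared prediction error grouped by over number.

    records is an iterable of (over, actual_final, predicted_final) tuples,
    one per match and over at which a prediction was made.
    """
    squared_errors = defaultdict(list)
    for over, actual, predicted in records:
        squared_errors[over].append((actual - predicted) ** 2)
    return {over: sum(errs) / len(errs)
            for over, errs in sorted(squared_errors.items())}


# Made-up predictions for two matches at overs 5 and 10
records = [(5, 160, 140), (5, 150, 160), (10, 160, 150), (10, 150, 154)]
curve = mse_by_over(records)
```

Plotting the resulting values against over number for one team would give a curve of the kind described above.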






Conclusion

By using the regression technique, a simple algorithm was implemented that predicted scores efficiently. The selection of wicket weights was the key to this work.

  • Based on the plots of MSE against over number, a guideline for selecting the players of a team in fantasy games can be designed. A user can choose players according to the overall performance of the team in the different phases of the game. For example, for KKR, users can favour lower-middle-order batsmen over the openers. For SRH, performance shows a level shift after the 5th over, so their openers are preferable to their middle-order batsmen.
  • IPL teams can shape their auction strategy as well as their match strategy to pick key players who can stabilize their overall run-scoring ability. For example, RR can buy players at the auction who strengthen their batting in the last 10 overs.




