
# Introduction

We all love a good game. Like any other game, cricket is about the skill and preparation of individual players, the coordination and teamwork of the whole team, the evolving tactics and overall strategy each team adopts in each match, and of course some luck. It is hard to predict what will happen in the next ball, the next over, the next innings, or the next match. This uncertainty about the outcome is what makes these games fun. But are these games really completely random? Is there some predictability to them? Don’t we, as cricket lovers who know the past games well, predict which team might win even before the game starts? Let’s see if an AI system can do the same!

**What if an AI system that has “watched” all the previous games could actually predict, with some confidence, who will win the next match, given mainly the team composition of the two teams and some context of the match?**

This challenge is about building such a prediction system. It is also a challenge in thinking from first principles about the art of iteratively engineering features and building models, which is at the core of building any AI system.

**The Data**

You are given the following three data files for this challenge.

**DELIVERIES–** this file contains a ball-by-ball log of 500 past IPL matches. Each row corresponds to one ball delivered in one over of one innings of one match, along with (a) the metadata associated with the ball (e.g. batsman, bowler, etc.) and (b) the outcome of the ball (e.g. runs, extras, dismissal, etc.). (See Appendix A for details.)

**MATCHES–** for each of the 500 matches in the DELIVERIES data, this file contains (a) additional metadata about the match (e.g. city and stadium), (b) who won the match, and (c) by what margin (e.g. the number of runs or wickets). (See Appendix B for details.)

**PREDICT–** this is the test data: a set of 136 matches, each given with the identity and composition of each of the two teams. (See Appendix C for details.)
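Before engineering any features, it can help to have a naive baseline to compare against. The sketch below is one possible starting point, not the method the challenge prescribes: it estimates a smoothed historical win rate per team from past match results and turns two win rates into a probability for the first team. The field names `team1`, `team2` and `winner` are hypothetical (the actual columns are listed in the appendices), and the toy history is made up purely for illustration.

```python
from collections import Counter


def fit_win_rates(matches):
    """Count games played and games won per team from past match records.

    Each record is a dict with the hypothetical keys
    'team1', 'team2' and 'winner'.
    """
    wins, games = Counter(), Counter()
    for match in matches:
        teams = (match["team1"], match["team2"])
        for team in teams:
            games[team] += 1
        # Matches without a valid winner (e.g. no result) add no win.
        if match["winner"] in teams:
            wins[match["winner"]] += 1
    return wins, games


def smoothed_rate(team, wins, games):
    """Laplace-smoothed win rate; a team never seen before gets 0.5."""
    return (wins[team] + 1) / (games[team] + 2)


def team1_win_probability(team1, team2, wins, games):
    """Relative strength of team1 against team2, as a value in (0, 1)."""
    r1 = smoothed_rate(team1, wins, games)
    r2 = smoothed_rate(team2, wins, games)
    return r1 / (r1 + r2)


# Made-up history, for illustration only.
history = [
    {"team1": "A", "team2": "B", "winner": "A"},
    {"team1": "A", "team2": "C", "winner": "A"},
    {"team1": "B", "team2": "C", "winner": "B"},
]
wins, games = fit_win_rates(history)
p_a_beats_c = team1_win_probability("A", "C", wins, games)
p_b_beats_new = team1_win_probability("B", "D", wins, games)
```

This baseline ignores the player composition entirely, so it only sets a floor. Any model that uses the players of the two teams should be expected to beat it.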

**The Challenge**

Your challenge is to build a model from the DELIVERIES and MATCHES datasets that predicts the winning team in any given match. The input to the model is the set of players of the two teams, and the output is the probability that the first of the two teams will win the match (see Appendix D for a sample output file). Let’s see how well you can predict the winning team in a cricket match.

**DELIVERIES**

The deliveries file contains the ball-by-ball log of what happened in each delivery in 500 past IPL matches over the last ten years. It is a CSV file with the following columns:

**Appendix A**

**MATCHES**

For each of the 500 matches, this file gives some metadata and the outcome of the match.

**Appendix B**

**PREDICT**

For the 136 matches, you are given the following metadata.

**Appendix C**

**Data Set – Download Data Set**

**Submission**

Your goal is to predict whether team1 will win or lose this game. Your output should be a two-column CSV file with the following columns:

**Appendix D**
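As a rough illustration of producing the output file, the sketch below writes a two-column CSV using only the Python standard library. The column names `match_id` and `team1_win_prob`, the match ids and the four-decimal formatting are assumptions made for this example. The sample file in Appendix D is the authority on the actual header and format.

```python
import csv
import io


def write_submission(predictions, out_file):
    """Write (match id, probability that team1 wins) pairs as CSV rows.

    The header names below are hypothetical placeholders.
    """
    writer = csv.writer(out_file)
    writer.writerow(["match_id", "team1_win_prob"])
    for match_id, prob in predictions:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range for match {match_id}")
        writer.writerow([match_id, f"{prob:.4f}"])


# Write to an in-memory buffer here; pass an open file object to write to disk.
buffer = io.StringIO()
write_submission([(501, 0.62), (502, 0.41)], buffer)
submission_text = buffer.getvalue()
```

To write a real file, open it with `open(path, "w", newline="")` and pass the file object as `out_file`, which is the usage the `csv` module documentation recommends.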

**Evaluation Metric**

Your score is the percentage of matches for which your machine learning model correctly predicts the outcome. This metric is known as accuracy.

**Accuracy = (Total number of correct responses / Total number of responses) * 100**
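The formula can be checked locally on held-out matches before submitting. The sketch below assumes that a predicted probability of at least 0.5 counts as "team1 wins". That threshold is an assumption of this example, since the text does not say how probabilities are converted into win or lose labels. The sample numbers are made up.

```python
def accuracy(predicted_probs, actual_outcomes, threshold=0.5):
    """Percentage of matches where the thresholded prediction is correct.

    predicted_probs: probability that team1 wins, one value per match.
    actual_outcomes: 1 if team1 actually won the match, 0 otherwise.
    threshold: cut-off for predicting a team1 win (assumed, not specified).
    """
    if len(predicted_probs) != len(actual_outcomes):
        raise ValueError("inputs must have the same length")
    if not predicted_probs:
        raise ValueError("need at least one prediction")
    correct = sum(
        (prob >= threshold) == bool(outcome)
        for prob, outcome in zip(predicted_probs, actual_outcomes)
    )
    return correct / len(predicted_probs) * 100


# Made-up predictions for four matches; the third one is wrong.
probs = [0.8, 0.3, 0.6, 0.4]
actual = [1, 0, 0, 0]
score = accuracy(probs, actual)  # 3 of 4 correct -> 75.0
```

Because accuracy only looks at which side of the threshold each probability falls on, two models with very different confidence levels can receive the same score.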

**Final Submission**

**You can use any ML library and environment to build the models and generate the final predictions. You need to submit the following to be considered for the Challenge:**

- **OUTPUT file as described in Appendix D**
- **List of features you used in your model – a reasonable description of each feature**
- **A brief description of the model you used and the hyper-parameters of this model**
- **Your code for generating features, training models, and scoring the test data**