Copyright is owned by the Author of the thesis.  Permission is given for 
a copy to be downloaded by an individual for the purpose of research and 
private study only.  The thesis may not be reproduced elsewhere without 
the permission of the Author. 
 

Statistical Methods for Cricket 
Team Selection 

A THESIS PRESENTED IN PARTIAL FULFILMENT 

OF THE REQUIREMENT OF THE DEGREE OF 

MASTER OF APPLIED STATISTICS 

AT MASSEY UNIVERSITY, ALBANY 

NEW ZEALAND 

Paul J. Bracewell 

1999 


Abstract 

Cricket generates a large amount of data for both batsmen and 
bowlers. Methods for using this data to select a cricket team are 
examined. Utilising the assumption that an individual's natural 
ability is expressed via performance outputs, this thesis seeks to 
describe and understand the underlying statistical processes of 
player performance. Randomness is tested for and then the 
distributional properties of the data are sought. 

This information is then used to monitor the estimate of natural 
ability via widely accepted control methods, such as Shewhart 
control charts, CUSUM, EWMA and multivariate versions of these 
procedures. To accommodate the distribution presented by batting 
scores, a new control chart based on quartiles is also studied. 

Further, ranking and selection procedures employ the estimates of 
individual ability to select the best individuals and note the 
probability of correct selection. 

Major contributions of this study include: 
a) Development of performance measures for cricket 
b) 2-Dimensional runs test, with further applicability outside cricket. 
c) Statistical interpretation specific to cricket 

• Outliers are very important 
• Form is autocorrelation 
• Zone rules for cricket needed to detect good/poor performance 
• Relatively short nominal ARLs 

d) Control Chart based on quantiles to preserve outlier influences 
in a non-parametric procedure. 

e) The recommendation of appropriate tools for monitoring 
batting, bowling and all-rounder performance and also choosing 
man of the match. 

f) Discriminates between different types of bowlers using the 
consistency of their performance measures. 

g) Evaluates the members of a team relative to potential 
contenders. 


Contents

Acknowledgements
Table of Contents
List of Figures
List of Tables

Chapter 1 An Overview
  1.1 Introduction
  1.2 General Overview of Cricket Statistics
  1.3 Statistics and Team Selection
  1.4 The Application of Quality Control to Cricket
  1.5 Major Contributions of this Study

Chapter 2 Analysing Individual Data Characteristics
  2.0 Introduction
  2.1 Literature Review
    2.1.0 Performance Output Measures
    2.1.1 Investigation of Bowling Measures
    2.1.2 Randomness
    2.1.3 Distribution of Performance Measures
  2.2 Data
    2.2.1 Calculation of Batting Measures
    2.2.2 Calculation of Bowling Measures
  2.3 Methods
    2.3.0 Introduction
    2.3.1 Tests for Randomness
    2.3.2 Distribution Fitting
  2.4 Results
    2.4.0 Introduction
    2.4.1 Batting Results
    2.4.2 Bowling Results




List of Figures

Figure 1. Relative Effectiveness of the Attack Index
Figure 2. Relative Effectiveness of the Economy Index
Figure 3. Distributional Comparison of Contribution and Score
Figure 4. Non-parametric EWMA for B.R. Hartland
Figure 5. Non-parametric EWMA for M.J. Horne
Figure 6. Distribution-free CUSUM for B.R. Hartland
Figure 7. Distribution-free CUSUM for M.J. Horne
Figure 8. Control Chart with Warning Lines
Figure 9. Control Chart with Warning Lines for Cricket
Figure 10. Shewhart Control Chart of M.J. Horne's Transformed Batting Contribution With Zone Run Rules
Figure 11. Shewhart Control Chart of B.R. Hartland's Transformed Batting Contribution With Zone Run Rules
Figure 12. CUSUM Control Chart of M.J. Horne's Transformed Batting Contribution
Figure 13. CUSUM Control Chart of B.R. Hartland's Transformed Batting Contribution
Figure 14. EWMA Control Chart of M.J. Horne's Transformed Batting Contribution With Zone Run Rules
Figure 15. EWMA Control Chart of B.R. Hartland's Transformed Batting Contribution With Zone Run Rules
Figure 16. Fitted Line Plot of Mean Contribution Vs Mean Score
Figure 17. Quartile Control Chart
Figure 18. Establishing Number of Consecutive Increasing Points for Alarm
Figure 19. Histograms of simulated Score data with and without Transformation
Figure 20. Histograms of simulated Contribution data with and without Transformation
Figure 21. Quartile Control Chart for M.J. Horne
Figure 22. Quartile Control Chart for B.R. Hartland
Figure 23. T2 Control Chart of C.M. Brown for Bowling Indices
Figure 24. T2 Control Chart of P.J. Wiseman for Bowling Indices
Figure 25. Bivariate Control Chart of C.M. Brown for Bowling Indices
Figure 26. Bivariate Control Chart of P.J. Wiseman for Bowling Indices
Figure 27. T2 Control Chart of A.C. Barnes
Figure 28. MEWMA Control Chart of Bowling Indices for C.M. Brown
Figure 29. MEWMA Control Chart of Bowling Indices for P.J. Wiseman
Figure 30. Plot Showing Ranked Nature of Population Batting Order


List of Tables

Table 1. Tests for randomness in [...] Performance Measures
Table 2. Distribution for Individual Scores
Table 3. Distribution for Individual Contribution
Table 4. Tests for randomness in [...] Performance Measures
Table 5. [...] test for Individual [...] Indices
Table 6. [...] in Consecutive Points
Table 7. [...] ARLs
Table 8. Ratio's Run From In-Control to Out-of-Control
Table 9. Theoretical Quartile Limits for Horne and Hartland
Table 10. [...] of the Number of Points to First Alarm for Horne and Hartland for Different Methods
Table 11. P-Values from Test for [...] of Variance for Bowlers
Table 12. [...] of Correct Selection for NZ Statistical XI


Chapter 1. An Overview 1 

Chapter 1. 

An Overview 

1.1 Introduction 

Cricket is a game of numbers. The very core of the sport is entwined with 

numerical values that translate ultimately to a match result. These sport statistics 

are a natural by-product of competitive sport and have been around as long as 

contested sport has existed. Currently sport reporters and commentators bombard 

observers with a vast array of numerical values designed to describe an 

individual's performance at a particular skill. These added extras contribute to the 

entertainment value of professional sport. However, is this information of use to 

coaches and selectors of cricket teams? 

This thesis is aimed primarily at the selectors of top-level cricket teams. An 

attempt is made to keep the statistics involved as simple as possible so that all 

levels of selectors may apply this methodology to their teams. 

There are several key reasons for measuring and evaluating performance in team 

sport. Organisational Behaviour Theory proves particularly useful in drawing 

together sport statistics and selection. According to Greenberg and Baron (1997), 

to build high performance teams, appropriate performance measures are required. 

Tests and measurements are tools that can be used for evaluation of an 

individual's performance (Franks, B. & Deutsch, H., 1973). 

Having found suitable measures of performance these indicators can be used in 

the selection process. For a high performance team, the right team members need 

to be selected (Greenberg et al, 1973). This means combining all available 

evidence, quantitative and qualitative, to make correct selection decisions. 



While few really important decisions are made purely on the basis of objective 

evidence (Franks et al, 1973), selection decisions cannot be based upon 

subjective evidence alone. The correct balance needs to be implemented. In 

order to use sport statistics successfully, a deeper understanding of the numerical 

values involved is necessary. Firstly, the nature of cricket statistics is discussed. 

1.2 General Overview of Cricket Statistics 


Cricket statistics are meticulously collated ball by ball. The vocabulary of the game 

continually refers to abstract statistical concepts such as average, aggregate and 

form - without divulging the secrets of what these mystical values contain. For the 

sake of simplicity, all values involved are reduced to one dimension. However, this 

leaves the cricket observer to assume and speculate as to the base values 

involved. The written media has recently taken to describing bowling performance 

by listing the number of wickets taken followed by the bowler's average. This form 

is limited; the basis behind this statement is discussed later in the evolution of the 

bowling indices. 

In recent times, an increasing number of studies have been undertaken to 

understand the statistical processes at work in the game of cricket. G.H. Wood 

and W.P. Elderton started the ball rolling in 1945, analysing individual batsmen in 

an attempt to find a general model that would describe individual scores. This is in 

accordance with the general trend, where most work to date has revolved around 

batsmen. This seems like an apparent contradiction, as the first skill taught to 

junior cricketers is how to bowl, for without bowlers the game cannot be played. 

However, with the advent of one-day cricket and now Cricket Max, both geared 

towards entertainment, the game is becoming increasingly batsman-orientated. 

"Batsmen have always received the highest accolades. Most histories of cricket 

are written around them, with the bowlers regarded merely as a necessary evil." 

(Nigel Smith, 1994, p.177). The reasoning behind the domination of batsmen in 

statistical papers may be due to the perceived ease of evaluation. 



This leads to the definition of the statistics utilised in analysis of player 

performance, enabling a better understanding of the statistics involved. 

Sport Statistics can be separated into two broad categories; Performance 

Indicators and Performance Outputs. 

A Performance Indicator is a quantitative measure that indicates individual 

performance in a particular facet of the game. These values are collated during 

the game in progress. Effectively, the game is dissected into small manageable 

slices, such that a numerical value can be assigned as a descriptive measure. An 

example of a Performance Indicator, associated with fielding performance, is 

Ground Ball Efficiency, defined as the number of times the ball is fielded cleanly 

divided by the total number of times it is fielded. These values do not have a direct 

impact on the match figures. 

In contrast a Performance Output is a numerical expression detailing the direct 

result of participation in an event. For cricket these are summary measures 

detailed in a score book at the completion of an innings, such as score, wickets 

taken, overs bowled and so forth. As a consequence these values have a direct 

impact on the match figures. 


It stands to reason that these two categories are related in some manner. 

However, only performance outputs will be examined in this study, due to the ease 

of data collection and availability. The possible relationship between 

performance indicators and performance outputs is left for future 

research. 

Statistically, assessing the performance of a batsman is relatively simple, as this 

can be given by a single variable, either runs scored in an innings, aggregate, 

average, or average contribution. 



'Aggregate' refers to the total number of runs scored by the individual over a 

specified period of time. A player's 'Average' is then calculated by dividing the 

aggregate by the number of times the individual was dismissed during the specified 

time. 'Contribution' is the percentage of runs the individual provides the team total 

in an innings. Each value on its own can effectively describe performance. 

Describing performance by bowlers is more complicated. A typical bowling 

analysis gives four values: runs, maidens, overs and wickets. 'Runs' corresponds 

to the number of runs conceded by the individual. 'Maidens' refers to the 

number of completed overs in which no runs are charged against the 

bowler (leg byes and byes are not added to a bowler's total). 'Overs' refers to the 

number of six-ball sets a bowler has delivered. Finally, 'wickets' are the number of 

dismissals credited to the bowler. Alone, these individual factors give little insight 

into how well a bowler performed. Together, they are more meaningful, but not 

until compared to a full score card can the value of the performance be evaluated. 

The use of the bowling average attempts to describe performance in one 

dimension. This is found by dividing the number of runs conceded by the number 

of wickets taken. However, no time frame is suggested by this value. Essentially it 

is assumed that a bowler will concede 3-4 runs per over. Over a long period of 

time this assumption becomes more valid, but is not suitable for a game by game 

situation. 

The Deloitte Ratings create a one-dimensional measure of performance in Test 

Cricket. This involves an algorithm that takes into consideration several factors 

and weightings. This is currently the method of determining the best players in the 

world. Whilst the formulae involved are extremely thorough, the histories of all 

players need to be known and equally thorough. An attempt to create a 

one-dimensional index, using both factor analysis and principal component analysis, 

failed to provide meaningful results (Bracewell (1), 1998). Intuitively this is obvious, 

as two basic concepts are involved. 



Ideally, two dimensions need to be considered, one involving the player's attacking 

ability, the other involving the ability to restrict runs. Kimber (1993) gives a 

graphical method for comparing bowlers. This utilises two dimensions; the 

attacking ability (strike rate) and the ability to restrict runs (economy rate). 

Bracewell (2)(1998) proposed two independent normally distributed indices, based 

upon strike rate and economy rate, to describe performance. The first index deals 

with a bowler's ability to take wickets, the second with the ability to restrict runs. 

Both indices are evaluated using simple variations of formulae that are already 

used, taken relative to the team performance. The section dealing with assessing 

bowlers relies heavily on these indices. Having defined the performance outputs to 

be assessed, it is necessary to discuss their relevance in a selection situation. 

1.3 Statistics and Team Selection 

With the wealth and quality of data available in cricket, it makes sense to utilise this 

quantitative information in the selection of individuals to maximise the formation of 

a collective unit (the team). The main assumption underpinning the work in this 

thesis is that a player's natural ability is expressed by individual performance 

outputs collated following the completion of a match. 

Statistics are not the only factors considered when selecting a team. However, 

former New Zealand coach Glenn Turner (1998) discusses the importance of 

statistics in choosing players in his book Lifting the Covers. In particular the 

second chapter reveals the emphasis placed on statistics in comparing and 

selecting individuals. In this instance it is used particularly to justify the 

non-selection of players (Andrew Jones and Ken Rutherford), and then to defend the 

selection of Lee Germon. 

"Late in 1995 Francis Payne, cricket author and statistician, provided 
me with statistics which mostly confirmed what we had known before 
we picked our first test team." (p42) 

Glenn Turner (1998), Former New Zealand Coach 



Since statistics are used to make and confirm selection decisions it is necessary to 

attempt to understand the nature of the data being generated by participation in 

sport. A greater understanding leads directly to better implementation and 

hopefully a competitive edge for the selected team. 

Former Australian captain, Richie Benaud, remarked on the simplistic nature of 

selection and the use of statistics stating, "All a captain needs is the confidence 

that his bowlers are each capable of taking five wickets in an innings, his batsmen 

are capable of scoring a century and that everyone can field like Viv Richards 

(Benaud, 1995, p169)." Obviously the captain deals with the players on the field 

and is not responsible for those selected to take the field; this lies in the hands of 

the selectors. The captain must believe that he has been given the best men to 

compete. It then becomes the job of the selectors to ensure that the best 

combination of players available takes the field. If statistics are to be used in the 

selection process they must be meaningful, and secondly they must be used in an 

appropriate manner. This means that a relevant application of statistical 

methodology is that of monitoring individual ability. 

1.4 The Application of Quality Control to Cricket 

The idea of monitoring performance is as useful to the selector and the player as it 

is to the armchair critic. An ideal method for monitoring an individual's 

performance is with control charts. The control chart is a useful tool in statistical 

process control. First developed by W.A. Shewhart, Shewhart charts are widely 

accepted as standard tools for monitoring processes of univariate, independent and 

nearly normal measurements (Liu & Tang, 1996). Control charts have found 

frequent applications in both manufacturing and non-manufacturing settings 

(Montgomery, 1997). With slight adjustments, Shewhart charts can be applied to 

cricket. 



Provided the measurements of the individual's performance are reflective of 

quality, function, or performance then the nature of the 'thing' being measured has 

no bearing on the general applicability of control charts (Montgomery, 1997). 

Montgomery (1997) discloses several reasons for the popularity of control charts. 

At least three draw direct parallels to cricket. Possibly most important is that control 

charts provide diagnostic information. This can identify flaws in technique, or the 

tendency for a player to struggle under certain conditions. Also, control charts are 

proven to improve productivity, which translates to pushing a player and not 

allowing complacency. 
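
As a rough illustration of the control chart idea described above, the following Python sketch computes three-sigma Shewhart limits from a set of reference performances and flags later values that fall outside them. It is a minimal sketch only: the function names and the data are hypothetical, the measurements are assumed to be independent and roughly normal, and the limits use the sample standard deviation of the reference data rather than any adjustment specific to cricket.

```python
from statistics import mean, stdev


def shewhart_limits(reference, k=3.0):
    """Return (lower, centre, upper) limits from in-control reference data.

    Assumption: limits are centre +/- k sample standard deviations.
    """
    centre = mean(reference)
    spread = stdev(reference)  # sample standard deviation
    return centre - k * spread, centre, centre + k * spread


def out_of_control(values, limits):
    """Return the indices of values lying outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical standardised performance values (illustrative only).
reference = [0.2, -0.5, 0.1, 0.4, -0.3, 0.0, 0.6, -0.2]
limits = shewhart_limits(reference)

new_values = [0.1, 2.5, -0.4]
flagged = out_of_control(new_values, limits)
print(flagged)  # indices of performances signalling a possible change
```

A signal on such a chart would prompt a closer look at the player rather than an automatic selection decision.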


In Cricket we are interested in selecting individuals that will maximise team 

performance and ensure the best chance of victory. Whilst Cricket is a team sport, 

the nature of the game allows for individual aspects to stand out. Indeed, when we 

look at the possible selection of an individual, it is the performance outputs of the 

individual that are of primary concern. Therefore, to ensure the right selections are 

made, it is important the right statistics are used. 

The awkward nature of bowling performance outputs leads to the 

evolution of the bowling indices. These two independent, random, standard normal 

indices are a simple and effective way of allowing bowling performance to be 

measured from the post match statistics. They are more useful than the current 

convention used in the written media of quoting the number of wickets taken and 

the bowling average of an individual. 

Utilising the assumption that an individual's worth is expressed via performance 

outputs, this thesis seeks to describe and understand the underlying statistical 

processes that shape our impression of player performance in the second chapter. 

Randomness is tested for and then distributional properties of the data are sought. 



Armed with information generated in the second chapter, the third chapter 

assesses methods for monitoring the estimate of natural ability. 

Widely accepted control methods, such as Shewhart control charts, CUSUM, 

EWMA and multivariate versions of these procedures are implemented and the 

performance for both batting and bowling is discussed. To accommodate the 

distribution presented by batting scores a new control chart based on quartiles is 

also studied. 


Further, ranking and selection procedures utilise the estimates of individual ability 

to select the best individuals and note the probability of correct selection in chapter 

four. 

Chapter Five then details how this information can be drawn together and applied in 

selecting a side with the assistance of statistics based upon performance outputs. 

1.5 Major Contributions of this Study 

A number of new and novel approaches are presented in this thesis. These 

include: 

a) The further development of individual performance measures for the 

main disciplines of batting and bowling for cricket. 

b) A 2 - Dimensional runs test, utilising the T2 statistic, with further 

applicability outside cricket. 

c) Statistical interpretation of assumptions and results specific to 

cricket namely: 

• Outliers are very important in determining the estimate of 
ability for an individual. 

• Form is autocorrelation. 
• Zone rules for cricket are needed to detect good/poor 
performance. 
• Relatively short nominal ARLs to accommodate the 
restricted number of sampling opportunities presented in a 
season. 



d) A new Control Chart based on quantiles to preserve outlier 

influences in a non-parametric procedure. 

e) The recommendation of appropriate tools for monitoring 

batting, bowling and all-rounder performance and also choosing 

man of the match. 

f) A selection procedure for bowlers that discriminates between 

different types of bowlers using the consistency of their 

performance measures. 

g) Following selection, an evaluation of the probability of correct 

selection of individuals to a team, relative to potential 

contenders. 



Chapter 2. Analysing the Characteristics of Individual Data 

2.0 Introduction 

[...] 

Previous work involving the analysis of both batsman and bowler performance 

outputs has assumed random performance. This assumption needs to be clarified 

before further progress can be made. The initial thrust of the thesis is the 

identification of what constitutes form. Form can be likened to autocorrelation, in 

that an individual displays patterns or trends in performance over time. 
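
To make the link between form and autocorrelation concrete, the sketch below computes the lag-1 sample autocorrelation of a sequence of performances. This is offered only as an illustration of the concept and is not necessarily the randomness test applied in this thesis; the function name and the two example sequences are hypothetical. Values near zero are consistent with random performance, while values well away from zero suggest that successive performances are related.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    centre = sum(series) / n
    denominator = sum((x - centre) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numerator = sum(
        (series[t] - centre) * (series[t + lag] - centre)
        for t in range(n - lag)
    )
    return numerator / denominator


# Hypothetical sequences (illustrative only).
trending = [1, 2, 3, 4, 5, 6, 7, 8]            # steadily improving
alternating = [1, -1, 1, -1, 1, -1, 1, -1]     # good and bad in turn

print(autocorrelation(trending))      # positive: a run of related values
print(autocorrelation(alternating))   # negative: values flip each time
```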

It is expected that two extremes may exist: either form exists, or performance is 

random. If autocorrelation is present, then form exists. Intuitively performance 

would be considered random, due to the apparent lack of predictability of such a 

sport. "Uncertainty plays a large role in sports, and one can argue that the 

uncertainty associated with sports outcomes is one reason that sports are so 

popular (Stern, 1997, p19)." It has been shown that baseball is a game of chance 

(Cook, 1977). An analysis of team tactics as related to the game of baseball and 

analysis of the annual World Series competition revealed that results were subject 

more to the laws of chance than the relative calibre of the competing teams. 

Taking a simplistic view of competitive sports suggests this may also be the case in 

cricket (assuming everyone is equally able to compete, and that natural ability will 

differ, dependent on the pool of talent available). Logistically it would be ideal if 

performance is random. If this is the case then it is a relatively simple task to 

select the best individuals, provided that the sampling distributions to which the 

data belong are known. 

In order to fully understand the summary statistics presented, and make effective 

use of the available information, the statistical distribution for each of the 

performance outputs needs to be known. Fulfilment of this requirement and that of 

randomness satisfies the most important assumptions regarding inference and 

quality control. 

An overview of previous research on performance output measures in cricket is 

presented in the next section. 


2.1 Literature Review 

[...] 

2.1.0 Performance Output Measures 

[...] 

Individual Score is the number of runs credited to an individual during an innings 

and Team Total is simply the total number of runs amassed by that team whilst 

batting in that innings. 

Over a period of time, a batsman's worth is investigated via their batting average. 

The traditional batting average is expressed below. 

$$\text{Traditional Batting Average} = \frac{\sum \text{Individual Scores}}{\text{dismissals}}$$

However, our interest is with what an individual is expected to score in a given 

innings, as measured by a batting average. 

$$\text{Batting Average} = \frac{\sum \text{Individual Scores}}{\text{innings}}$$

Thus the aggregate score is divided by the total number of times batted to provide 

the batting average. This performance measure circumvents the debate 

surrounding the handling of 'not outs' by only considering the average score per 

innings. This is different from the traditional batting average, shown above, which 

estimates the runs scored between dismissals. However, that method seems 

redundant given the time constraints placed upon the game, especially as we are 

to consider the expected number of runs in an innings. A further discussion is 

included in Appendix B. 

Finally, average contribution can be used as a measure of batting performance as 

shown below. 
$$\text{Average Contribution} = \frac{\sum \text{Contribution}}{\text{innings}}$$

It is defined in a manner similar to that of batting average. The sum of individual 

contributions is divided by the total number of innings. 
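
The three batting measures defined above can be sketched in Python as follows. The `Innings` record, the function names and the sample figures are hypothetical and chosen only for illustration; contribution is taken as the percentage of the team total scored by the individual, and every team total is assumed to be greater than zero.

```python
from dataclasses import dataclass


@dataclass
class Innings:
    score: int        # runs credited to the batsman
    team_total: int   # runs amassed by the team in that innings
    dismissed: bool   # False for a 'not out'


def traditional_average(innings):
    """Aggregate divided by dismissals; None if never dismissed."""
    dismissals = sum(1 for i in innings if i.dismissed)
    if dismissals == 0:
        return None
    return sum(i.score for i in innings) / dismissals


def batting_average(innings):
    """Aggregate divided by the number of times batted."""
    return sum(i.score for i in innings) / len(innings)


def average_contribution(innings):
    """Mean percentage of the team total scored by the individual."""
    return sum(100 * i.score / i.team_total for i in innings) / len(innings)


# Hypothetical record of three innings (illustrative only).
record = [
    Innings(score=50, team_total=200, dismissed=True),
    Innings(score=30, team_total=150, dismissed=False),
    Innings(score=10, team_total=100, dismissed=True),
]

print(traditional_average(record))   # 90 runs over 2 dismissals
print(batting_average(record))       # 90 runs over 3 innings
print(average_contribution(record))  # mean of 25%, 20% and 10%
```

The 'not out' innings changes the traditional average but has no special role in the per-innings measures.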



b) Bowling Performance for Individual Bowlers 

Of the individual disciplines, bowling is perhaps the hardest to evaluate 

quantitatively. A typical bowling analysis consists of four variables: Runs 

conceded, Maidens bowled, Overs bowled and Wickets taken. There is no easy 

way of interpreting these values independently. History plays a large part in how 

these statistics are perceived, as does the game situation. This section briefly 

reviews the statistical methods for evaluating an individual's bowling performance. 

Kimber (1993) proposed a two-dimensional graphical display for comparing 

bowlers in cricket based on strike rate (SR) and economy rate (ER), taking 

advantage of the relationship that these two values have with the Bowling Average 

(AV). 

$$SR \times ER = 100\,AV$$

These values are traditionally calculated as follows. The Economy Rate (ER) is 

defined as the runs conceded per ball. 

$$\text{Economy Rate} = \frac{\text{Total Runs Conceded}}{\text{Total Balls Bowled}}$$

The Strike Rate (SR) is defined as the number of balls bowled per wicket taken. 

$$\text{Strike Rate} = \frac{\text{Total Balls Bowled}}{\text{Wickets Taken}}$$

(Kimber, 1993) 

Note that the factor of 100 in the relationship applies when ER is quoted as runs 

per 100 balls; with ER taken per ball, as defined here, SR x ER equals AV. 

However, this relationship does not take into consideration the team situation, and 

other confounding variables that confront a bowler, such as the state of the game, 

combined with environmental factors, as these can have an impact on how the 

specific individuals involved, batsman and bowler, approach each delivery. As a 

brief example, a batsman is more likely to attack the bowler towards the end of the 

innings, with wickets remaining in a run chase, than a batsman trying to save the 

match by remaining not out in a last wicket partnership when a run chase is no 

longer viable. Strike Rate has an additional problem, in that if a player fails to take 

a wicket, a value for SR is not returned as the divisor is zero. 



Thus SR is not suitable for evaluation on an innings by innings basis. This measure 

could be calculated using all the match results for a season, but in terms of 

selection and monitoring a player's performance, it is too late to address an 

individual's worth at the end of the season. Thus only players who have taken 

wickets can have strike rate as a performance measure. 
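
The traditional bowling measures, and the difficulty with a wicketless innings, can be sketched as below. The function names and figures are hypothetical. Economy rate is taken per ball, as in the definition above, so the product of strike rate and economy rate equals the bowling average directly; returning `None` for a wicketless spell is an illustrative choice, not something prescribed by the source.

```python
def economy_rate(runs_conceded, balls_bowled):
    """Runs conceded per ball bowled."""
    return runs_conceded / balls_bowled


def strike_rate(balls_bowled, wickets):
    """Balls bowled per wicket; None when no wicket has been taken."""
    if wickets == 0:
        return None  # divisor is zero, so no value can be returned
    return balls_bowled / wickets


def bowling_average(runs_conceded, wickets):
    """Runs conceded per wicket; None when no wicket has been taken."""
    if wickets == 0:
        return None
    return runs_conceded / wickets


# Hypothetical spell: 60 runs from 120 balls with 3 wickets.
sr = strike_rate(120, 3)
er = economy_rate(60, 120)
av = bowling_average(60, 3)
print(sr, er, av)          # strike rate, economy rate, average
print(sr * er == av)       # product of SR and per-ball ER gives AV

# A wicketless innings leaves strike rate undefined.
print(strike_rate(36, 0))
```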

Bracewell (3)(1998) detailed a novel way to evaluate individual bowling 

performance, incorporating SR and ER into two separate indices that considered 

relative performance to the team. This involved an attempt to form ratios that took 

into account an individual's performance in relation to the team performance. The 

Attack Ratio involved inverting SR for both team and individual so that wickets 

taken was no longer the denominator. 

The ratios are defined as follows: 

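$$\text{Economy Ratio} = \left[\frac{\text{Opposition Total}}{\text{Total Overs}} - \frac{\text{Runs Conceded}}{\text{Overs}}\right]$$

$$\text{Attack Ratio} = \left[\frac{\text{Wickets}}{\text{Overs}} - \frac{\text{Total Wickets}}{\text{Total Overs}}\right]$$

(Bracewell (3), 1998) 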

However it was found that as the number of overs bowled by an individual 

increased, the score for both indices tended to zero. This was because as a player 

bowls more and more overs (approaches 50%) this player is having a huge 

influence on the team performance. His performance therefore reflects the team 

performance very closely. 

The final evolution of performance measures for bowlers involved multiplying the
ratios by a weighting factor related to time (overs). The problem described earlier
was removed in this way.

In addition, it was found that the Attack Index needed to be multiplied by a wicket
weighting factor, defined in terms of w, the number of wickets taken in any innings.
This index is therefore innings specific whereas the other measures are more
general.



The wicket weighting factor in the Attack Index is given by p(w), the probability of 

taking a certain number of wickets in an innings. Standardisation allows the 

indices to be compared on similar scales. 

The indices are therefore defined as follows 

ECONOMY INDEX = [Economy Ratio x √Overs]

ATTACK INDEX = [(Attack Ratio x √Overs) / (1 - p(w))]

(Bracewell, (3) 1998)
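A minimal sketch of the two indices follows. The function names are illustrative, and p(w) is taken as an input because its functional form is not restated here; the value 0.2 used in the example is a placeholder, not an estimate from the data.

```python
import math


def economy_index(economy_ratio, overs):
    """Economy Ratio weighted by the square root of the overs bowled."""
    return economy_ratio * math.sqrt(overs)


def attack_index(attack_ratio, overs, p_w):
    """Attack Ratio weighted by sqrt(overs) and divided by 1 - p(w).

    p_w is the probability of taking w wickets in an innings and must be
    supplied by the caller.
    """
    if not 0.0 <= p_w < 1.0:
        raise ValueError("p_w must lie in [0, 1)")
    return attack_ratio * math.sqrt(overs) / (1.0 - p_w)


# Hypothetical spell of 10 overs with ratios of 0.9 (economy) and 0.1 (attack).
# The value 0.2 for p(w) is a placeholder for illustration only.
print(economy_index(0.9, 10))
print(attack_index(0.1, 10, 0.2))
```

Because p(w) is small for rare hauls of wickets, the divisor 1 - p(w) is close to one in those cases and noticeably below one for common outcomes.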

2.1.1 Investigation of Bowling Measures 

Of the statistical analyses performed using cricket data, bowling is an area 

deficient in research. Only Kimber (1993) and Bracewell (3)(1998) have examined 

how to measure an individual's bowling performance. Kimber sought to do this via 

a graphical display based on Strike Rate and Economy Rate, whereas Bracewell 

tried extending these values relative to the team. 

2.1.2 Randomness 

Very little research has been done on the aspect of randomness in an individual's 

performance in cricket. A distantly related team sport, baseball, was found to be 

essentially random (Cook 1977). There is anecdotal evidence supporting the 

claim that the role of an individual within a game is random, generally commenting 

on the apparent lack of predictability of cricket. Berkmann (1990), Brittenden
(1994) and Turner (1998) are just a small selection of cricket observers who
subscribe to the view that cricket is unpredictable. Hunting through player
biographies also reveals that those who play the game express this view.



Danaher (1989) applied a runs test to 6 English county cricketers and found that
none showed a significant runs pattern at the 5% significance level. The batsmen
chosen were of varying batting ability, but were selected because they were either
at the top, or close to the top, of their team's batting averages list.

Kumar (1996) suggested that cricket is not governed by chance. However, this assertion
was based solely upon over run rates in one-day cricket. The implications of this
are manifested in the troublesome interrupted match rules. If over rates were
random, then the simple Average Run Rate (ARR) rule would suffice, as this is
based on the assumption that the run rate of the batting side does not change during
the innings. Instead, the resources available to a team play an important role in
determining the outcome of a one-day match. One only needs to look to the
Duckworth-Lewis model (1996) to see the effect that time (overs in hand) and
wickets in hand have in determining a batting side's capacity for team total. Team
strategies also illustrate this point. As a simple illustration of batting capacity, this
model accepts the fact that a side is more capable (or daring) of scoring runs when
only 2 wickets have been lost, as opposed to being 8 down, with 10 overs
remaining. The reasoning behind this is that, with the loss of only 2 wickets,
presumably the better batsmen are still available, and there are plenty of
individuals remaining. Thus batsmen are more able to go after their shots, as the
consequences to the team of their dismissal are not as great. In contrast, a batting
side with 8 wickets down needs to adopt a more cautious approach, as once a
team is dismissed there is no further chance of adding to the team total.


2.1 

scores 

scores were 

4 

he 

an 

(1 

season an 

Reep, 

scores once 

18 

batting scores had been 

(Pollard, 1977). 

scores began when 

to model the 

a 

this 

more so 

distribution to the 

over one 

some a 

the scores 



Two world-class players, Geoff Boycott and Ian Botham, were the centre of
Burrows and Talbot's (1985) study regarding the exponential distribution. They
found an adequate fit to Boycott's 77 innings and the 50 played by Botham. This
study also considered the handling of 'not outs'. It was found that adding the
mean of the exponential distribution to a not out score gives an estimate of
what the individual was likely to score in that particular innings. As exponential
random variables have no memory, this is a valid estimate. Furthermore, using
this information to establish a player's compensated batting average through a set
of iterative equations, and solving the first order difference equation, simply resulted
in the traditional batting average normally quoted: the total number of runs scored
divided by the number of times dismissed. However, the nature of the competitive
game is glossed over. This study "excluded limited overs games since innings in
these games are necessarily restricted (p46)." To some extent all cricket played
has some time restriction, whether 50 overs, 3 days or 5 days, hence the need for
declarations in the pursuit of victory. Further discussion on the effect of time
limitations is provided in Appendix B.
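The reduction to the traditional average can be checked numerically. The sketch below is not Burrows and Talbot's own set of equations; it assumes the simplest reading of the idea, in which every not out score is topped up by the current estimate of the mean and the mean is then recomputed until it stops changing. The scores and not out flags are hypothetical.

```python
def compensated_average(scores, not_out, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration: mu <- mean of (score + mu if not out else score)."""
    dismissals = sum(1 for flag in not_out if not flag)
    if dismissals == 0:
        raise ValueError("at least one dismissal is needed")
    mu = sum(scores) / len(scores)  # starting value: naive mean of all innings
    for _ in range(max_iter):
        completed = [s + mu if flag else s for s, flag in zip(scores, not_out)]
        new_mu = sum(completed) / len(completed)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical innings: 88 was a not out score, the rest were dismissals.
SCORES = [45, 12, 0, 88, 30]
NOT_OUT = [False, False, False, True, False]

traditional = sum(SCORES) / NOT_OUT.count(False)  # runs scored / times dismissed
print(compensated_average(SCORES, NOT_OUT), traditional)
```

With n innings, d dismissals and R total runs, the fixed point satisfies n·mu = R + (n - d)·mu, which rearranges to mu = R/d, the traditional batting average.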

Pollard (1977) conceded "that a more elaborate model needs to be developed to
describe the distribution of batsman's scores (p129)." This was due to the fact that
previous results did not cater for the higher than expected frequencies of failures to
score, compared to the theoretical models.

Bracewell (2)(1998) suggested a discrete version of a mixed exponential
distribution for score and for a relatively new concept in cricket statistics, contribution,
based upon 5609 observations of individuals in the top 6 of the batting order from
New Zealand domestic first class cricket. This involved separating the occurrence
of zero and recalculating the mean to find the parameters of the distribution
involving the non-zero values.


Analysing the 

0 Data 

used study refers to full score cards of 

cricket obtained from The Shell Cricket 

the Imperial Cricket 

or more day's duration nUl!\J\J<C>c.;>n 

, & Smith I., 1996). 

1997-98 season. 

four Only 

to 

entered 

20 

of 

were 

were 

each scores. 

If 

was 

runs 

an 

a 

or 



\[
f_X(x) =
\begin{cases}
p_0 & \text{if } x = 0; \\
(1 - p_0) \times \dfrac{1}{\beta} \times e^{-x/\beta} & \text{if } x > 0.
\end{cases}
\]

(Smith, 1993)

The probability of a certain score is given by the area of the corresponding interval
of the probability density function. Considering individual scores and contribution,
"not scoring" is the failure to score a run.

Analysing scores from 5609 individuals from only the top six of the batting order
yielded the following models.

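1) The fitted model for individual scores:

\[
f_X(x) =
\begin{cases}
0.015 & \text{if } x = 0; \\
0.032\, e^{-x/28.955} & \text{if } x > 0.
\end{cases}
\]

2) The fitted model for individual contribution:

\[
f_X(x) =
\begin{cases}
0.092 & \text{if } x = 0; \\
0.066\, e^{-x/13.686} & \text{if } x > 0.
\end{cases}
\]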

A chi-square goodness-of-fit test indicated that both models provided an adequate
fit at the 5% significance level.
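To make the "area of the corresponding interval" concrete, the fitted contribution model can be evaluated as below. This is a sketch: the function names are illustrative, and the text does not state which interval is assigned to each integer value, so the interval endpoints are left to the caller.

```python
import math


def ducks_and_runs_cdf(x, p0, beta):
    """P(X <= x): point mass p0 at zero plus an exponential part for x > 0."""
    if x < 0:
        return 0.0
    return p0 + (1.0 - p0) * (1.0 - math.exp(-x / beta))


def interval_probability(lo, hi, p0, beta):
    """Area of the continuous part of the model over (lo, hi], 0 <= lo < hi."""
    return (1.0 - p0) * (math.exp(-lo / beta) - math.exp(-hi / beta))


# Parameters of the fitted contribution model quoted in the text.
P0, BETA = 0.092, 13.686

# The coefficient in front of the exponential is (1 - p0) / beta.
density_coefficient = (1.0 - P0) / BETA
print(round(density_coefficient, 3))

# Probability of a contribution between 10% and 20% of the team total.
print(interval_probability(10, 20, P0, BETA))
```

The zero mass and the area under the continuous part together account for all of the probability, which is the property the (1 - p0) multiplier is there to guarantee.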

Obviously the ability of batsmen in the top six differs. Thus, the suggested models
are contaminated. However, the nature of the distributions gives an insight into how
the individual performance outputs for batting are distributed.


the Characteristics of Data 

of Indices 

use measure of performance for 

998). These measure 

for use on an by were 

duration of 

re-evaluated for 

cases. are 

is w an 

an a 

w = a-b. 

w per 

as a= 11.9 (11.42,12. b 3 (1 1 17) 

the and 

is no 

It was c 

an 



Using an iterative approach it was found that c was 0.25. Using this factor to
linearize the equation, a regression analysis was performed allowing the final
estimation of a and b. The adjusted R² value of the regression was 99.5%, indicating
the proposed regression line explained almost all of the variation in the data.
Estimates for both a and b were acquired, and are given below with 95%
confidence intervals: a = 11.1 (10.61, 11.59) and b = 14.2 (13.19, 15.21).

The value for a in the model must always be greater than ten. If it is equal to ten,
then this assumes that taking all ten wickets in an innings is impossible.
Evidence proves this not to be the case: A. E. Moss, in the 1889/90 season, took all
10 wickets in an innings for Canterbury against Wellington at Christchurch on
debut (Payne & Smith, 1996).

The interval for 'a' does not contain Bracewell's (3)(1998) estimate. This suggests
that the probabilities change slightly given the extra allowable day. However, the
changes make only a minimal difference. When 5 or more wickets are taken in an
innings the probabilities are approximately equal. The most noticeable difference
is the probability that no wickets are taken in an innings. The revised estimate is
less than the initial probability from the three-day game. This suggests that in an
extended match a player is less likely to go without a wicket. This is possibly due
to the chance of having a prolonged bowling spell.

However, the interval for 'b' contains both the original estimate and its confidence
interval.

The resultant χ² value of 12.855 indicated a suitable fit, as it is less than the critical
χ² value of 18.307 at the 5% level of significance with 10 degrees of freedom.
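The comparison with the tabulated critical value can be reproduced without tables. The sketch below relies on the closed-form upper tail of the chi-square distribution, which is available only when the degrees of freedom are even (as they are here); the function name is illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the tail is exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # the i = 0 term
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Tail area beyond the tabulated critical value; this should be close to 0.05.
print(chi2_sf_even_df(18.307, 10))

# p-value of the obtained statistic; a value above 0.05 means the fit is not rejected.
p_value = chi2_sf_even_df(12.855, 10)
print(p_value, p_value > 0.05)
```

Comparing the p-value with 0.05 is equivalent to comparing 12.855 with the critical value 18.307.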


the probability of a 

given 

Attack I 

As 

number 

Standardising 

lies 

interval 

zero. 

p 

of 

a new 

w=11.1-1 

taking a 

( 11.1 - w 
\ 1 ) 

in 

the 

1 

-0.0912. As zero is 

as 

an as 

w= 1, .. ,10 

is 

I (1 -

overs 

and 

a 

mean it probably 



Utilising the fact that (n - 1)s²/σ² is χ² distributed with n - 1 degrees of freedom
allows a confidence interval to be formed for the standard deviation. As a result,
a 95% confidence interval for the standard deviation shows it probably falls
between 0.3855 and 0.3858. As 1 does not fall in this interval, the value given for
the standard deviation needs to be used to standardise the attack index.

Both these intervals contain the corresponding Bracewell (3)(1998) estimates. 

This suggests that the attack index provides similar results for 3 and 4 day 

matches. Most importantly this index is not sensitive to match duration. 

The final formula for the Attack Index is given below. 

STANDARDISED ATTACK INDEX = ([(Attack Ratio x √Overs) / (1 - p(w))] + 0.07883) / 0.37461

c) Economy Index 

As indicated in 2.1.0, Bracewell's (2)(1998) economy index is:

ECONOMY INDEX = [ECONOMY RATIO x √OVERS]

where Economy Ratio has been defined previously and overs is the number of
overs bowled by the individual in the innings.

Standardising the above equation by first subtracting the mean (0.3657) and then
dividing by the standard deviation (3.2016) gives a value comparable to the
standardised attack index. A 95% confidence interval for the mean reveals that
the population mean probably lies between 0.3463 and 0.3851. As zero is not
contained within this confidence interval, the value for the mean cannot be ignored
and must be included in the standardisation. Similarly, a 95% confidence interval
for the population standard deviation shows that it probably falls between 3.2001
and 3.2035. As 1 does not fall in this interval, the value given for the standard
deviation needs to be used to standardise the economy index.


Analysing the Characteristics 

STANDARDISED ECONOMY INDEX = 

overs 

the 

Rate 2.1.0, a low is 

good attacking abilities taking wickets quickly. 

arises when no wickets are taken, and 

a 

is 

a 

reason an 

delivered 10 overs. 

26 

a 

are 

score is 



Total Wickets = 10, Total Overs Bowled = 50, Overs Bowled = 10

Figure 1. Relative Effectiveness of the Attack Index 

The first graph, showing the resultant value of the Attack Index for various
quantities of wickets taken in an innings, shows the immediate value of the Attack
Index. The number of wickets taken correlates positively with the index score, as
expected. It can be seen that in the given circumstances, taking 5 wickets in an
innings corresponds to an index score of approximately 3. When examined in
context the value of the indices is strengthened. Five wickets were taken in an
innings only twice in the 97/98 Shell Cup season, which included 34 matches
(Blake (Central Districts vs Northern Districts) and Maxwell (Canterbury vs
Auckland)). Five was also the maximum number of wickets taken by an individual
in the 97/98 Shell Cup competition. This puts the y-axis parameters in perspective.
Attack indices of above 3 are relatively rare.



Below the Economy index is evaluated similarly. 

[Plot: Economy Index (y-axis, -4 to 3) against Runs Conceded (x-axis, 10 to 80).
Opposition Total = 220, Total Overs Bowled = 50, Overs Bowled = 10]

Figure 2. Relative Effectiveness of the Economy Index 

The y-axis depicts the relative score for the individual's performance given differing
values of runs conceded. Obviously the fewer runs conceded the better, and thus
this corresponds to higher scores for the economy index. The above graph is
much simpler to interpret. In this case there exists a negative correlation between
runs conceded and index score, as expected.

Essentially, given an opposition total of 220 (run rate of 4.4), an individual will
concede between 10 (1 run per over) and 70 (7 runs per over) almost all of the
time. That is, an individual is most unlikely to concede more than 70 runs in a 10
over spell.
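The values plotted in Figure 2 can be approximated from quantities already given in this chapter: the Economy Ratio, the square root weighting by overs, and the standardising mean (0.3657) and standard deviation (3.2016). The sketch below assumes those figures apply to the plotted scenario; the function name is illustrative.

```python
import math

# Standardising constants quoted for the economy index.
ECONOMY_MEAN = 0.3657
ECONOMY_SD = 3.2016


def standardised_economy_index(opposition_total, total_overs, runs_conceded, overs):
    """Economy Ratio x sqrt(overs), centred and scaled by the quoted constants."""
    ratio = opposition_total / total_overs - runs_conceded / overs
    raw_index = ratio * math.sqrt(overs)
    return (raw_index - ECONOMY_MEAN) / ECONOMY_SD


# Scenario of Figure 2: opposition total 220 from 50 overs, bowler delivers 10 overs.
curve = {
    runs: standardised_economy_index(220, 50, runs, 10)
    for runs in range(10, 81, 10)
}
for runs, index in curve.items():
    print(runs, round(index, 2))
```

Under these assumptions the index falls linearly from a little above 3 at 10 runs conceded to below -3 at 80 runs conceded, which agrees with the range of the y-axis in the figure.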

The preceding graphs depict the value of the indices obtained from the varying 

number of wickets taken, and runs conceded. Having shown that the indices 

perform to expectation, they can be used effectively as a performance measure for 

bowling outputs. 



2.3 Methods 

2.3.0 Introduction 

"The objective of statistical inference is to draw conclusions or make decisions 

about a population based on a sample selected from the population (Montgomery,
1997, p78)." Considering cricket, each time an individual participates in a match,
a sample of their true ability is revealed. Can a series of scores be used to make
inferences about an individual's ability? If the probability distribution of a
population from which the sample is gathered is known, then the probability
distribution of the various statistics computed from the sample data can be
determined (Montgomery, 1997). More importantly, it can be established what a
player is expected to score and thus their progress can be monitored, which is
especially relevant for team selection.

A population is a set of measurements that can be described by a set of numerical
measures called parameters (Ott, Mendenhall, 1985). In most applications of
statistics the parameters are not known, but inferences about them are made using
information contained in a sample.

For time series analysis it is assumed that for each time point t, Zt is a random
variable. Thus the behaviour of Zt will be determined by a probability distribution
(Cryer). In this instance time t refers to each innings and Zt refers to a
performance output.

Previous studies have assumed that the data are independent, in that for each 

individual bowler the previous match result does not have a direct impact on the 

following match result. At first class level this is a safe assumption as it is 

presumed that players who reach this level have developed the necessary mental 

skills. 


gets 

were it job 

"Cricket, fortunately, less 

section attempts to find if assumption 

2.3.1 Tests for Randomness 

Runs 

popular 

a 

a 

zeros 

runs 

:::: 

testing 

ones 

is based on 

n2 = above 

u = of runs. 

it does not 

is 1 

Data 

If 

1990)." 

or 

IS runs test 

an is 

is 

U. are as 

if 



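Values for u'α/2 and uα/2 are to be found in Table XI of Freund (1992). If n1 and n2 are
both greater than 15 then u is approximately normally distributed with:

\[
\mu_u = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad
\sigma_u^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}
\]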

Consider the scores of Adam C. Parore in the data set: 6, 8, 87, 4, 6, 40, 26, 133,
0, 84, 14, 91, 0, 63, 111, 87.

This series of scores has a median of 33. Thus the re-coded binary data is as
follows: 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1.

In the given example n1 = n2 = 8, and u = 12. Table XI of Freund indicates
u'0.025 = 4 and u0.025 = 14. Thus the null hypothesis of randomness is rejected if
u ≥ 14 or u ≤ 4 at the 5% level of significance. As the obtained value of u does not
violate the limits, there is insufficient evidence to reject the hypothesis of
randomness.

MINITAB performs this calculation, requesting only the median and the column in
which the data is stored.
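The Parore example can also be reproduced in a few lines of Python. This is a sketch rather than a description of MINITAB's routine: in particular, dropping scores equal to the median is an assumption made here (it does not arise in this example), and the helper names are illustrative. The second function gives the usual large-sample mean and variance of u for use when both groups are large.

```python
from statistics import median


def runs_test_counts(scores):
    """Recode scores as 0 (below median) or 1 (above median) and count runs.

    Scores equal to the median are dropped, which is an assumption of this sketch.
    """
    m = median(scores)
    coded = [1 if s > m else 0 for s in scores if s != m]
    n1 = coded.count(0)  # number of values below the median
    n2 = coded.count(1)  # number of values above the median
    if not coded:
        return m, coded, n1, n2, 0
    changes = sum(1 for a, b in zip(coded, coded[1:]) if a != b)
    return m, coded, n1, n2, changes + 1


def normal_approximation(n1, n2):
    """Large-sample mean and variance of the number of runs u."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return mean, variance


PARORE = [6, 8, 87, 4, 6, 40, 26, 133, 0, 84, 14, 91, 0, 63, 111, 87]
m, coded, n1, n2, u = runs_test_counts(PARORE)
print(m, n1, n2, u)  # median 33, n1 = n2 = 8, u = 12

# Tabulated limits for n1 = n2 = 8 at the 5% level, as quoted in the text.
reject = u <= 4 or u >= 14
print(reject)  # False: randomness is not rejected
```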

In this study we need to test for randomness using two measures of performance
simultaneously, that is, the Attack and Economy Indices for bowling, in order to
determine whether performance is consistent (not form dependent).

This means we need to extend the runs test to a 2-dimensional test. Consider a
2-dimensional graph showing the performance of a player for a series of innings
using two appropriate indices.


2. Analysing of Data 

2 

are a 

array. 

points 

is performing consistently 

tend to be 

neighbouring 

for randomness 

:::: 1 

the 

the any patterns 

close together. Hence the distance 

As a consequence this, by the 

as 

a 

nature 

k 

performances it is 

standard normal populations are 

a 

our analysis the 

a of 

patterns in 

generates a detailed 

zero 1 

is 

are 

runs 

mean or 

as 

correlation 

series, but 

of 

is 



The default lag (n/4) was used for k, where n is the number of observations in the
series. The Ljung-Box Q statistic acts as a safeguard against the explosion of the
probability of Type I errors by testing the null hypothesis that the autocorrelations
for all lags up to k equal zero (MINITAB, 1996).
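For readers without MINITAB, the sample autocorrelations and the Ljung-Box Q statistic can be computed directly. The sketch below uses the standard textbook definitions and is not a reproduction of MINITAB's output; the function names are illustrative, and the series used is the set of Parore scores quoted earlier in this chapter.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation r_k = sum(d_t * d_{t+k}) / sum(d_t^2)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denominator = sum(d * d for d in dev)  # zero for a constant series
    numerator = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numerator / denominator


def ljung_box_q(series, max_lag):
    """Q = n (n + 2) * sum_{k=1}^{max_lag} r_k^2 / (n - k)."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


PARORE = [6, 8, 87, 4, 6, 40, 26, 133, 0, 84, 14, 91, 0, 63, 111, 87]
k = len(PARORE) // 4  # the n/4 lag rule described in the text
print([round(autocorrelation(PARORE, lag), 3) for lag in range(1, k + 1)])
print(round(ljung_box_q(PARORE, k), 3))
```

Under the null hypothesis of no autocorrelation, Q is referred to a chi-square distribution with k degrees of freedom.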

If the majority of time series do exhibit autocorrelation, then the job of the selector
is much harder. No longer is the average estimate of an individual's ability
sufficient. From the historical data, performance predictions need to be made
using prior performance. In terms of recording results from this analysis, when the
95% confidence limits have been crossed, the player in question is noted as
displaying significant autocorrelation, thus failing the assumption of randomness.



2.3.2 Distribution Fitting 

In the next section we determine which of the following distributions best fit the 

performance output data for an individual player. 


Three distributions are investigated for the batting data: Exponential, Negative
Binomial and Geometric. As discussed in 2.1.0, these distributions have been
used to model individual player performance by other authors. Below, the
properties of each distribution are listed. In these formulae p denotes the probability
of a success and (1-p) denotes the probability of a failure for independent Bernoulli
trials.

Exponential
  Pdf                  f(x|β) = (1/β) e^(-x/β);  0 ≤ x < ∞;  β > 0
  Mean and variance    E(X) = β,  VAR(X) = β^2

Negative Binomial
  Pmf                  P(X=x|r,p) = C(r+x-1, x) p^r (1-p)^x;  x = 0,1,2,...;  0 ≤ p ≤ 1
  Mean and variance    E(X) = r(1-p)/p,  VAR(X) = r(1-p)/p^2
  For r = 1            E(X) = (1-p)/p,  VAR(X) = (1-p)/p^2

Geometric
  Pmf                  P(X=x|p) = p(1-p)^(x-1);  x = 1,2,...;  0 ≤ p ≤ 1
  Mean and variance    E(X) = 1/p,  VAR(X) = (1-p)/p^2

Here C(n, k) denotes the binomial coefficient. (Casella, Berger, 1990)

These distributions are waiting time distributions. These are suitable as our interest
is with the number of runs scored till completion of the innings. The geometric
distribution is the simplest of the waiting time distributions, and is also a special
case of the negative binomial distribution when r is set at 1 (Casella, Berger, 1990).
Note that for the formulae above X denotes the number of failures before the rth
success for the negative binomial, while for the geometric X denotes the trial
corresponding to the first success.



The negative binomial, and thus the geometric, are discrete versions of the
exponential distribution. The exponential distribution is a continuous distribution
whereas the data here is discrete.


Initially it may seem redundant to include both the geometric and the negative binomial
with r set at 1. However, there is a key difference in the estimator used for the
sample mean, as seen in the above table. Previous research suggests that these
are the most likely distributions to model individual batting scores. Hence the
inclusion of the negative binomial with r = 1.

In this study parameter estimation is done using the method of moments.

The method of moments is one of the oldest methods for parameter estimation.
This method consists of equating the first few moments of a population to the
comparable moments of a sample, obtaining the required number of equations
needed to solve for the unknown parameters of the population (Freund, 1992).
Given a population has r parameters, the method of moments consists of solving
the system of equations

\[
m'_k = \mu'_k, \qquad k = 1, 2, \ldots, r
\]

for the r parameters, where

\[
m'_k = \frac{1}{n} \sum_{i=1}^{n} x_i^k
\]

All three distributions being dealt with here require only one parameter to be
estimated (for the Negative Binomial, r is set to 1).

Thus m1' = µ1' is used.

Exponential 

The mean is given by E(X) = β = µ1', and the corresponding sample moment is the
sample mean, m1' = x̄.

Therefore the method of moments estimator for the exponential parameter
is simply:

\[
\hat{\beta} = \bar{x}
\]



Negative Binomial 

E(X) = (1-p)/p = µ1' and x̄ = m1'. Therefore setting:

\[
\bar{x} = \frac{1 - p}{p}
\]

Rearranging gives the method of moments estimator for p for the Negative Binomial
Distribution:

\[
\hat{p} = \frac{1}{\bar{x} + 1}
\]

Geometric

E(X) = 1/p = µ1'.

Setting m1' = µ1' provides:

\[
\bar{x} = \frac{1}{p}
\]

Subsequent rearrangement yields a method of moments estimator as follows:

\[
\hat{p} = \frac{1}{\bar{x}}
\]

Using the estimates for the parameters of the given probability distributions, the
data can be modelled and the fit evaluated using the chi-square goodness-of-fit
test.
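The three method of moments estimators reduce to simple functions of the sample mean, as the following sketch shows. The function names are illustrative, and the data are the Parore scores quoted earlier in this chapter, used here purely as an example series.

```python
def sample_mean(scores):
    return sum(scores) / len(scores)


def mom_exponential(scores):
    """Exponential: E(X) = beta, so beta-hat is the sample mean."""
    return sample_mean(scores)


def mom_negative_binomial_r1(scores):
    """Negative binomial with r = 1: E(X) = (1 - p)/p, so p-hat = 1/(mean + 1)."""
    return 1.0 / (sample_mean(scores) + 1.0)


def mom_geometric(scores):
    """Geometric on 1, 2, ...: E(X) = 1/p, so p-hat = 1/mean."""
    return 1.0 / sample_mean(scores)


PARORE = [6, 8, 87, 4, 6, 40, 26, 133, 0, 84, 14, 91, 0, 63, 111, 87]
print(mom_exponential(PARORE))           # 47.5
print(mom_negative_binomial_r1(PARORE))  # 1 / 48.5
print(mom_geometric(PARORE))             # 1 / 47.5
```

The only difference between the last two estimators is the "+ 1" in the denominator, which reflects whether the distribution starts at 0 or at 1.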

Fitting a Mixed Distribution 

As previously discussed in 2.1.2, a mixed model may be a better fit, due to the
higher than expected number of zeros (Smith, 1993) (Bracewell, (2) 1998). As a
result, a separate component needs to be built into the probability model to cater for
the number of zeros. The second component of the 'ducks and runs' distribution
deals with the non-zero portion. Let p0 be the probability of a zero score. In order
to fit a mixed distribution it is necessary to multiply the probability model by (1-p0)
so that the total probability is equal to one, that is, (1-p0)Px(x) for
x > 0. For a geometric distribution, the sum of the probabilities for all possible
scores, shown below, clearly converges to one as n approaches infinity:

\[
p_0 + (1 - p_0) \sum_{x=1}^{n} p (1 - p)^{x-1}
\]



For calculation of the parameters of a mixed distribution (p0 and β or p), the fraction
of the data set at 0 is separated out and the mean recalculated. The new sample
mean is then used to estimate the parameter of the probability distribution (β or
p).
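A minimal sketch of this two-step fit for the mixed geometric case is given below. The function names are illustrative, and the data are the Parore scores quoted earlier, used only as an example. The final line checks numerically that the probabilities sum to one.

```python
def fit_mixed_geometric(scores):
    """Return (p0, p): the fraction of zeros and the geometric parameter.

    p is the reciprocal of the mean recalculated over the non-zero scores.
    """
    nonzero = [s for s in scores if s > 0]
    p0 = (len(scores) - len(nonzero)) / len(scores)
    p = len(nonzero) / sum(nonzero)
    return p0, p


def mixed_geometric_pmf(x, p0, p):
    """p0 at zero; (1 - p0) * p * (1 - p)^(x - 1) for x = 1, 2, ..."""
    if x == 0:
        return p0
    return (1.0 - p0) * p * (1.0 - p) ** (x - 1)


PARORE = [6, 8, 87, 4, 6, 40, 26, 133, 0, 84, 14, 91, 0, 63, 111, 87]
p0, p = fit_mixed_geometric(PARORE)
print(p0, p)  # 2 zeros in 16 innings; 14 non-zero innings totalling 760 runs

# Summing far enough into the tail, the total probability approaches one.
total = sum(mixed_geometric_pmf(x, p0, p) for x in range(0, 5001))
print(total)
```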

The probability mass function of an individual batsman's contribution can be
presented in the following form:

\[
P(X = x \mid p_c) = p_c (1 - p_c)^{x}; \qquad 0 \le x \le 100; \quad 0 \le p_c \le 1
\]

where pc represents the reciprocal of the mean contribution and x corresponds to
the random variable for percentage of the team total. It then follows that the
probability mass function of individual batsmen's scores can be represented as
follows:

\[
P(X = x \mid p_c, p_s) =
\begin{cases}
p_c & \text{if } x = 0; \quad 0 \le p_c \le 1 \\
(1 - p_c)\, p_s (1 - p_s)^{x-1} & x = 1, 2, \ldots; \quad 0 \le p_s \le 1
\end{cases}
\]

Once again, pc represents the reciprocal of the mean contribution and x
corresponds to the random variable for individual total.

Similarly, ps is the mean score inverted.

Normality Test 

Bracewell (3) (1998) hypothesized the bowling indices for individuals are normally 

distributed. To test this hypothesis a normality test needs to be performed. The 

normality test for the bowling indices involved the generation of a normal 

probability plot. The probability for the x-values (index) is calculated then plotted 

against a standard normal probability score. A least-squares line is fitted to the
points. This forms an estimate for the cumulative distribution function from which
the data for the population is drawn. The Anderson-Darling test for normality is
used, which is an ECDF (empirical cumulative distribution function) based test.



2.4 Results 

2.4.0 Introduction 

To enable a chi-square goodness-of-fit test to be performed on the batting outputs, 

the data needed to be broken into manageable segments, displayed as follows: 

Score          0    1-10   11-20   21-30   31-40   41-50   51-100   100+
Contribution   0    1-5    6-10    11-15   16-20   21-25   26-30    31+

All tests were performed at a 5% significance level. 
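To show how the score segments feed the goodness-of-fit calculation, the sketch below bins a series of scores and computes the chi-square statistic against a mixed geometric model. Several details are assumptions made for illustration: the last two segments are read as 51-100 and 101 and above, the parameter values are those obtained from the Parore scores (p0 = 2/16, p = 14/760), and a single player's 16 innings give expected counts that are too small for a formal test.

```python
# Score segments as (low, high); None marks the open-ended final segment.
SCORE_BINS = [(0, 0), (1, 10), (11, 20), (21, 30), (31, 40),
              (41, 50), (51, 100), (101, None)]


def bin_probability(lo, hi, p0, p):
    """Mixed geometric probability that lo <= X <= hi."""
    if lo == 0:
        return p0  # the zero segment carries the point mass
    upper_tail = 0.0 if hi is None else (1.0 - p) ** hi
    return (1.0 - p0) * ((1.0 - p) ** (lo - 1) - upper_tail)


def chi_square_statistic(scores, p0, p, bins=SCORE_BINS):
    """Sum over segments of (observed - expected)^2 / expected; needs p0 > 0."""
    n = len(scores)
    statistic = 0.0
    for lo, hi in bins:
        observed = sum(1 for s in scores if s >= lo and (hi is None or s <= hi))
        expected = n * bin_probability(lo, hi, p0, p)
        statistic += (observed - expected) ** 2 / expected
    return statistic


PARORE = [6, 8, 87, 4, 6, 40, 26, 133, 0, 84, 14, 91, 0, 63, 111, 87]
P0, P = 2 / 16, 14 / 760
print(sum(bin_probability(lo, hi, P0, P) for lo, hi in SCORE_BINS))
print(chi_square_statistic(PARORE, P0, P))
```

Because the segments are contiguous, the segment probabilities telescope and sum to one, so the expected counts add up to the number of innings.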

2.4.1 Batting Results 

Considering only individuals who batted in 20 or more innings yielded 66 

individuals for analysis. A brief summary of the results follows. Appendix D 

contains the full results. 

                                  Pass     Fail
Autocorrelation   Score           62/66    4/66
                  Contribution    62/66    4/66
Runs Test         Score           64/66    2/66
                  Contribution    64/66    2/66

Table 1: Tests for randomness in Batting Performance Measures 



Score                      Pass     Fail     Obtained χ²   Critical χ²
Exponential                49/66    17/66    739.03        581.51
Geometric                  52/66    14/66    705.04        581.51
Negative Binomial          53/66    13/66    711.85        581.51
Mixed Exponential          65/66    1/66     336.47        512.06
Mixed Geometric            65/66    1/66     335.04        512.06
Mixed Negative Binomial    65/66    1/66     397.14        512.06

Table 2: Distribution Fitting for Individual Batting Scores 

The obtained χ² and critical values shown in Tables 2 and 3 refer to the fit of the
model over the entire population. That is, the χ² values for all individuals are
summed and compared to the critical χ² value. For the standard distributions this
was based on 527 degrees of freedom (66 x 8 - 1), and for the mixed distributions
on 461 degrees of freedom (66 x 7 - 1). This helped confirm the best model.

Contribution               Pass     Fail     Obtained χ²   Critical χ²
Exponential                62/66    4/66     482.38        581.51
Geometric                  56/66    10/66    645.29        581.51
Negative Binomial          62/66    4/66     449.44        581.51
Mixed Exponential          57/66    9/66     470.55        512.06

Table 3: Distribution Fitting for Individual Batting Contribution 



2.4.2 Bowling 

Similarly only individuals who bowled in 20 or more innings were considered, 

providing 35 individuals for examination. A brief overview of the results follows. 

Appendix C gives the results in full. 

                                  Pass     Fail
Autocorrelation   Economy         34/35    1/35
                  Attack          32/35    3/35
Runs Test         Economy         32/35    3/35
                  Attack          34/35    1/35
                  Bivariate       32/35    3/35

Table 4: Tests for randomness in Bowling Performance Measures

                                  Pass     Fail
Normality         Economy         31/35    4/35
                  Attack          33/35    2/35

Table 5: Normality test for Individual Bowling Indices 




2.5 Discussion 

The evidence provided from the analyses performed in this chapter clearly 

suggests that individual performance in the primary disciplines of first class cricket 

in New Zealand is random. 

Considering first the case of batting, only 2 individuals from the sample of 66
failed the runs test. These two individuals, Mark Haslam and Shayne O'Connor,
failed the test for both contribution and score. As both are primarily selected as
bowlers (Haslam SLA and O'Connor LFM) and have low medians, it could be
argued that the basis behind the non-random behaviour is that their skill level with
the bat is not sufficient. Four individuals failed the test for autocorrelation. Due to
the low numbers violating the assumption of randomness, expressed through the
runs test and the test for autocorrelation, it is considered sufficient evidence to
claim that batting is random in New Zealand first class cricket.

A similar situation applies to the bowling results. From the sample of 35, at most 3 

failed the runs-test, or the test for autocorrelation. Once more the majority exhibit 

random behaviour and this is taken as sufficient evidence for stating that bowling 

performance outputs are random in New Zealand first class cricket. 

According to the analyses performed, individual batting scores are best modelled by
a mixed geometric distribution, mixed in two parts, the zero portion and the non-zero
portion. Batting contribution is best modelled by the negative binomial distribution
(with r set equal to 1). The zero component of this distribution represents the zero
portion of the score distribution.

It is important to note that the negative binomial distribution, and the parameter
from the contribution distribution, model the occurrence of zero amongst individual
scores. This is an interesting phenomenon.



Obviously either score or contribution has to be mixed as both share the same
number of zeros, unless the team continually scores exactly 100 in each innings, in
which case the two distributions will be equivalent. That is, the score is effectively
the percentage contribution, as score is continually divided by 100 runs (the team
total). Considering the case of scoring 0, the probability of this occurring is the
same for both contribution and score. This is because for contribution 0/y = 0,
where y is the team score. Due to the sample mean representing the shape
parameter, there is a difference between the shapes for the score and contribution
distributions.

This is shown in the graph on the following page detailing the probability mass 

function for differing values of score and contribution. The means for contribution 

and score are 14% and 29 respectively. These were the population values of top 6 

batsmen obtained from Bracewell (2) (1998). 

[Plot: Probability (y-axis, 0.00 to 0.07) against Scoring Performance (x-axis, 0 to 100).
Key: Contribution, Score]

Figure 3. Distributional Comparison of Contribution and Score 



The distribution for scores confirms the high likelihood of being dismissed early. The fact that the distributions involved are memory-less, as discussed in Appendix B, is also of interest.

This harks back to the old adage "it only takes one ball", referring to the fact that only one ball is needed to dismiss a batsman, no matter what score the individual is on.
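The memory-less property can be checked directly. The following is a minimal sketch, again assuming a geometric distribution on 0, 1, 2, ... parameterised by its mean; `survival` and `conditional_survival` are illustrative names, not taken from the thesis. The chance of adding at least 20 more runs is the same whether the batsman is on 0 or on 50.

```python
def survival(x, mean):
    """P(X >= x) for a geometric distribution on 0, 1, 2, ... with the given mean."""
    q = mean / (1.0 + mean)  # probability of surviving one more run
    return q ** x


def conditional_survival(extra, current, mean):
    """P(X >= current + extra | X >= current)."""
    return survival(current + extra, mean) / survival(current, mean)


# Same probability of adding 20 runs from a score of 0 or from a score of 50
print(survival(20, 29.0), conditional_survival(20, 50, 29.0))
```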

The results from the normality test showed that Bracewell's (3) (1998) initial 

hypothesis of normality for the Bowling Indices is correct as an overwhelming 

majority exhibited this property (33/35 for Attack and 31/35 for Economy). 

The above results confirm the beliefs discussed in the literature review. Having shown that performance outputs are random, and having gained knowledge of the distributions adhered to by the data, sound statistical methodology can be applied to the performance outputs. A natural extension of this knowledge is to monitor individual ability through statistical process control. This is approached in the next chapter.


CHAPTER 3. Monitoring Player Performance 

Chapter 3 

Monitoring Player Performance 

3.1 Introduction 

implementation of quality control procedures is in it 

of the most statistically successfully players, IS 

Furthermore, it enables the mon 

any change the a 

of certain coaching 

A arises with the study of sports data to 


use 

standard it is preferable to be better than the average. It is therefore 

an 'out-of-control' on the of 

portant to are 

how an 

data it is important to note 

deviation), as these are 

an 



Performance measures are initially standardised with respect to the population, that is, by subtracting the population mean and then dividing by the population standard deviation, giving estimates of the individual's ability relative to those competing in the same competition. In order for charts to be based upon the standard normal distribution, these indices are then standardised using personal means and standard deviations. When the data is standardised and tested with the quality control tools, the test is for how reliable our estimate of the individual's ability is.
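The two-stage standardisation can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `standardise` and the numbers in the usage lines are hypothetical, and the sample standard deviation is assumed for the personal stage.

```python
from statistics import mean, stdev


def standardise(values, centre=None, spread=None):
    """Subtract a centre and divide by a spread.

    With no arguments the personal (sample) mean and standard deviation
    are used; pass population values for the first-stage standardisation.
    """
    centre = mean(values) if centre is None else centre
    spread = stdev(values) if spread is None else spread
    return [(v - centre) / spread for v in values]


raw = [12.0, 30.0, 45.0, 8.0]                                 # hypothetical raw index values
relative = standardise(raw, centre=25.0, spread=15.0)         # hypothetical population values
personal = standardise(relative)                              # within-person standardisation
print(personal)
```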

Chapter Two sought to establish the fundamental assumptions involved with statistical process control, namely independence and normality, in the context of performance evaluation in cricket. Following on from the findings of Chapter Two, it is relatively easy to apply control charts to the bowling indices and contribution, as the assumptions of normality and independence are upheld (normality for contribution is achieved by transformation). Thus, the application of conventional parametric quality control methods is assessed in this chapter.

Initially, possible techniques for the situation presented by individual scores are reviewed and a selection applied to real data. Standard procedures for use with normal data are then discussed. For the univariate case, three methods will be examined, namely the Shewhart control chart for individual observations with run rules, CUSUM and EWMA. Then a new type of non-parametric control chart based on quartiles is proposed to deal with the mixed distribution of 'ducks and runs' presented by individual batting scores for an innings. The theoretical quartiles of this distribution are used, maintaining the integrity of the distribution. Overall, we are interested in a control method that picks up a change in performance within a season. Thus the control chart must pick up changes rapidly. In designing charts and rules, consideration must be given to the number of 'samples' per season. The need for a short nominal ARL is determined by the structure of the Shell Trophy competition and also by the relative lack of sampling opportunities supplied by cricket in general.





As cricket is played under varying conditions against varying opposition, it is not suitable to artificially create subgroups. Information relating to an individual's tendency to struggle under differing conditions is potentially lost. Also, the special nature of an outlying score can be forfeited. It is well known that an outlier can influence our estimate of an individual's ability. In this case, the effect will be to inflate the mean. However, we want to retain that influence, as it indicates the individual is more capable than suggested by the bulk of the data. By the reduction to ranks the nature of very large scores is removed. Our estimate of an individual's performance indicates what a person is capable of scoring. The presence of an outlier can reveal that our estimate is possibly wrong and the player in question is capable of much more. Outliers are usually regarded as aberrations or errors, but in cricket outliers must be regarded in a totally different light. High scores are valuable observations for performance measures. Another problem arising with the use of ranks lies in the presence of ties, which makes the use of ranks inexact (Rossini, 1997). The probability of ties occurring is quite high due to the likelihood of an individual not scoring.

Hackl and Ledolter (1992) proposed a non-parametric technique utilising sequential ranking. This method involved using the sequential ranks of observations in association with an EWMA control chart. The method is outlier resistant, as all rank-based charts are. Hence, the importance of the extreme score described previously is ignored.

After their early attempt with the Wilcoxon signed rank statistic, Reynolds and Bakir joined Amin (1995) in investigating a method based on the sign statistic gathered from within artificially created groups. As discussed previously, it is inappropriate to deal with subgroups in first class cricket as they are not always present.

McGilchrist and Woodyer (1975) applied a distribution-free CUSUM procedure. However, this method also reduced scores to being above or below the median and thus ignored the impact of high scores.





In addition, previous work has considered subgroups of data; where natural subgroups do not occur, the authors recommend the creation of artificial groupings of innings results for an individual player. In this thesis it is argued that this approach is also not appropriate because:

• Considering the match context, subgroups of size 2 would be reasonable except that a player may bat in only one innings.

• Cricket is played in variable conditions; playing surfaces or environmental conditions may vary, and as a result artificial subgroups are meaningless.

• Outliers are lost.

In this section three of the methods outlined in the literature review will be applied 

to the batting scores of Hartland and Horne and compared to the Quartile chart 

later. 

a) Non-Parametric EWMA (Hackl, Ledolter, 1992) 

This control chart is based on an EWMA of sequential ranks, where the sequential rank, $R_t$, is an observation's rank amongst the most recent g observations. This chart performs well with slowly trending process levels. Once again a short ARL is required, and this is obtained approximately from Figure 1 and Table 1 (Hackl et al., 1992). An immediate problem arises with the selection of g. As we require a short nominal ARL, we also require a relatively small g, which can lead to correlations among successive ranks. The control statistic is defined as follows:

$$T_t = (1-\lambda)\,T_{t-1} + \lambda R_t, \qquad t = 1, 2, \ldots$$

The initial value, $T_0$, is set at zero. Three parameters are required, obtained from tabulated values where the group size is taken at the smallest available value. Thus the parameters are set as follows to give a nominal ARL ≈ 19.8: λ = 0.25, g = 4, h = 0.2980.

The resultant statistics, $T_t$, can be examined via normal chart form, where an alarm for an out-of-control situation is signalled if $|T_t| > h$. As this is the most recent non-parametric technique it is also applied to the simulation data in section 3.4.
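The procedure can be sketched in a few lines. This is a minimal illustration rather than the tabulated scheme itself: the standardisation of the rank to the interval (-1, 1) via (2/g)(rank - (g+1)/2) and the use of mid-ranks for ties are assumptions made here, and the function names are hypothetical. The parameter values are those quoted above.

```python
def sequential_ranks(xs, g):
    """Standardised rank of each observation among the most recent g observations.

    Assumptions: ties receive mid-ranks, and ranks are rescaled to lie in (-1, 1).
    """
    ranks = []
    for t in range(g - 1, len(xs)):
        window = xs[t - g + 1 : t + 1]
        below = sum(1 for v in window if v < xs[t])
        ties = sum(1 for v in window if v == xs[t])  # counts xs[t] itself
        rank = below + (ties + 1) / 2.0
        ranks.append((2.0 / g) * (rank - (g + 1) / 2.0))
    return ranks


def np_ewma(xs, lam=0.25, g=4, h=0.2980):
    """Return (T_t, alarm) pairs, starting from T_0 = 0."""
    t_stat, out = 0.0, []
    for r in sequential_ranks(xs, g):
        t_stat = (1 - lam) * t_stat + lam * r
        out.append((t_stat, abs(t_stat) > h))
    return out


scores = [12, 0, 45, 7, 0, 3, 0, 1]  # hypothetical innings scores
for t_stat, alarm in np_ewma(scores):
    print(round(t_stat, 4), alarm)
```

Note that repeated zero scores produce tied ranks within a window, which is the difficulty with ranks raised earlier in this chapter.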



[Chart of the control statistic against innings, vertical axis from -0.3 to 0.3]

Figure 4. Non-parametric EWMA for B.R. Hartland 

Two alarms are signalled for Hartland, one very early on and the other towards the 

end of the series. Both are associated with signals for inferior form. 

[Chart of the control statistic against innings]

Figure 5. Non-parametric EWMA for M.J. Horne 

No alarms are signalled for Horne, suggesting that he is consistently playing to the 

level his natural ability implies. 



b) Non-Parametric CUSUM (McGilchrist, Woodyer, 1975) 

To allow detection of changes in the extreme distributions encountered in hydrology, McGilchrist et al. developed a distribution-free CUSUM. Using an even number of observations, the control statistic $V_i$ is defined as follows:

$$V_i = \sum_{j=1}^{i} q(x_j - k)$$

where $q(x) = 1$ for $x \ge 0$ and $q(x) = -1$ for $x < 0$.

In order to make $V_n = 0$, k is the sample median.

The series $V_0, V_1, \ldots, V_n$ is then a one-dimensional random walk. When the observations in question are independent, all paths from (0,0) to (n,0) are equally likely. Two methods are available to evaluate whether the process is in control. The first sets control limits; the second considers the number of returns to zero, is effectively a test for randomness, and will not be pursued further.

The control limits for this control chart are evaluated from Table 16 in Conover (1971). When these control limits are crossed, this is evidence of an out-of-control process. To give a nominal run length comparable to that of the non-parametric EWMA, the level of significance is set at 6% (ARL = 16.7). For a 6% significance level the critical level for $V_i$ is approximately $1.882\sqrt{m}$, where m is half the sample size of the individual in question. Thus for Horne, m = 12, the critical value becomes 6.52 at the 6% level of significance. For Hartland, m = 16, the corresponding value is 7.528.
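The statistic and its critical value can be sketched as below. This is a minimal illustration; `sign_cusum` and `critical_value` are names introduced here, and the default factor 1.882 is the approximate 6% value quoted above.

```python
import math
from statistics import median


def sign_cusum(xs):
    """Cumulative sum of q(x_j - k), with k the sample median.

    q(x) = 1 for x >= 0 and -1 for x < 0, as defined above.
    """
    k = median(xs)
    total, path = 0, []
    for x in xs:
        total += 1 if x - k >= 0 else -1
        path.append(total)
    return path


def critical_value(n_obs, factor=1.882):
    """Approximate critical level factor * sqrt(m), where m is half the sample size."""
    return factor * math.sqrt(n_obs / 2)


print(sign_cusum([5, 1, 7, 3]))             # hypothetical scores, no ties with the median
print(critical_value(24), critical_value(32))
```

With tied observations at the median (for example several zero scores) the path need not return to zero, which is a practical limitation when applying the statistic to batting scores.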





c) A non-parametric CSCC procedure based on Within Group Rankings 

(Bakir, Reynolds, 1979) 

This non-parametric procedure was developed to quickly detect any shift in the mean process level. Using Wilcoxon signed rank statistics and within group ranking, a CUSUM type procedure is implemented. CSCC, as defined by Bakir et al. and given in the sub-title, is better known as the CUSUM (Cumulative Sum Control Chart). For the within group ranking, subgroups are required. Where these do not occur naturally, they must be created artificially. In this instance artificial subgroups of size 4 are created. To detect shifts in the process level on the positive side the following control value is applied.

$$\sum_{i=1}^{n}(SR_i - k) - \min_{0 \le m \le n}\sum_{i=1}^{m}(SR_i - k) \ge h$$

Similarly, to detect deviations in the process level on the negative side the formula below is implemented.

$$\max_{0 \le m \le n}\sum_{i=1}^{m}(SR_i - k) - \sum_{i=1}^{n}(SR_i - k) \ge h$$

The Wilcoxon signed rank statistic is defined as follows:

$$SR = \sum_{X_i - \mu_0 > 0} \operatorname{rank}\left(|X_i - \mu_0|\right)$$ (Smith, P. 1993)

Signed rank has the advantage of considering the relative rankings of the magnitudes of the data points. A problem associated with the signed rank test in this instance is the assumption of a symmetrical distribution. Clearly the mixed distribution presented by individual batting scores is not symmetrical. To circumvent this problem, the mean $\mu_0$ has been replaced by the median. Effectively the median is subtracted, then the values ranked. Finally, those ranks associated with points greater than or equal to the median are added to provide SR.



From the tabulated values of Bakir et al. the parameters are chosen to provide a short nominal ARL in the vicinity of 17. Choosing k = 0, h = 6 and g = 4 provides a nominal ARL = 17.07. Applying the above procedure to the score series of Horne and Hartland, an alarm is signalled in both cases after 8 innings (2 groups), indicating that the estimate of batting ability for both individuals has changed. For each individual this alarm was associated with a decrease in performance.

Bracewell (3)(1998) showed how Shewhart control charts could be used to monitor bowling performance. It was also shown that the interpretation of zone run rules could be modified to accommodate the example presented by cricket. A brief demonstration of multivariate control charts using Hotelling's T² statistic was also provided in the same study. These approaches are described and extended in the rest of this chapter.

3.3 Univariate Quality Control Methodology 

It is appropriate to monitor an individual's performance with control charts. Provided the measurements of the 'product' are reflective of quality, function, or performance, then the nature of the 'product' has no bearing on the general applicability of control charts (Montgomery, 1997).

The control chart is a useful tool in statistical process control. First developed by W.A. Shewhart, the Shewhart charts are widely accepted as standard tools for monitoring processes of univariate, independent and nearly normal measurements (Liu & Tang, 1996).

Control charts have three fundamental uses:

1 Reduction of process variability

2 Monitoring and surveillance of a process

3 Estimation of product or process parameters

(Montgomery, 1997).



It is the second use that is of the essence in the application to cricket, and possibly other sports. Process in industry is the parallel term to player performance. Control charts have found frequent applications in both manufacturing and non-manufacturing settings (Montgomery, 1997).

The third use is also of relevance when dealing with team selection. This is a result of the interest in the estimate of an individual's ability in relation to other players available for selection.

Before standardising the data it is important to note the parameter values, as these are the estimates of the player's latent ability. This applies in particular to the bowling indices. These are initially standardised with respect to the population, by the nature of the indices, giving estimates of the individual's ability relative to those competing in the same competition. For charts that assume a standard normal distribution, these indices are standardised again, for within person evaluation, using individual means and standard deviations. When the data is standardised and tested with the quality control tools, the test is for how reliable the mean is as our estimate of the individual's ability.

3.3.1 Shewhart Control Charts 

Shewhart charts are strongly dependent on the assumption of normality and independence. Also assumed is the absence of between subgroup variation when the process is in control (61.325 Study Guide). However, this assumption is irrelevant in this study, as only individual innings observations are taken, that is, only subgroups of size one exist.

A Shewhart control chart with only action limits is slow at detecting small shifts in process level (61.325 Study Guide). However, the Shewhart chart can be sensitised by utilising zone rules.





However, minor alterations need to be made to the labelling of the zones and to the 

rules to make it compatible with the evaluation of an individual's performance. 

Whilst being similar, there are fundamental differences in testing for quality in a 

product and sport. Typically quality control monitors the maintenance of certain 

control limits and deviation from a common mean. 

[Control chart of a quality measure against subgroup number, with zones C, B and A lying between the centre line and the lines at ±s, ±2s and ±3s respectively, and the limits UCL, UWL, centre line, LWL and LCL marked]

Figure 8. Control Chart with Warning Lines 

Where
UCL = Upper Control Limit 
UWL = Upper Warning Limit 
LWL = Lower Warning Limit 
LCL = Lower Control Limit 

In a sporting context, which side of the mean a point falls is important and needs to 

be built into any control chart. 


[Control chart with zones labelled A to F from top to bottom, separated by lines at 2s, s, the mean, -s and -2s]



Zone Rules for Cricket 

Bracewell (3)(1998) proposed the following interpretations of the Zone Rules in a cricketing context. Understanding the intrinsic differences between product attributes and sports performance allows the zone rules for Shewhart Control Charts to be manipulated to indicate 'Out-of-Control' conditions. This effectively means that the individual's performance relative to the team is no longer random in most instances. The tests given by Montgomery (1997) are modified to suit the situations presented in cricket.

Test 1. Extreme Points 

Points that fall outside the control limits. Falling outside of zone A indicates an exceptionally brilliant performance relative to the team. Conversely, a point falling outside zone F indicates an exceptionally awful performance relative to the team.

Test 2. Two Out of Three Points in Zones A or F and Beyond 

Two of three performances in and beyond A shows continued excellent performance, whereas two of three performances in and beyond F shows continued inferior performance.

Test 3. Four Out of Five Points in Zones B or E and Beyond 

Four out of five successive points in zone B or beyond indicates continued good 

performance. The same situation for zone E and beyond reveals continued bad 

performance. 

Test 4. Runs above or Below the Centreline 

This test considers long runs (eight or more successive innings) either strictly 

above or below the centreline. This indicates either consistently above expected 

performance (above the centreline) or consistently below expected performances 

(below the centreline). 
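Tests 1 to 4 can be sketched for a series of standardised observations. This is a minimal illustration under assumptions made here: zones A and F are taken as the bands beyond +2 and -2 standard deviations, zones B and E as the bands beyond +1 and -1, and the function name `zone_signals` is hypothetical. Each signal records whether it points to good or poor performance, since the side of the mean matters in a sporting context.

```python
def zone_signals(z):
    """Return (index, test, direction) tuples for tests 1 to 4 on standardised values."""
    signals = []
    # (test number, window length, points required, threshold in standard deviations)
    rules = ((2, 3, 2, 2.0), (3, 5, 4, 1.0), (4, 8, 8, 0.0))
    for i in range(len(z)):
        # Test 1: a single point beyond the 3-sigma control limits
        if abs(z[i]) > 3:
            signals.append((i, 1, "good" if z[i] > 0 else "poor"))
        for test, window, need, limit in rules:
            if i + 1 < window:
                continue  # not enough observations yet for this rule
            recent = z[i + 1 - window : i + 1]
            if sum(v > limit for v in recent) >= need:
                signals.append((i, test, "good"))
            if sum(v < -limit for v in recent) >= need:
                signals.append((i, test, "poor"))
    return signals


print(zone_signals([-2.5, 0.0, -2.2]))  # hypothetical standardised innings
```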





The numbers identifying the tests differ from those used in MINITAB. As mentioned earlier, the Shewhart chart can be set up in two ways. The first uses only control limits, set at 2 standard deviations away from the process mean. The second applies all the run rules along with the traditional 3-sigma limits. Both cases can be shown simultaneously.

[Control chart of transformed contribution against observation number, with centre line X = 2.054, 3-sigma limits at -0.5314 and 4.639, and the 2-sigma limits also marked]

Figure 10. Shewhart Control Chart of M.J. Horne's Transformed Batting Contribution With Zone Run Rules. 

The initial graph (zone rules, 3-sigma limits) reveals no signals, indicating that 

Horne is performing to expectations in terms of contributing to the team total. This 

also indicates that the impressive estimates for his natural ability are significantly 

adhered to. 

If the 2-sigma limits are applied, an alarm is signalled at point 19. However, this 

corresponds to a score of zero, which is not necessarily an indication of form 

change. 

As it is impossible to achieve a negative score with cricket data as applied to this type of control chart, negative control limits should be set at zero. Nevertheless, it is useful to see the effect the control limits have in this type of situation. Effectively, every time an individual fails to score an alarm is signalled.


[Figure 11: control chart with centre line X = 1.661, 2-sigma limits at 0.1793 and 3.144, and 3-sigma limits at -0.6617 and 3.885]



3.3.2 CUSUM 

The British Statistician Page first proposed cumulative sum (CUSUM) charts in 

1954. This procedure involves cumulating sums, such that past values have an 

impact on the control statistic. 


All measures of performance involved in this study are standardised to make computations somewhat easier for setting up general standards and relating these in terms of player performances. Also, the implementation of the charts is designed to monitor the given estimate of ability. It is preferable to work with one-sided standardised CUSUMs for the case presented by cricket. As mentioned earlier, a special situation arises in the application of quality control methodology to sport. If a player's performance process changes such that an alarm is signalled, it is necessary to note whether the alarm was due to superior or inferior performance. Obviously, if an individual's performance is improving the desired situation has occurred. Standardised values are used to compare within player performance on relative scales.

As we are dealing with individual observations the statistic required for the CUSUM scheme is given as:

$$z_i = \frac{x_i - \bar{x}}{\sigma_x}$$

For detecting shifts on the upper side of the mean the procedure is defined as

$$S_{H,i} = \max\{0,\ (z_i - k) + S_{H,i-1}\}$$

with $S_{H,0}$ set at zero. The slack constant, k, must be less than 3 or a situation akin to a Shewhart chart is implemented, which detects only large shifts. Generally this value is 0.5, designed to detect smaller shifts.



The next part of the chart is the adoption of some threshold value which, when crossed, indicates an out-of-control situation. This value is referred to as h.

A similar procedure is used for detecting shifts on the lower side of the mean:

$$S_{L,i} = \max\{0,\ (-z_i - k) + S_{L,i-1}\}$$, with $S_{L,0}$ set at zero.

The optimal values for the parameters of the CUSUM procedure can be found from Gan's (1991) nomograph. To achieve a small nominal ARL of approximately 17, h is set at 0.25 and k at 1.7. Whenever a signal is given, this implies an out-of-control situation, that is, the estimate of the player's natural ability has changed. If a cause for the alarm is found the CUSUM is reset to zero.
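The one-sided scheme can be sketched as follows for standardised observations. This is a minimal illustration with the parameter values quoted above; the function name `one_sided_cusums` is hypothetical, and resetting a sum to zero immediately after every signal is a simplifying assumption (the text resets only once a cause has been found).

```python
def one_sided_cusums(z, k=1.7, h=0.25):
    """Return (index, direction) for each signal from the upper and lower CUSUMs."""
    upper = lower = 0.0
    signals = []
    for i, zi in enumerate(z):
        upper = max(0.0, (zi - k) + upper)   # accumulates superior performance
        lower = max(0.0, (-zi - k) + lower)  # accumulates inferior performance
        if upper > h:
            signals.append((i, "superior"))
            upper = 0.0  # assumption: reset after each signal
        if lower > h:
            signals.append((i, "inferior"))
            lower = 0.0  # assumption: reset after each signal
    return signals


print(one_sided_cusums([0.0, 2.1, -0.5, -2.0]))  # hypothetical standardised values
```

With k = 1.7 and h = 0.25 a single observation beyond about 1.95 standard deviations is enough to signal, so the scheme behaves much like a chart with 2-sigma limits.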

[CUSUM chart against innings, with the lower CUSUM limit marked at -0.22]

Figure 12. CUSUM Control Chart of M.J. Horne's Transformed Batting Contribution. 



A signal is given at point 19, once again corresponding to a score of zero. Apart 

from the one 'duck', no other signals are given , indicating that Horne is performing 

to expectations in terms of contributing to the team total. 

[CUSUM chart against innings, with the upper CUSUM limit at 0.185268 and the lower CUSUM limit at -0.19]

Figure 13. CUSUM Control Chart of B.R. Hartland's Transformed Batting Contribution. 

A signal is given at each point where Hartland failed to score. This is the same as the Shewhart chart with only the 2-sigma limits in operation. It is not necessarily confirmation of deterioration in form. However, three alarms in a short space of time suggest that the true mean batting ability for Blair Hartland is actually significantly less than noted.



3.3.3 EWMA 

An alternative to the Shewhart control chart is the Exponentially Weighted Moving 

Average control chart developed by Roberts (1959). The performance of the 

EWMA chart is similar to the CUSUM scheme, but easier to set up and operate 

(Montgomery, 1997). The EWMA chart proves to be useful when it is not practical 

to take more than a single observation per sample, as is the situation presented by 

cricket. An advantage is the effect of averaging to detect process level shifts and 

damping out some effect of random errors on individual observations, due to the 

reliance on past observations. 

The exponentially weighted moving average is defined as follows: 

$$z_i = \lambda x_i + (1-\lambda)\,z_{i-1}$$

(Montgomery, 1997) 

where $\lambda$ is a constant greater than zero, but no larger than one. The starting value (when time, i, is one) is the process target and hence equal to the population mean. The control limits are defined as follows:

$$UCL = \mu_0 + L\sigma\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2i}\right]}$$

$$LCL = \mu_0 - L\sigma\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2i}\right]}$$

$$CL = \mu_0$$

The factor L represents the width of the control limits, and λ indicates the weighting placed on previous values. To obtain an appropriate ARL, of approximately 17, values are taken from Crowder's (1989) nomograph to detect a shift of one standard deviation. L and λ are set at 2 and 0.25 respectively.
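The EWMA statistic and its time-varying limits can be sketched as below. This is a minimal illustration using the quoted settings L = 2 and λ = 0.25; the function name `ewma_chart` and the input values in the usage line are hypothetical.

```python
import math


def ewma_chart(xs, mu0, sigma, lam=0.25, L=2.0):
    """Return (z_i, LCL_i, UCL_i, signal) for each observation, starting from z_0 = mu0."""
    z = mu0
    rows = []
    for i, x in enumerate(xs, start=1):
        z = lam * x + (1 - lam) * z
        half_width = L * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * i))
        )
        lcl, ucl = mu0 - half_width, mu0 + half_width
        rows.append((z, lcl, ucl, not (lcl <= z <= ucl)))
    return rows


# Hypothetical transformed contributions, with a hypothetical mean and standard deviation
for row in ewma_chart([2.4, 1.1, 0.0, 2.9], mu0=2.0, sigma=0.85):
    print(row)
```

Because each new value carries a weight of only 0.25, a single zero score moves the statistic far less than it moves an individual observation on a Shewhart chart.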



[EWMA chart against innings, with centre line X = 2.054 and 2-sigma limits at 1.402 and 2.705]

Figure 14. EWMA Control Chart of M.J. Horne's Transformed Batting Contribution With Zone Run Rules. 

As no signals are given, further confirmation that Horne is performing to 

expectations in terms of contributing to the team total is found in the above EWMA 

chart. 

[EWMA chart against innings, with centre line X = 1.661 and 2-sigma limits at 1.101 and 2.222]

Figure 15. EWMA Control Chart of B.R. Hartland's Transformed Batting Contribution With Zone Run Rules. 

In contrast to the Shewhart and CUSUM scheme, the EWMA chart only signals 

after the third zero score, suggesting that there is no significant change in form. As 

the EWMA scheme is not readily influenced by the zero scores, this is the 

preferred option for assessing batting contribution. 



3.4 Proposed Control Chart Based on Quartiles. 

The performance of a control chart in the traditional sense is very sensitive to the assumptions when dealing with individual scores, due to the relatively high likelihood of failure. When the assumptions of either independence or normality fail, traditional methods introduce high probabilities for false alarms (Type I errors) (Vasilopoulos, Stamboulis, 1978). Problems arise in that the distribution for individual batting scores is mixed geometric, as indicated in Chapter 2, and thus transformation will not yield an approximately normal distribution. However, we want to preserve the influence of extreme scores, as these are an indication of an individual's capability, which non-parametric methods negate. A potential problem with standard control charts is that a single extreme outlier can trigger an out-of-control situation. We want a technique that is devoid of a distribution, or enhances, or is adaptable to a given distribution, and makes use of extreme values for individual observations. An approach is introduced here based around the simple concept of quartiles.

The proposed method is based on quartiles. As sample sizes are generally small, theoretical quartiles are used rather than the observed values, which also allows parametric influences. To maintain the attachment to a given distribution, the theoretical quartiles are obtained using the estimate of the mean. Hence, outliers potentially influence these values and as a result information pertaining to outliers is not lost. How these values were obtained is discussed later. Essentially, due to the nature of the 'ducks and runs' distribution, specifically the number of zeros occurring, the use of a Shewhart type scheme is inappropriate. Moreover, the LCL is incompatible with the number of zeros that are generated, effectively causing an alarm every time a zero is recorded.

From Chapter 2 there is sufficient evidence to imply that individual batting scores are from a mixed geometric distribution. As this distribution is discrete, finding the theoretical quartiles involved investigating the cumulative probabilities. Values for the theoretical quartiles were computed based on the mean score and mean contribution.



Using EXCEL a table was produced listing the probability of a given score, given 

varying batting parameters, mean score and mean contribution. A scatterplot of 

the mean contribution and mean score revealed a linear relationship between 

score and contribution. Consequently a 99% Prediction interval, shown below, 

from a simple linear regression gives the most likely region of interest, giving 

general bounds on which to form the table. 


The spreadsheet was designed to sum all previous values. Where the cumulative 

probability was closest to the values of the quartiles (0.25, 0.5, 0.75), the 

corresponding score was taken. Invariably all the values fell between the positive 

integer values. As individual scores are represented by only positive integers and 

zero, the precise score at which they were obtained is not needed, so values given 

are of the form (m+n)/2 where m and n are the two immediately neighbouring 

points that surround the quartile value of interest. 
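The spreadsheet calculation can be sketched in code. This is a minimal illustration under loudly stated assumptions: `mixed_geometric_pmf` is a hypothetical 'ducks and runs' form (a point mass at zero plus a geometric distribution on the non-zero scores) and is not the exact model of Chapter 2, and the quartile is taken as the midpoint (m+n)/2 of the two neighbouring scores between which the cumulative probability first reaches the quartile level.

```python
def mixed_geometric_pmf(x, p_zero, nonzero_mean):
    """Hypothetical mixed pmf: P(0) = p_zero, geometric on 1, 2, ... otherwise."""
    if x == 0:
        return p_zero
    p = 1.0 / nonzero_mean  # geometric on 1, 2, ... with this mean
    return (1.0 - p_zero) * p * (1.0 - p) ** (x - 1)


def theoretical_quartiles(pmf, levels=(0.25, 0.5, 0.75), max_score=2000):
    """Accumulate probabilities and report midpoints (m + n) / 2 around each level."""
    result, cumulative, remaining = [], 0.0, list(levels)
    for score in range(max_score + 1):
        cumulative += pmf(score)
        while remaining and cumulative >= remaining[0]:
            # the level is crossed between score - 1 and score
            result.append(max(score - 0.5, 0.0))
            remaining.pop(0)
    return result


# Hypothetical batsman: 10% ducks, mean of 30 on the non-zero scores
print(theoretical_quartiles(lambda x: mixed_geometric_pmf(x, 0.10, 30.0)))
```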

[Fitted line plot of mean score against mean contribution, showing the regression line and the 99% PI]

Figure 16. Fitted Line Plot of Mean Contribution Vs Mean Score 



Three control lines are drawn corresponding to the lower quartile, median and upper quartile. These theoretical values are observed from tables listed in Appendix E, utilising the average score of the non-zero component and mean contribution. The exact reasoning behind this is given in the development of the mixed model described earlier. The mean score and mean contribution are then adapted from being distribution free to being based upon mixed geometric.

To signal an alarm rules are developed based on the probabilities of a point 

appearing in a certain zone or pattern of zones which can be used to identify 

performance. These rules try to emulate the zone rules that supplement the 

Shewhart charts. 


Due to the nature of cricket, it takes a number of observations to be able to effectively estimate a player's ability. It is logical to infer that this also holds for a change in ability. The quartile chart with zone rules described below will give a false signal on average every 16.8 innings. If five rounds of Shell Trophy matches (five matches, ten innings) are played, this equates to approximately one signal in two seasons. For an out-of-control situation, either through loss of form or an improvement, a signal will be picked up in approximately one season. The run 

:1 des de~)gr~Eid fur ust; \'\ii!h this cl 1~1rt are set at 3 95°/; confidence !!rrlit and are 

ref:.::11.l-Jf"':Y' ~;1c;:-:;e ,~.!torinfJ th~s to a sf:ric+(~f plaJ1 98.5°1~ VJOuld correspond to !arger 



The figure below details the positioning of the zones about the quartiles. 

[Chart layout, from top to bottom: Zone 4, Upper Quartile line, Zone 3, Median line, Zone 2, Lower Quartile line, Zone 1; horizontal axis: Observation]

Figure 17. Quartile Control Chart 

In creating a set of zone rules for the Quartile chart, the aim was to have run rules comparable to those implemented by the Shewhart Control Chart. Obviously Shewhart charts assume normality, which is not found in this situation.

A situation akin to a uniform distribution is created. Four zones with equal probability (0.25) are created based upon the Median, Upper Quartile and Lower Quartile. Knowing the probability that a point falls into a certain zone enables improbable run lengths to be established. That is, certain patterns of data, similar to those used in the zone rules for Shewhart Control Charts, that are unlikely to occur in an in-control situation can be established. The run lengths are given where the probability for a given string of values drops below a given error setting. In this instance run lengths are given for an error setting of 0.05. That is, there is less than a 5% chance of a given pattern occurring when the player is performing to natural ability. This setting is chosen as it is sufficiently tight for the context presented. Six rules are proposed as follows.
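The way a run length follows from a zone probability and the error setting can be sketched as below. This is a minimal illustration, assuming independent innings under H0; the function name `min_run_length` is hypothetical and not taken from the thesis.

```python
def min_run_length(zone_probability, error_setting=0.05):
    """Smallest n such that zone_probability ** n < error_setting.

    Assumes successive points are independent under H0, so the chance of
    n consecutive points in the same region is zone_probability ** n.
    Returns the run length and the corresponding false alarm probability.
    """
    if not 0.0 < zone_probability < 1.0:
        raise ValueError("zone_probability must lie strictly between 0 and 1")
    n = 1
    prob = zone_probability
    while prob >= error_setting:
        n += 1
        prob *= zone_probability
    return n, prob


print(min_run_length(0.5))   # a pair of zones:  (5, 0.03125)
print(min_run_length(0.25))  # a single zone:    (3, 0.015625)
```

With an error setting of 0.05, a region covering two zones needs 5 consecutive points and a single zone needs 3, which are the run lengths used in the rules that follow.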



Zone Rules for Quartile Chart 

An 'alarm' in this instance refers to a change in form signal. 

H0 : Playing to natural ability 

1 Runs in Extremities 

This involves a run of points exclusively in zones 1 and 4.

As there is a 0.5 probability of a score occurring in these zones, under H0 an alarm will be sounded after 5 points.

P[False Alarm] = 0.5x0.5x0.5x0.5x0.5 = 0.03125

In terms of a player's performance this signal suggests that a player is either failing or going on to make a big total, relative to the estimate of their natural ability.

2 Runs in Central Zones 

Similarly, this involves a run of points exclusively in zones 2 and 3.

As there is a 0.5 probability of a score occurring in these zones, under H0 an alarm will be sounded after 5 points.

P[False Alarm] = 0.5x0.5x0.5x0.5x0.5 = 0.03125

In terms of a player's performance this signal suggests that a player is getting starts but not going on to make a big total, relative to the estimate of their natural ability.

3 Runs in one Zone 

As there is a 0.25 chance of falling in any given zone under H0,

P[False Alarm] = 0.25x0.25x0.25 = 0.015625

Three consecutive points in one zone result in an alarm. The interpretation of the signal depends on the zone in which the alarm originates. An alarm from zone 4 indicates an extremely good series of scores, whereas 3 points in zone 1 shows poor form.



4 Runs above or below the median 

This involves a series of scores in either Zones 1 and 2 or in Zones 3 and 4. 

As there is a 0.5 probability of a score occurring in these zones, under H0 an alarm will be sounded after 5 points.

P[False Alarm] = 0.5x0.5x0.5x0.5x0.5 = 0.03125

Again, the interpretation depends on which region has caused the alarm. Runs above the median are highly favourable and thus indicate good form. Conversely, a signal below the median indicates inadequate performance.
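Rules 1 to 4 all have the same shape: a run of recent points confined to a set of zones. A minimal Python sketch of checking them against a sequence of zone numbers is given below. The names `trailing_run`, `RULES` and `alarms` are hypothetical, and the sketch assumes that a rule keeps signalling for as long as its run continues; the text does not say whether the chart is reset after an alarm.

```python
def trailing_run(zones, allowed):
    """Length of the run of most recent points whose zone is in `allowed`."""
    count = 0
    for zone in reversed(zones):
        if zone not in allowed:
            break
        count += 1
    return count


# (label, zones that sustain the run, consecutive points needed for an alarm)
RULES = (
    [("rule 1: extremities", {1, 4}, 5),
     ("rule 2: central zones", {2, 3}, 5),
     ("rule 4: below median", {1, 2}, 5),
     ("rule 4: above median", {3, 4}, 5)]
    + [(f"rule 3: zone {z}", {z}, 3) for z in (1, 2, 3, 4)]
)


def alarms(zones):
    """Labels of rules 1 to 4 triggered by the most recent point.

    Assumption: no reset after a signal, so a long run keeps alarming.
    """
    return [label for label, allowed, needed in RULES
            if trailing_run(zones, allowed) >= needed]


print(alarms([2, 3, 4, 4, 4]))  # ['rule 3: zone 4']
print(alarms([1, 4, 4, 1, 4]))  # ['rule 1: extremities']
print(alarms([2, 4, 1, 3, 2]))  # []
```

The label returned indicates which region caused the alarm, which is what determines the interpretation under rules 3 and 4.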

5 Points increasing/decreasing

[Figure: plot of Score against Time, with point P1 marked.]

Figure 18. Establishing Number of Consecutive Increasing Points for Alarm

The number of strictly increasing points can be found by solving an integral of the following form such that the resultant probability is below 0.05.
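A form consistent with the ranges described in the next paragraph and with the quoted probability is sketched here for n consecutive points. It assumes each score is represented by its cumulative probability p_i, so that p_i is uniform on (0, 1) under H0 and ties have probability zero; the symbols n and p_i are notational assumptions.

```latex
P[\text{False Alarm}]
  = \int_{0}^{1}\int_{0}^{p_n}\cdots\int_{0}^{p_3}\int_{0}^{p_2}
    dp_1\,dp_2\cdots dp_{n-1}\,dp_n
  = \frac{1}{n!},
\qquad
n = 3:\ \frac{1}{3!} \approx 0.1667,
\qquad
n = 4:\ \frac{1}{4!} = \frac{1}{24} \approx 0.0417 .
```

Under these assumptions three consecutive increasing points are not improbable enough, while four points fall below the 0.05 error setting.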

This is because for the sequence to be increasing P1 must lie in the range (0, P2); P2 must fall in the range (0, P3) and so forth. Solving the above integral yields a probability of 0.0417 (4 d.p.) for 4 consecutive points, the first sequence length to fall below the defined 5% Type I error.

A similar situation exists for points strictly decreasing, except the integration takes place over the ranges (Pn, 1) instead of (0, Pn).

Therefore for increasing or decreasing points P[False Alarm] < 0.05 occurs after 4 points (0.0417). If the points are increasing this suggests a player is improving; conversely a decrease suggests poor form.
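The figure of 4 points can be checked with a short calculation. The sketch below assumes continuous, exchangeable observations under H0, so that each of the n! orderings of n points is equally likely and ties are ignored; actual batting scores are integers, so ties can occur and this is only an approximation. The function name `min_monotone_length` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def min_monotone_length(error_setting=Fraction(1, 20)):
    """Smallest n with 1/n! below the error setting, and that probability.

    Assumes no ties, so exactly one of the n! orderings is increasing.
    """
    n = 1
    while Fraction(1, factorial(n)) >= error_setting:
        n += 1
    return n, Fraction(1, factorial(n))


n, prob = min_monotone_length()
print(n, prob, round(float(prob), 4))  # 4 1/24 0.0417

# Cross-check by enumeration: of all orderings of n distinct values,
# exactly one is strictly increasing (and one strictly decreasing).
orderings = list(permutations(range(n)))
increasing = [p for p in orderings if all(a < b for a, b in zip(p, p[1:]))]
print(len(increasing), len(orderings))  # 1 24
```

By symmetry the same count applies to strictly decreasing sequences, which is why both directions share the 4-point rule.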