Copyright is owned by the Author of the thesis.  Permission is given for 
a copy to be downloaded by an individual for the purpose of research and 
private study only.  The thesis may not be reproduced elsewhere without 
the permission of the Author. 
 

Massey University Library

New Zealand & Pacific Collection 

AN EXPLORATORY STUDY OF 

FINAL GRADES AWARDED TO BACHELOR WITH 

HONOURS AND MASTERS STUDENTS 

A thesis presented in partial fulfilment of the 

requirements for the degree of Master of Arts 

in Psychology at Massey University. 

Patricia Bolger 

1990 



ABSTRACT 

This study explores the final grades awarded to Bachelor with honours 

and Masters students in New Zealand universities from 1960 to 1989 as 

a function of students' gender, the university attended, the degree 

completed, and the subject studied. These grades were also compared 

with the grades awarded to Bachelor with honours students in England 

and Wales from 1974 to 1989. Chi-square test statistics were used to

test the significance of these relationships. In New Zealand, women

were awarded significantly more first class degrees than men. In England 

and Wales men were awarded significantly more first class degrees than 

women. Science students were awarded a higher percentage of first class 

degrees than other students in both New Zealand and England and Wales. 

In New Zealand Bachelor with honours students were awarded first class 

degrees more frequently than Masters students. Political and historical 

developments, the nature of the grading procedures used, and 

institutional and departmental variance provide a partial explanation for

some of the results. It is clear that no single factor is responsible for 

these variations in degree performance, but rather a complex interaction 

of several factors. It is concluded that in New Zealand and England and 

Wales, gender, university, the degree undertaken, and the subject 

studied, all have an effect on the final grade a student is awarded. 



ACKNOWLEDGEMENTS 

I would like to thank Mike Smith, my supervisor, for his encouragement, 

assistance, and practical research philosophy. 

Thanks also to the New Zealand University Students Association for 

awarding me their Scholarship for Higher Education. Most importantly this 

strengthened my own belief in the value of this research. 

Thanks to Robert Loeffen, Ali Maginness, Joss Tennent, Maria Bolger, and

especially Andrew Kibblewhite for their continual support, advice, and 

friendship. 

Lastly, Mum and Dad, thanks for the genes and the environment, without 

which I could never have come this far. Also thanks for the unending 

support and friendship. 


TABLE OF CONTENTS 

Page 

Abstract ii 

Acknowledgements iii

CHAPTER ONE - OVERVIEW 1 

CHAPTER TWO - PERFORMANCE APPRAISAL 4 

2.1 Introduction 4 

2.2 The Criterion 5

2.3 Assessment Methods 7 

2.4 Types of Data 8 

2.5 Rating Scales 9 

2.6 Rating Error 10 

2.7 Rater Training 12

2.8 The Process Model 13

2.9 Performance Appraisal within Education 15

CHAPTER THREE - ASSESSMENT IN POSTGRADUATE EDUCATION 17 

3.1 To Grade or not to Grade 17 

3.2 Assessment Methods 19

3.3 Assessment Reliability 20 

3.4 Biases in Assessment 22 

3.5 Sex Bias 23 

3.6 Conclusion 25 

CHAPTER FOUR - THE UNIVERSITY SYSTEMS 26 

4.1 Universities - Their Purpose 26

4.2 New Zealand Universities - The Beginnings 27

4.3 The Present New Zealand University System 28

4.4 The University System of England and Wales 29

4.5 Standards in the British University System 31

4.6 The British External Examination System 33

CHAPTER FIVE - HONOURS STUDIES 35

5.1 Introduction 35


5.2 Gender Studies 35 

5.3 Subject Studies 38 

5.4 The Student Population 40

5.5 Institutional Differences 42

5.6 The Present Study - Part A 45

5.7 The Present Study - Part B 48

HYPOTHESES 49 

CHAPTER SIX - THE METHOD - PART A 50 

6.1 Subjects 50 

6.2 Procedure 51 

6.3 New Zealand Analyses 51 

6.3.1 Step one - Univariate analysis 52 

6.3.2 Step two - Crosstabulation of degree and gender 52

6.3.3 Step three - Changes in the sample over time 52

6.3.4 Step four - Changes in subject areas over time 53

6.3.5 Step five - The distribution of grades 54

6.3.6 Step six - The effect of gender and subject on grades 54

6.3.7 Step seven - The distribution of first class honours 54

6.3.8 Step eight - Institutional difference in grades 55 

THE METHOD - PART B 55 

6.4 Subjects 55 

6.5 Procedure 56

6.6 England and Wales Analyses 56

6.6.1 Step one - Gender differences in choice of subject area 56

6.6.2 Step two - The distribution of grades 56

6.6.3 Step three - The effect of gender and subject on grades 56

6.6.4 Step four - A comparison of New Zealand and England and Wales grades 57

6.6.5 Step five - A comparison of the subject areas studied in New Zealand and England and Wales 57

CHAPTER SEVEN - RESULTS - PART A 58 

7.1 New Zealand Analyses 58

7.1.1 Step one - University attended 58

7.1.2 Step two - Crosstabulation of degree and gender 59

7.1.3 Step three - Changes in the sample over time 59

7.1.4 Step four - Changes in subject areas over time 60

7.1.5 Step five - The distribution of grades 65

7.1.6 Step six - The effect of gender and subject on grades 67

7.1.7 Step seven - The distribution of first class honours 71

7.1.8 Step eight - Institutional differences in grades 73

THE RESULTS - PART B 74

7.2 England and Wales Analyses 74

7.2.1 Step one - Gender differences in choice of subject 74

7.2.2 Step two - The distribution of grades 76

7.2.3 Step three - The effect of gender and subject on grades 78

7.2.4 Step four - A comparison of New Zealand and England and Wales grades 78

7.2.5 Step five - A comparison of subject areas studied in New Zealand and England 78

CHAPTER EIGHT - DISCUSSION 79 

8.1 Introduction 79 

8.2 Characteristics of the Postgraduate Population 79 

8.2.1 Gender and Degree 79

8.2.2 Changes in Subjects Studied 80

8.3 Grading Issues 82

8.4 Differences in Gender and Grades 84

8.5 Grade Differences in Subjects 91

8.6 Comparison Between Grade Distributions of New Zealand and England and Wales 96

8.7 Difference in Grades between Bachelor with honours and Masters Degrees 101

8.8 Institutional Differences 104

Appendices

One - Variable Codes 108

Two - Study Categories 120

References 132


LIST OF FIGURES 

Page 

Figure 2.1: Cognitive Components in Rating. 12

Figure 2.2: The Process Model of Performance Rating. 14

Figure 2.3: An illustration of the similarities between Education and the Workplace in Judgemental Ratings. 15

Figure 7.1: The distribution across New Zealand's universities of students who completed a Bachelor with honours or Masters degree during 1960 to 1989. 58

Figure 7.2: The change in distribution of New Zealand students who have completed a Bachelor with honours or Masters degree between 1960 to 1989. 60

Figure 7.3: The percentage of New Zealand students who have completed a Bachelor with honours or Masters degree in each subject area during the Sixties, Seventies and Eighties. 62

Figure 7.4: The percentage of New Zealand Male students who have completed a Bachelor with honours or Masters degree in each subject area during the Sixties, Seventies and Eighties. 63

Figure 7.5: The percentage of New Zealand Female students who have completed a Bachelor with honours or Masters degree in each subject area during the Sixties, Seventies and Eighties. 64

Figure 7.6: The proportion of each class of honours awarded to Bachelor with honours and Masters students of New Zealand. 66

Figure 7.7: The proportion of each class of honours awarded to Masters students of New Zealand. 66

Figure 7.8: The proportion of each class of honours awarded to Bachelor with honours students of New Zealand. 67

Figure 7.9: The percentage of Bachelor with honours and Masters students who were awarded a first class honours degree at each New Zealand university. 73

Figure 7.10: The distribution of students who completed a Bachelor with honours degree in England or Wales in each subject area between 1974 to 1989 by Gender and Total. 75

Figure 7.11: The proportion of each class of honours awarded to Bachelor with honours students at England or Wales universities. 76


LIST OF TABLES 

Page 

Table 7.1: The Gender and Degree composition of the sample. 59

Table 7.2: The proportion of New Zealand students who studied each subject area as a function of Gender and class of honours received. 68

Table 7.3: The proportion of New Zealand Masters students who studied each subject area as a function of Gender and class of honours received. 69

Table 7.4: The proportion of New Zealand Bachelor with honours students who studied each subject area as a function of Gender and class of honours received. 70

Table 7.5: Chi-square results of the proportion of first class honours degrees awarded to Males and Females who completed a Masters in each subject area. 72

Table 7.6: Chi-square results of the proportion of first class honours degrees awarded to Males and Females who completed a Bachelor with honours in each subject area. 72

Table 7.7: The proportion of England and Wales students who studied each subject area as a function of Gender and class of honours received. 77


CHAPTER ONE 

OVERVIEW 

The degree class awarded to a student is an important marker of 

achievement. Yet the reliability of assessment in higher education has 

been the subject of concern for some years (Hartog & Rhodes, 1935; Dale, 

1959; Cox, 1967; Foster, 1985; Johnson, 1988). Research continues to 

highlight discrepancies in the grades that students receive that are not 

the result of differences in students' academic ability. Differences have

been noted in the awarding of honours degrees between institutions (Bee 

& Dolton, 1985; Connolly & Smith, 1986; Johnes & Taylor, 1987), 

between courses of study (Bourner & Bourner, 1985; Smith, 1990), and 

between males and females (Rudd, 1984; Kornbrot, 1987; Clarke, 1988). 

Further, there is still no uniform opinion as to why these differences occur.

Answers to these questions are likely to be of interest not only to the 

universities themselves, but also to potential university students and to 

employers. Potential students are likely to be interested in discovering 

the extent to which their chances of obtaining a "good" degree might vary 

between institutions and departments. Employers may be interested to 

know where they are most likely to recruit graduates with "good" degrees.

It is the purpose of this research to investigate whether degree results 

vary between institutions, between subjects studied, and between males and

females who have completed postgraduate degrees in New Zealand in the 

last thirty years. New Zealand grades will also be compared with those 

of England and Wales. 

Grading is a form of performance appraisal, and as such a great deal of 

the research in this area is applicable to grading and assessment within 

education. Chapter two is an overview of performance appraisal. Nearly 



In chapter eight the results are interpreted, and some explanations for the 

outcomes observed are provided. Contrasts and similarities between the 

results of New Zealand's universities and those of England and Wales are 

examined. The implications of these results for postgraduate students 

from both New Zealand and England and Wales are discussed, along with 

suggestions for future research. 


CHAPTER TWO

PERFORMANCE APPRAISAL

2.1 Introduction

There is no escape from performance appraisal. It is impossible to go

through life without being assessed many times, in many different 

situations, for many different purposes. Assessments are sometimes

formal, as in job interviews or teachers' reports, though they are just as

often informal, such as meeting new acquaintances and judgements made

by school peers. We are all assessed virtually from birth, and then

continually throughout our life, be it by doctors, school teachers, and

family, and later, by the bank manager, lecturers, and the sports coach.

We assess people who provide us with services, such as solicitors, chefs,

and hairdressers and act on our judgement of their effectiveness to decide 

whether we will continue to use their services. 

To appraise anything is to set a value on it. The purpose is to find out 

how a person performs when compared with a standard. The most

common and formal type of performance appraisal takes place in the work 

setting. 

Formal performance appraisal systems are constructed with the 

understanding that performance evaluations represent meaningful

distinctions among individuals that correspond to actual behavioural

differences (Wendelken & Inn, 1981). The overall aim of the appraisal is

to remove the influence of extraneous factors from the evaluation process

in order to focus solely on aspects of performance that are related to

some specific criterion.

Although judgements may be made about an individual's performance on 

a regular basis, the accuracy and equity of this process are still unresolved.

Organizations continue to express disappointment in performance

appraisal systems despite advances in technology (Banks & Murphy,

1985). It should be appreciated that even with the best intentions, it is

unlikely that performance appraisals can ever be made completely

objective and accurate. Issues such as validity, reliability, and bias

remain major and persistent problems which often hinder or nullify the

value of many performance appraisal systems.

2.2 The Criterion 

Before a performance appraisal can be conducted it is essential that an 

organization determines the nature of the dimensions on which 

distinctions about performance are to be made. This is referred to as 

criterion development. The criterion is a way of describing success. For 

example, the criterion for a shop retailer might be the monetary value of 

sales in a one-month period. A criterion for measuring a student's

success in a school subject might be the course grade. The criterion for

measuring a dieter's success is most likely to be the amount of weight

lost. However, defining "the criterion" is not always a simple matter. It

has been a problematic area of Industrial/Organizational Psychology for 

many years (Landy & Farr, 1983). 

For this reason, no doubt, a large amount of research has been directed 

at determining the necessary "criteria for criteria". Blum and Naylor 

(1968), for example, compiled a list of fifteen characteristics they 

considered necessary and/or desirable for criteria. These undefined 

characteristics are as follows: reliable, realistic, representative, related 

to other criteria, acceptable to job analysts, acceptable to management, 

consistent from one situation to another, predictable, inexpensive,

understandable, measurable, relevant, uncontaminated, bias free, and

discriminating. Unfortunately, there have been few attempts to refine

these characteristics or develop operational definitions of the criteria for

the criterion. Consequently, a wide array of variables has been used

to study the effectiveness of performance appraisal data.

This inconsistency in criterion development places doubt on the use of 

some performance appraisal measures. Downey, Lahey and Saal (1982) 

have shown empirically that the operational definitions adopted for 

criteria will significantly affect the conclusions drawn in the assessment 

of appraisal data. In a comparison of the psychometric characteristics of 

ratings from graphic and mixed-standard scales, it was found that the use 

of one set of operational definitions for rating error produced results that 

differed from the results obtained when another set of operational 

definitions was adopted. Their study illustrates the need for researchers 

and practitioners to thoroughly scrutinize the criteria they select for 

assessing appraisal data. This is best achieved by considering several of 

the essential requirements for a criterion. 

The first requirement of a criterion is that it be relevant to some important 

goal of the individual, the organisation, or society (Smith, 1983). 

Determination of relevance is, however, a matter of judgement. Some

group or person must decide which activities are most relevant to

success. Once these activities have been identified, effort must then be

directed towards developing psychometrically sound measures of these

activities. The measure of a criterion should be neither contaminated

with irrelevant variance, nor deficient in terms of measuring the important

objectives of the organisation and of the people in it. As well, neither the

criterion nor the measure of it should be biased or trivial. Relevance

consists of two parts. One is the validity of the goal which is judged to

be important. The second is the validity of the measure(s) of goal

achievement. This requirement is parallel to the requirement that a test 

be valid. 

Reliability is the second requirement of a criterion. The estimates of 

reliability may be grouped into three general classes: (a) measures of

stability; (b) measures of equivalence; and (c) measures of internal 

consistency (Landy & Farr, 1983). A criterion measure must in addition 

be practical, available, plausible, and acceptable to those who use it 

(Smith, 1983). Once an appropriate criterion has been determined, a

method of measurement needs to be chosen. This is frequently referred to

as a performance measure or assessment method.
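
As an illustration of class (c), measures of internal consistency, the following sketch computes coefficient alpha (Cronbach's alpha), one commonly used index of this kind. The choice of this particular index, the function name cronbach_alpha, and the example scores are assumptions made for illustration only; they are not drawn from Landy and Farr (1983).

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Return coefficient alpha for a table of item scores.

    scores: list of rows, one row per person rated; each row holds that
    person's score on every item (all rows must have the same length).
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every row must contain the same number of items")

    # Variance of each item across persons.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each person's total score.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four persons rated on three items.
example_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(example_scores), 3))
```

Values close to 1 suggest that the items vary together across persons; the index says nothing about stability over time or equivalence between forms, which require separate estimates.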

2.3 Assessment Methods 

The techniques used to assess and measure performance can be grouped 

into three general categories: comparative, absolute, and outcome or

results-orientated (Landy & Farr, 1983; Long, 1986). Comparative

techniques evaluate the performance of employees in a work group 

relative to each other, using paired comparison, a ranking procedure, or 

forced distribution. All procedures are highly subjective as the rater is 

given a great deal of latitude to infer what distinguishes levels of effective 

performance. 

Absolute or criterion-referenced methods attempt to describe or evaluate

the performance of an individual by reference to some standard or 

standards of performance, not in relation to other individuals. Techniques 

include the essay or narrative-type approach, graphic or trait rating 

scales, mixed standard rating scales (Blanz & Ghiselli, 1972), checklists,

critical incidents (Flanagan, 1954), and behaviourally anchored rating scales

(Smith & Kendall, 1963). All these procedures have limitations, but they 

may be appropriate depending on the purpose for which the appraisal is 

conducted.

The final group of methods is those that are results-orientated. These 

methods concentrate on specific accomplishments and outcomes 

achieved as a result of job performance, rather than job behaviours. 

Central to this approach is employee participation, objectives being jointly 

agreed between superiors and subordinates, and standards established in 

advance as the result of discussion and negotiation (Long, 1986). One 

problem with this approach is that a high degree of inferential skill,

management time, and effort is required for the method to work

effectively. This method has also been found to be unsatisfactory for

complex positions (Gruenfield, 1981).

2.4 Types of Data 

In conjunction with the method to be used in a performance appraisal, 

the type of data to be collected needs to be determined. There are several 

kinds of data that can be used to provide the necessary information. 

Guion (1965) identified at least three different types of measures of job 

behaviour: objective data, personnel data, and judgmental data. Landy

(1989) states that, ideally, a complete performance measurement should

include a combination of all three of these indices of performance, as the 

multi-dimensionality of job performance only becomes apparent when 

these categories are considered simultaneously. This advice is rarely put 

into practice. 

Both objective data and personnel data can be problematic. Generally the 

recording of this information is either not done correctly, or cannot be

done adequately enough for the resulting information to be useful,

valid, or reliable. This does not imply that objective or personnel data 

have no value as criteria, but rather, that if they are to be useful, a careful 

analysis of the relationship between the elements of the job as identified

by the job analysis and elements of the behaviour as related to 

performance appraisal is necessary. 

Judgmental data is the most frequently used form of measurement. 

Landy (1989) reported that a literature review of validation studies in the

Journal of Applied Psychology between 1965 and 1975 revealed that

ratings were used as the primary criterion in 72% of the cases. These

judgments can take several forms. They may be a simple comparison of 

one employee with another, a list of statements which are applied to each 

employee, or some form of rating by which the employee is placed on a 

continuum depending on their level of proficiency. 

2.5 Rating Scales 

The most widely used performance appraisal method is the judgemental 

rating scale (Long, 1986; Baker, 1988; Leap & Crino, 1989). Rating 

scales can be distinguished from one another on three different 

dimensions (Guion, 1965). The first dimension is the degree to which 

the meaning of the response category is defined. This deals with how 

the rating scale is marked off into units. Here a number of important 

decisions need to be made; the first is how many points the scale should

contain. Previous research on the use of rating scales indicates that the

optimal scale should include four or five points. Reliability drops with

three categories or fewer, and there is little increase in reliability when

there are more than five points (Lissitz & Green, 1975). Further, when

deciding on the number of scale points, the organisation must decide

whether it wishes to permit central, uncertain, or undecided responses,

which can occur with an odd-numbered scale (Jacobs, 1986).

The second dimension is the degree to which the person interpreting the

scale can tell what response was intended by the rater. This is referred

to as response clarity and is largely determined by the structure of the

scale. The third dimension is the degree to which the performance 

dimension being rated is defined for the rater. Whenever possible, verbal 

descriptions should accompany the numerical scale (Jacobs, 1986). Scale 

anchors that are defined precisely are less open to misinterpretation and 

therefore give the rater a reasonable idea of what performance dimensions 

are being considered. It has also been suggested that points on a rating 

scale should be reviewed between raters to ensure that there is 

agreement concerning what each point means in terms of actual 

performance behaviour (Leap & Crino, 1989). 

2.6 Rating Error

In spite of the different forms and widespread use of judgmental indices 

of performance there has been a consistent dissatisfaction with these 

measures on the part of the researcher and practitioner. This 

dissatisfaction can be largely attributed to three types of rating errors:

halo, central tendency, and leniency (Anastasi, 1982). Other errors also

contribute to the contamination of performance ratings; they include

contrast, first impression, and spillover effect.

Halo error, named by Thorndike (1920), occurs when a rater has a

generally favourable or unfavourable impression of the person rated. This 

influences the rater in such a way that ratings are assigned which are 

consistent with that impression. The effect of this psychometric error is 

most exaggerated when multi-factor ratings are required. No method has 

been devised that effectively eliminates halo errors, and research on 

alternative solutions still continues (King, Hunter & Schmitt, 1980; 

Landy, Vance, Barnes-Farrell & Steele, 1980). 

The tendency to overweight in an appraisal any information and/or 

observations made on a person early in the appraisal period, is labelled 

first impression error (Latham, Wexley & Pursell, 1975). This

judgemental error is thought to be related to halo in that first impressions 

may facilitate or in fact be synonymous with the development of a

positive or negative halo impression about a person. 

Other rating errors are the result of inappropriate rating patterns, due to 

the rater's failure to make necessary and appropriate distinctions among

the performance levels of different individuals (Leap & Crino, 1989). They

include central tendency error, which is generally defined as the "bunching

up" of ratings at or near the middle of the scale owing to raters'

unwillingness to assign extreme ratings. Since many individuals do

perform somewhere around an average, it is an easily rationalised escape

from making a valid appraisal (Henderson, 1984).

Leniency error refers to raters who are unusually harsh or unusually easy

in their ratings. This results in ratings being bunched up either towards

the lower or upper end of the scale. Both leniency and central tendency

errors reduce the effective width of the scale and make ratings less

discriminatory (Anastasi, 1982).
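
The narrowing of the scale described above can be illustrated with two simple descriptive indices. The sketch below is an illustration under stated assumptions rather than an established procedure: it takes the scale midpoint as the reference point, reports how far a rater's mean rating lies from it (a rough indication of leniency or severity), and reports the spread of the ratings (a small spread around the midpoint being consistent with central tendency). The function name rating_pattern and the five-point scale are hypothetical, and indices of this kind cannot by themselves separate rating error from genuine similarity among the persons rated.

```python
from statistics import mean, pstdev


def rating_pattern(ratings, scale_min=1, scale_max=5):
    """Describe where one rater's ratings sit on the scale.

    mean_shift: mean rating minus the scale midpoint (assumed reference
                point); positive values suggest easy rating, negative
                values suggest harsh rating.
    spread:     population standard deviation of the ratings; a small
                value indicates that little of the scale is being used.
    """
    midpoint = (scale_min + scale_max) / 2
    return {
        "mean_shift": mean(ratings) - midpoint,
        "spread": pstdev(ratings),
    }


# Hypothetical raters on a five-point scale.
easy_rater = rating_pattern([4, 5, 5, 4, 5])
central_rater = rating_pattern([3, 3, 3, 3, 3])
print(easy_rater)
print(central_rater)
```

In this hypothetical case the first rater's ratings sit well above the midpoint, while the second rater's ratings show no spread at all; in both cases only part of the scale is effectively in use.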

One suggestion that has been offered to eliminate these errors is a forced 

distribution in which the rater is required to allocate a given percentage 

of the ratees to each category (Landy & Farr, 1983). However, the forced

distribution assumes that there is some knowledge of what the

distribution should look like; in most circumstances this assumption is

probably untenable. 
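
A forced distribution of this kind can be sketched as follows. The category labels, the 20/60/20 split, and the function name forced_distribution are hypothetical choices made for illustration; the sketch also assumes that the rater can first place the ratees in rank order, which is itself a judgement.

```python
def forced_distribution(scores, proportions):
    """Allocate ratees to categories in fixed proportions.

    scores:      dict mapping each ratee's name to an overall score.
    proportions: list of (category, share) pairs, ordered from the top
                 category down; the shares must sum to 1.
    Returns a dict mapping each ratee's name to a category.
    """
    if abs(sum(share for _, share in proportions) - 1.0) > 1e-9:
        raise ValueError("category shares must sum to 1")

    # Rank ratees from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)

    allocation = {}
    start = 0
    cumulative = 0.0
    for category, share in proportions:
        cumulative += share
        end = round(cumulative * n)  # cut-off position for this category
        for name in ranked[start:end]:
            allocation[name] = category
        start = end

    # Anyone left over through rounding goes to the lowest category.
    for name in ranked[start:]:
        allocation[name] = proportions[-1][0]
    return allocation


# Hypothetical scores for ten ratees.
example = {
    "A": 90, "B": 80, "C": 70, "D": 60, "E": 50,
    "F": 40, "G": 30, "H": 20, "I": 10, "J": 5,
}
split = [("high", 0.2), ("middle", 0.6), ("low", 0.2)]
print(forced_distribution(example, split))
```

The sketch makes the limitation noted above concrete: the split is imposed before any performance is observed, so two ratees in every ten are placed in the lowest category however well the group as a whole has performed.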

Better scale development has also been suggested as one means of

reducing leniency error, in particular by reducing the ambiguity of the

scale through improved definitions of its dimensions. Finally,

attempts to eliminate rater errors have often focused on training raters

to be aware of these tendencies.

2.7 Rater Training

Skill is required to appraise performance; it therefore makes sense to train

raters. As Allinson (1977) observed, if an appraisal system is to function

effectively, all members of an organisation should be educated about the

purpose of the particular rating form and how to use it. There has been a

considerable amount of research on rater training (see reviews by Spool,

1978; Bernardin & Buckley, 1981). Traditionally, an important part of

rater training has involved a description of the traditional rating errors of 

leniency and halo and suggestions on how to avoid them (Landy, 1989). 

However, Bernardin and Pence's (1980) study found that raters who were 

only trained in avoiding rating errors adopted a "set" or a cognitive control 

mechanism that was geared towards producing ratings that had certain 

statistical properties rather than ratings that described actual behaviour. 

A more effective rater training method proposed by Murphy, Martin and 

Garcia (1982), is to train raters in observation skills rather than in the 

ways of avoiding rater error. Their proposal is based on the assumption 

that raters who are more accurate observers are also more accurate 

evaluators. 

Figure 2.1: Cognitive Components in Rating. (Diagram labels: Stimulus Preprocessing, comprising Instructions, Training, and Prior Exposure to Rating Form; Observation; Stimulus Categorization; Short-Term Memory; Long-Term Memory; Retrieval; Synthesis/Judgement; Rating.)

Source: Landy & Farr (1980).

Accurate evaluation depends on accurate perception. However, even with 

extremely competent raters, it is unlikely that performance appraisals can 

ever be completely accurate. A major reason for this is that humans have

limited information processing capabilities (Feldman, 1981).

Consequently research has focused on why some, as opposed to other,

appraisal information is attended to; whether information is stored in long

or short term memory; how information is organised in memory; and how

it is retrieved and combined for decision making. A model of the

relationship between these cognitive components is presented in Figure

2.1. It is an elaboration of the cognitive section (i.e., observation/storage

and retrieval/judgement) of Landy and Farr's (1980) process model. The

hope is that as the sequence of events that occur in performance appraisal

becomes better understood, fair and unbiased appraisal systems will

become commonplace (Feldman, 1981). Until then, rater accuracy may

be best achieved using some type of regular refresher training to sustain

appraisal skills (Ivancevich, 1979).

2.8 The Process Model 

The process of performance rating is incredibly complex. There are many 

opportunities for ratings to be influenced by factors other than the

performance of the person rated. Bernardin and Beatty (1984) have

suggested that the complexity of performance appraisal is best

represented by Landy and Farr's (1980) process model of performance

rating. This model, shown here in Figure 2.2, purports to describe the

task of performance rating from a process perspective. When considering 

this model it is important to remember that the goal of performance rating 

is to provide an accurate performance description of the person in 

question. In Landy and Farr's (1980) model this is represented by the

box labelled "Performance Description". Most of the other boxes might

be thought of as "potential obstacles to accurate performance 

description" (Landy, 1989, p. 147). The model attempts to define the 

specific subsystems and their interactive effects that form the larger 

rating system. Each component of the model has a research history which

may be said to justify its inclusion in the model. 

Figure 2.2: The Process Model of Performance Rating. (Diagram labels: Position Characteristics; Organization Characteristics; Purpose for Rating; Scale Development; Rater Characteristics; Ratee Characteristics; Rating Instrument; Rating Process; Observation; Storage; Retrieval; Judgement; Data Analysis; Performance Description; Personnel Action.)

Source: Landy & Farr (1980).

Although the model does not offer much in the way of explanation 

concerning why these elements may have adverse effects on the accuracy

of performance description, it does present a reasonable view of the

complexity of the process. Note, however, that the rating process is not

developed in isolation; it will inevitably be influenced by the purpose of

the rating and the instruments used for the rating. 

2.9 Performance Appraisal within Education 

The development of a fairer, more appropriate, and consequently better 

performance appraisal system has generally been conducted within the 

realms of the business sector. Nonetheless, performance appraisals are 

conducted in many other facets of life. 

[Figure 2.3 diagram: three stages, each compared across the two settings.

Goal. Workplace: work performance. Education: acquisition of knowledge.
(Conceptual problem of whether the goal is accurately reflected by the criterion.)

Criterion. Workplace: attitude, quality, output, etc. Education: essays, exams, labs, etc.
(Performance is assessed. Problems with assessment method, procedure, and/or assessors may occur.)

Measurement. Workplace: rating scale. Education: grade.]

Figure 2.3: An illustration of the similarities between 
Education and the Workplace in Judgemental Ratings. 


In education performance judgements of students are a regular 

occurrence. They are included under a multitude of other names, such as 

oral examinations, laboratory tests, internal assessment, and final exams. 

It is therefore disappointing that the methods to improve the faults 

highlighted in the performance appraisals conducted within business are

often not used or acknowledged in education (Bee & Dolton, 1985; Foster, 

1985; Heywood, 1989). 

The relationship between the concepts presented previously with respect 

to performance appraisal in the work place and performance appraisal 

conducted within education is graphically presented in Figure 2.3. In

postgraduate education an absolute method of performance appraisal, the

rating scale, is used. For honours degrees the final assessment is the class

of honours received. Problems within the rating scales used to measure 

work performance are just as evident in measures of student assessment. 

The raters are just as susceptible to halo, leniency, central tendency and 

other forms of rater error and bias. Consequently, all of these factors

can affect the reliability, validity, and accuracy of the performance 

appraisal of student performance. Chapter three looks at issues relevant 

to performance appraisal in postgraduate education. 


CHAPTER THREE 

ASSESSMENT IN POSTGRADUATE EDUCATION 

3.1 To Grade or not to Grade

Society usually expects that the education system will select, sift, and 

categorize individuals (Heywood, 1989). Although it is possible for a

person without a tertiary education to be successful, education is usually 

important for success. Education also aids social development, teaches 

society's values, and strives to instill a quest for knowledge into students 

(Birt, 1985). It is, however, the grades that individuals obtain that directly

affect their future.

In education grades are used to make judgements about students. They 

are an important standard for admission to higher education. They can 

determine the university attended, and the subject studied at that 

institution. In New Zealand this occurs as a direct result of restrictions 

placed on the number of students that are accepted into some courses. 

Grades can also limit the level to which a person is allowed to progress 

in their chosen subject (Heywood, 1989). 

It is not only within education that grades are important. In the work

force employers generally believe that grades provide valuable 

information. Consequently, for graduates, their final grade can be

important as a means of obtaining employment, and the remuneration 

package they receive (Dolton & Makepeace, 1990). 

Grades may also be addressed in relation to the performance of 

universities. In line with today's imperatives of relevance, efficiency and 

accountability, universities are now required to look more closely at what 

they are doing and how much it costs to do it. In Britain this is highlighted 


in the recent Green Paper on "The Development of Higher Education into 

the 1990's" (Department of Education and Science, 1985), and in New 

Zealand via several reports on education (Boston, 1988). With the need 

to develop appropriate performance indicators, Smith (1990) noted that 

grades awarded to students are being used to judge performance between 

institutions. 

Some people, nevertheless, do not favour the use of grades. They maintain

that they are ineffective in assessing the value of the educational process 

(Powell & Butterworth, 1971; Wainwright, 1977). Fawthrop (1968), for

example, believes that examinations are a constraint on education:

"From the educational viewpoint examinations are a supreme 

form of alienation in the modern world. This is also relevant to 

the teaching sphere, in which the genuine aims of the tutor are 

periodically subverted by the exigencies of the system, which 

emphasize that the first obligation is to get them through the 

examination at all costs rather than to stimulate a relevant 

contribution to the advancement of learning. One might well ask 

what does a society profit if it gains a whole world of degrees 

and yet loses its own educational soul? In a world of ignorance 

what can we give in exchange for true knowledge - a million 

scraps of paper certifying student degree status?" (p. 24). 

Most would agree that what Fawthrop believes has some truth. Students 

aiming for high grades and teachers trying to cover the curriculum, can 

result in original thought being dismissed. It is generally acknowledged 

that assessment is not infallible (Cox, 1967; Klug, 1976; Johnson, 1988), 

but most people believe that some form of assessment of students'

abilities must be made. Grades and examinations are as much a part of 

life within education as are the judgements and assessments made by 


people in all other parts of life. It is therefore of the utmost importance 

that examinations and the resulting grades should be both valid and 

reliable.

3.2 Assessment Methods 

In recent years the methods and timing of assessment of students'

performance have become more diverse. The final exam is still a 

component of most courses. However, the majority of university courses

do not rely on the final exam alone. Internal assessment now forms a

large proportion of a student's final grade (McKay, 1984). The methods

by which students are assessed before the final exam include

essays, tests, oral examinations, seminar presentations, laboratory 

reports, and mastery of practical skills. 

The final exam has also incorporated several new methods of measuring 

acquisition of knowledge. In the United States multiple choice questions 

are now the predominant means of assessing students in their final exams

(Foster, 1985; Heywood, 1989). Short answers and paragraph answers 

are also in use, as well as the traditional essay questions. The aim is to 

cover a greater range of the curriculum. 

Deciding which method(s) best measure whether students have acquired 

a satisfactory level of knowledge in a given course is a complex task. One 

wants a device that samples the whole range of educational aims, 

provides grades which are fair and stable over time, and which different 

examiners can use consistently. One does not want a technique likely to 

be contaminated by extraneous factors such as a student's sex or name, 

which assesses a limited subset of the educational aims, or which is 

unfairly biased toward some students and against others (Foster, 1985). 

This research considers grading and assessment methods, their value in 

the educational process and their impact on the individual. Teaching 


knowledge alone is no measure of the value of an education an individual 

acquires. "If we are to improve learning, we will have to improve the 

methods of testing and learning we use. They will have to become 

intimately related" (Heywood, 1989, p. 2). 

3.3 Assessment Reliability 

The reliability of assessment in higher education has been the subject of 

concern for some years (see Dale, 1959; Foster, 1985; Silver & Silver, 

1986; Johnson, 1988). It is complicated by the fact that assessment 

procedures differ widely, not only between countries, universities, and

faculties, but also between departments, and within departments. As a 

result there have been pleas for greater consensus in course objectives, 

and for them to be more explicitly presented (Johnson, 1988). 

For many years psychometric experts have commented that the unseen 

essay examination, the most frequently used assessment technique in 

British higher education (Hewton, 1987), and one of several techniques 

used in New Zealand (McKay, 1984), possesses many disadvantages. 

Students' grades are affected by factors such as the quality of their 

handwriting (Marshall & Powers, 1969), inconsistent marking (Cresswell, 

1986), the exam sampling a very limited section of the student's 

knowledge (Johnson, 1988), and grades being unstable over time (Foster, 

1985). 

In postgraduate education there are two areas where reliability is

important: the reliability of the measurement device, and the reliability

between markers in their assessment of an individual. Both have been

extensively researched because they are fundamental to assessment 

practice (see Cox, 1967; Miller & Parlett, 1974; Bell, 1980; Heywood, 

1989). 


Assessment reliability is also contingent on the particular postgraduate 

qualification undertaken. A Bachelor with honours candidate is partially 

protected against a marked fluctuation in the standard of the papers by 

the number of papers they take, provided that compensation is allowed

from one paper to another. Similarly, the number of markers is some 

protection against a change in the standard of the marking, and against 

any unreliability of marking (Dale, 1959). 

It has recently been proposed, however, that the averaging of students'

performance over a number of papers is no guarantee of a reliable 

assessment outcome (Johnson, 1988). This may be the case, particularly 

where the component with the lowest reliability carries a relatively high 

weight in aggregation of marks. For example, the thesis in some Masters

programmes (such as the Master of Arts or Master of Science) can be

a large proportion of the final grade. 
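
The point about weighting can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the marking error in each component is independent and has a known standard deviation. The function name, the weights, and the error figures are hypothetical and are not taken from Johnson (1988) or from any data in this study.

```python
import math

def composite_error_sd(weights, error_sds):
    """Standard deviation of the marking error in a weighted final mark.

    Assumes the errors in each component are independent, so the error
    variances add after being scaled by the squared weights.
    """
    if len(weights) != len(error_sds):
        raise ValueError("weights and error_sds must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    variance = sum((w * sd) ** 2 for w, sd in zip(weights, error_sds))
    return math.sqrt(variance)

# Hypothetical case 1: four equally weighted papers, each with an error SD of 4 marks.
equal_papers = composite_error_sd([0.25, 0.25, 0.25, 0.25], [4, 4, 4, 4])

# Hypothetical case 2: the same four papers plus a thesis worth half the
# final grade, marked with an error SD of 10 marks.
thesis_heavy = composite_error_sd([0.125, 0.125, 0.125, 0.125, 0.5],
                                  [4, 4, 4, 4, 10])
```

Under these assumed figures the heavily weighted, less reliable thesis dominates the error in the final mark, so adding more papers does little to stabilise the result.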

It is now over thirty years since Dale (1959) addressed the reliability of 

university grading, and although his work has been cited many times it 

appears that very little has changed. Dale (1959) suggested that the

biggest obstacle to the reform of unreliable university examinations was

the ignorance of the university staff with regard to the pitfalls for the

examiner. He stated that "the calm assurance with which lecturers and 

professors alike believe that they can carry around in their heads an 

unfailingly correct conception of an absolute standard of the pass line is 

incomprehensible to anyone who has studied the research on the 

reliability of examinations" (p. 186).

Research has continually shown that examiners not only differ with one 

another, but that any one examiner will disagree with their own 

assessment of a particular piece of work on a different occasion 

(Cresswell, 1986). Further, it is incorrect to assume that examiners share


implicit notions about standards, and that they consistently allocate 

grades with the same degree of severity or leniency (Johnson, 1988). 

Human error also compounds the problem of marker reliability. Marker 

errors are more easily reduced when the work being assessed is either 

right or wrong, and no subjective judgement needs to enter the decision 

about how to mark a question or examination script. Nonetheless, it is 

well known (Brooks, 1980; Heywood, 1989) that one way to reduce errors 

in marking is to employ more than one marker and take the mean of their 

marks. Another method of improving marker reliability is to have marking 

schemes (Foster, 1985; Johnson, 1988). At the postgraduate level 

however, marking schemes are not always appropriate because of the 

nature of the work assessed. Markers can also be unintentionally biased 

in their marking, just as bias can affect other aspects of higher education.
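
The practice of taking the mean of several markers' marks, mentioned above, can be sketched as follows. This is a minimal illustration which assumes that markers err independently and are equally reliable; the function names and the figures in the usage note are hypothetical. Averaging of this kind reduces random error only, and does nothing to remove a bias that all markers share.

```python
import math
import statistics

def mean_mark(marks):
    """Final mark taken as the mean of several markers' marks."""
    return statistics.mean(marks)

def error_sd_of_mean(single_marker_sd, n_markers):
    """Error SD of the mean mark.

    Assumes independent markers with the same error SD, in which case
    the error of the mean shrinks with the square root of the number
    of markers.
    """
    if n_markers < 1:
        raise ValueError("n_markers must be at least 1")
    return single_marker_sd / math.sqrt(n_markers)
```

For instance, under these assumptions a single marker with an error standard deviation of 6 marks would need to be joined by three others before the error of the mean mark fell to 3 marks.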

3.4 Biases in Assessment 

Higher education is exclusive. In any one year, although thousands of

eighteen-year-olds enter universities to begin courses in higher education, 

many more do not. A small number wish to go but are positively excluded, 

either because they do not reach the minimum level for entry or because 

places are offered to better qualified applicants. A far greater number 

are excluded because their previous education denied them the chance or 

even the ambition to consider higher education (Burgess, 1981).

Most people believe that in admitting people to higher education there is 

no desire to exclude on any but objective academic grounds. If there are

too few places, then the better qualified will be admitted. 

On examination, however, the population in higher education is not

representative of the adult population as a whole. In New Zealand those

who are successful are overwhelmingly young, and from this group they

are predominantly white middle-class men (Jones, 1982). Within higher


education there are effective biases against age (Woodley, 1984), class 

(Williamson, 1981), disability (Sturt, 1981), religion (Gay, 1981), race

(Little & Robbins, 1981) and sex (Spender, 1981; Acker & Piper, 1984). 

These biases prevent people from entering higher education. In some 

cases they prevent individuals from obtaining an adequate education at 

all. Once a person has entered higher education, there is no guarantee 

that the biases that hampered them from entering these institutions will 

subsequently not affect their grades. 

3.5 Sex Bias 

The evaluation of students' work is supposed to be objective and merit 

based. However, the evaluation criteria for assessing students' written

work are highly ambiguous and the marking process is known to be

unreliable (Hartog & Rhodes, 1935; Dale, 1959; Robbins, 1963; Cox,

1967). There is a high level of inference required to evaluate students' 

written work, and therefore it is often stated that biases, including sex 

bias, would be expected to occur under such conditions. 

Most of the studies of sex bias in evaluation have examined the 

hypothesis that when both sexes have identical qualifications or 

performance, men are evaluated more favourably than women. Although 

many studies have demonstrated this pro-male evaluation bias (for 

example, Lao, Upchurch, Corwin & Grossnickle, 1975; Gutlek & Stevens, 

1979; Sharp & Post, 1980), some studies have found no sex bias (Hall &

Hall, 1976; Dipboyd & Wiley, 1977; Frank & Drucker, 1977), and others 

have demonstrated a pro-female evaluation bias (Jacobson & Effertz,

1974; Bigoness, 1976). Nieva and Gutlek (1980) reviewed the literature

on sex bias in a variety of situations and suggested that the degree and 

pattern of bias depends on three factors: 


1. Level of Inference: sex bias tends to operate where there is ambiguity 
concerning evaluation criteria. 

2. Sex Role Incongruence: sex bias tends to occur when the tasks undertaken
are deemed to be more appropriate for one sex than the other. 

3. Level of Performance: the operation of sex bias appears to be affected by 
the level of qualification or performance involved. 

These factors suggest that the grading in universities could be sex biased,

particularly in postgraduate education where evaluation criteria are often

ambiguous, and in subjects where the essay exam format is prevalent.

These assumptions can be supported by experimental studies which have 

shown that identical essays get higher marks when a male rather than a 

female name is attached (Wallston & O'Leary, 1981). Women tend to be 

evaluated less favourably than men when both men and women are highly 

qualified or perform well (Bradley, 1984). 

Generally subjects offered at universities are classified as either male 

orientated or female orientated, and the level of qualification assessed is 

generally considered high. Therefore if women were assessed 

unfavourably at university this would lend support to Nieva and Gutlek's 

(1980) study. So, in practice do results indicate that sex bias operates in

universities? 

A study by Bradley (1984) addressed this issue. The study was designed

to exclude the possibility that differences in marks reflected differences

in the abilities of men and women, the reason generally given for any

differences in the distribution of examination marks between the sexes (Dale, 1959;

Murphy, 1982; Rudd, 1984). Results indicated that markers who were 

familiar with the student being marked were not biased in their marking, 

but markers unfamiliar with a student were biased. In discussion it was

noted that the risk of sex bias may therefore be

greater in large departments due to the small amount of staff-student


contact. When determining the occurrence of sex bias, the sex of the 

examiner is of less importance than the traditionality of the examiner, as 

both men and women examiners are exposed to the same cultural 

stereotypes and expectations of sex-role appropriate behaviours (Bradley, 

1984). Thus both male and female markers can be influenced by the sex 

of the individual being evaluated (Nieva & Gutlek, 1980). 

Determining whether sex bias, or indeed any form of bias, exists is 

possible. However, the detection of bias is not a matter of simple 

observation. There is no support for the opinion that examiners are aware 

of any biases they themselves contribute, nor is there any reason to 

expect that examiners are aware of the biases contributed by their 

colleagues or that they will be able to take steps to make them ineffective

(Bradley, 1985). 

3.6 Conclusion 

Like all appraisals, performance appraisals in education of postgraduate 

students' acquisition of knowledge and skill are fallible. As this chapter 

has shown, questions have been continually asked about the 

appropriateness of the methods of assessment used and it is well known 

that these methods are not always reliable. Further, their validity is 

complicated by the biases that operate both within education and among

markers. How severely these inaccuracies of the education process affect 

the final grades awarded to students is the subject of the research 

reported here. 


CHAPTER FOUR 

THE UNIVERSITY SYSTEMS 

4.1 Universities - Their Purpose

Universities are among the oldest institutions in Western society. Their 

long history shows how they have developed and changed in response to 

people's insatiable desire for knowledge, and society's need for advanced

thinking and skilled workers. Originally the word "universitas" meant a 

whole body of masters and students in a community, working together to 

seek truth through instruction, debate and research (Gibson, 1978). 

Today, universities are structured very differently, and the purposes of 

universities are more diverse. Consequently, academics are frequently

drawn into discourse as to what the purposes of universities are, and 

whether present systems are successful in fulfilling these aims. On the 

one hand governments, industry and students urge a "vocationalism" 

upon the universities which finds expression in labour market trends (Birt,

1985), as highlighted by the demand for courses in Accountancy,

Technology, and Computer Science. Yet others believe university

students should also be encouraged to pursue truth, knowledge and 

understanding, to develop intellectual exploration and the free exchange 

of ideas (Ball, 1985). They hold that teaching in universities should focus on old

ideals and the notion that postgraduate study is a preparation for a life 

of scholarship and admission to an academic community (Blume, 1986). 

No doubt the debate as to universities' actual priorities will continue, but 

at present perhaps it is best to concede that, regardless of their specific

purpose, a university education is now something that is becoming more 

and more common. The situation in New Zealand is no exception. 


4.2 New Zealand Universities - The Beginnings 

On the thirteenth day of September 1870 the Act of the General Assembly 

was passed signalling the beginning of university education in New 

Zealand (Parton, 1979). Since that day, over one hundred and twenty

years ago, New Zealand's university system has undergone many changes

in the structure, operation, and funding of universities. Changes in the 

composition of the students, the courses they pursue, and the way they 

are assessed are also evident. 

The first university in New Zealand was established in Dunedin, a year 

prior to the Act of the General Assembly, by an ordinance of the Provincial

Council of Otago. However, this university, later to become known as

Otago University, did not have the authority to grant degrees (Bell, 1981).

As a result of the 1870 Act the University of New Zealand was founded, 

and granted the right to confer degrees. In an effort to ensure the 

acceptance and international recognition of the degrees awarded, an early 

decision of the University of New Zealand was that examinations should 

be set and marked by eminent academics in the United Kingdom 

(Yearbook, 1990). Once established, this policy proved hard to alter and

continued to have a significant impact on university teaching, restricting 

initiative and change for many years. Finally in 1939 it was agreed that 

professors in New Zealand should be the examiners for a stage three 

subject. The commencement of World War Two, and the possible risk of

examination papers being lost or delayed on their way to the United 

Kingdom, then ensured that this reform continued and an increasing 

number of New Zealand examiners were appointed. 

In 1961, the federal University of New Zealand was abolished and the 

universities in operation at that time, Auckland, Victoria, Canterbury, and 

Otago, became autonomous entities. A link between the universities and


government was established by the introduction of the Universities Grant 

Committee. Later, in 1963, separate Acts of Parliament established the

universities of Waikato and Massey. Since then university education 

within New Zealand has expanded to include seven universities, the last

of which, Lincoln University (previously known as Lincoln College and

associated with Canterbury University), obtained University status in

1990 (Yearbook, 1990). On July 1st 1990, the Universities Grant

Committee was abolished under the provisions of the Education 

Amendment Act 1989 (Hall, 1990).

4.3 The Present New Zealand University System

All the universities in New Zealand are divided into faculties and 

departments except for the University of Waikato, which is divided into 

schools. Students may undertake a course of study either on a full-time 

or part-time basis. Additionally, Massey University offers many courses 

of study through distance education. 

Prior to 1988, to be eligible for entry into any university course of study, 

an applicant was required to have successfully passed the University 

Entrance exam, in at least four subjects including English. Since then, 

entry to university has been determined with reference to a student's Sixth

Form Certificate points classification. It is required that 12 or fewer points

are accrued over four subjects. However, most students complete a

seventh form year, and entry is determined on Bursary examination

results. Provisional entrance may also be granted to students over the age

of twenty-one years who do not have the minimum qualification.

Some courses, however, have restricted entry due to there being more

candidates than places. Preference is usually given to students with the 

best examination results in specified subjects after their seventh form 


year of study at secondary school, or at the end of their first year of 

university study (intermediate year). 

The main, and usually first, stage of university education in New Zealand 

universities leads to the Bachelors degree. The length of this course of 

study differs from faculty to faculty, but typically a Bachelor degree 

requires three years of study for Arts, Science, Horticulture, Agriculture 

and Commerce; four years for Engineering, Horticultural and Agricultural 

Science; five years for Architecture, Veterinary, Dentistry and Law; and 

six for Medicine. A Bachelor with honours degree usually requires an 

additional year of study.

The second stage of university education leads to the Masters degree. 

This is usually obtained in one to three years, and can be awarded with 

honours or distinction. The Masters programme usually entails course work,

a thesis, or most commonly, a combination of the two. Typically the third 

stage of university education is a Doctor of Philosophy, obtained after a 

minimum of two years supervised research and a presentation of a thesis. 

Doctorates of Literature, Science, and Law are the most advanced degrees 

of the university system and they are awarded for exceptional advanced 

research, or as honorary degrees to those in the community deemed to

deserve them by the universities. 

4.4 The University System of England and Wales 

The university system with which New Zealand is most frequently

compared is that of Britain. This is generally because up-to-date British

statistics are available, New Zealand universities are staffed by some

academics with first-hand experience of British universities (New Zealand

University Conference, 1969), and because with few exceptions, the 

British system of education more closely resembles our own than any 

other (Pool, 1987). Therefore the present study will research whether


there are similarities in the distribution of grades for honours students 

from these two university systems. A considerable amount of research 

has already been conducted in Britain regarding the equivalence of grading

standards between the sexes, institutions, and faculties of British 

universities. These studies will be discussed in Chapter five. 

A comparison of this nature lacks validity unless it takes into account the 

different circumstances operating within the university systems of the 

countries compared. Entry to university in both England and Wales 

normally takes place after a minimum of 13 years of primary and 

secondary education. To be eligible, an individual requires a certain 

number of passes in the General Certificate of Education examinations 

(GCE) at both the "O" level and "A" level. The first degree of higher

education, the Bachelor degree, is usually awarded after three years of 

study, but this varies between faculties and can be as long as six years. 

There are two types of Bachelor degrees. The first is the honours or 

special degree, the second is the ordinary or pass degree generally 

awarded to those candidates who have studied for an honours degree but 

whose results do not justify the award of honours (UNESCO, 1980). For 

the final examinations, universities not only appoint examiners from their 

own teaching staff, but also call in the services of a number of external 

examiners from other universities (Williams, 1979; Piper, 1985). In this 

way whilst preserving the autonomy and character of each individual 

university, the universities also try to maintain an equivalent standard of 

achievement throughout the country. However, comparability of 

standards in England and Wales has not always been maintained by the 

external examiner system. 


4.5 Standards in the British University System 

The problem of addressing fairness, that is, the maintenance of equivalent

standards both between and within universities, has been a recurring

one within the British university system. In Britain, standards were first

maintained by controlling the institutions which were empowered to

award degrees. At the beginning of the nineteenth century Oxford and

Cambridge were exclusive with regard to social class and religious

denomination. Then with the creation of London University and later the 

provincial colleges the notion of standards was more implicit in the 

discussion of institutional hierarchies (Silver & Silver, 1986). Later the 

various roles of London University in particular addressed the issues of

standards, their definitions, and guardianship.

In 1858 The Charter permitted colleges to prepare students to sit the

London University external exam (Silver & Silver, 1986). However, the

external degree soon raised questions about the appropriateness and 

justice of examinations divorced from teaching. While the separation 

looked attractive as a guarantee of objectivity, students faced 

examinations whose standards were based on criteria often unrelated to 

or in conflict with those of the teaching colleges. In the beginning of the 

twentieth century, the external examiner system became crucial to the 

concept of examination standards. 

By the 1960's further development meant that new meanings were being 

sought for the concepts of standards, quality and excellence, concepts

which had once seemed absolute and measurable. Christopherson (1967)

suggested that the maintenance of standards meant ensuring that 

students on completion of their course had some familiarity with the 

basic ideas in a particular field of study, some experience of living and 

working with people of similar ability in other fields of study, and were 


at least equal to others who had done the same course in earlier years. 

However, this was becoming more difficult to achieve as higher education 

continued to expand. 

In addition, the new generation of lecturers in the free speech society of 

the 1960's were instrumental in introducing a number of significant 

changes. Examinations were now being set by those who taught the 

candidate, with the external examiners continuing to play a supervisory 

role to ensure that standards were met. This ensured that papers 

reflected the material covered in the course (though not necessarily the 

syllabus) and thus removed some chance effects. However, it increased 

the chance of poor quality questions, and reduced the level of consistency 

of standards between colleges and between years of any course (Gaskell, 

1979). 

As previously discussed, the reliability of examinations and marks had 

already been challenged, most notably in Britain by Hartog and Rhodes 

(1935). Various techniques proposed to improve marking reliability did 

not silence anxieties about differences amongst subjects, and within 

subjects (Cox, 1967). Dale in 1959 castigated staff for their ignorance 

of the pitfalls of examining, and their belief that they carried in their heads 

an absolute standard of 40 percent. He pointed to the wide disparities 

of first class honours awards in different subjects, ranging from 1/4 in

Applied Science to 1/70 in Arts (Dale, 1959). What was being discovered

was "the complexity of the assessment task" (Miller & Parlett, 1973). By

the 1980's the same reservations were appearing with regard to the role

of the external examiner, whose presence did not appear to guard against 

arbitrary differences, and whose experience of "comparability" was 

questionable (Silver & Silver, 1986). Yet the external examination system 

is still held up as one of the major guarantees of quality and equity within 


British higher education (Williams, 1979; Piper, 1985; Connolly & Smith, 

1986; Johnson, 1988).

4.6 The British External Examination System

Very little has been published on the role of external examiners, and until 

recently no systematic survey of their work has been undertaken. 

Unfortunately, the results are not encouraging: the external examination

system is not very effective in guaranteeing equivalence of standards 

between universities. 

Piper (1985) asked external examiners to outline their role in this 

capacity. The most commonly reported role was that of being an 

additional marker for borderline candidates (86%). Being an additional 

scrutineer for exceptionally good or exceptionally poor work was reported 

in 70% of the cases. Similar figures were found for arbitration when 

internal examiners failed to agree on a mark. The role of ombudsman was 

not common (10%), as it was thought that other resources were open to

students who felt they had been unfairly treated.

In comparison, institutions saw their external examiners as having the 

function of checking standards. They did not perceive the

recommendations of their external examiners as moving them towards the 

centre; rather, most institutions saw their external examiners as either

sanctioning the present state of affairs, or else encouraging them to

award more top grades (Smith, 1990).

Williams (1979) states that the purpose of the external examiner "is

generally understood to be the maintenance of similar standards between

different universities" (p. 162). Yet the question of standards is not

straightforward. There are at least four forms of consistency or equality

which need distinguishing:



1. The maintenance of standards from year to year in a given course.

2. The monitoring of equivalence between course options. 

3. The parity of standards between universities within subjects.

4. Parity between different subjects for nationally recognised levels of 
accreditation. 

It is apparent that a clear understanding of the role of external examiners 

is neither manifest nor practised. Institutions' reliance on external

examiners to ensure fairness and comparability seems naive, when the 

external examiners themselves fail to see this as one of their major tasks 

in the external examining system. This would indicate that the system 

needs to be rethought and objectives need to be defined more carefully. 

Failing that perhaps other means of addressing equity may need to be 

considered. 

In England and Wales comparability is too serious an issue to be dismissed 

by complacent references to the external examining system. In New

Zealand, although there is no "appointed" national body to ensure 

equivalent standards, comparability is equally important. Degree class

has too great an impact on the future lives of students for scant attention 

to be paid to this matter (Klinov-Malul, 1974; Johnson, 1988; Dolton &

Makepeace, 1990). 

It has been argued that unless standards can be maintained, the ability to 

compare students, courses, and institutions becomes highly questionable. 

It can equally be proposed, however, that it is only through comparative

studies of this nature that questions concerning the validity and reliability

of the existing standards can be addressed. Chapter five discusses several

studies that have addressed these questions. 


CHAPTER FIVE

HONOURS STUDIES

5.1 Introduction

Degrees with distinction provide their holder with opportunities for further 

advancement within higher education and in the labour market generally. 

Therefore any other factors apart from ability and knowledge that might 

improve opportunities to obtain a top degree are extremely important. 

Several studies have addressed the impact of subject studied, institutional 

characteristics, and gender on students' degree performance.

5.2 Gender Studies 

A study by Rudd (1984) sparked a great deal of debate in Britain about 

the pattern of honours degrees awarded. Rudd's research examined 

honours degrees awarded to men and women in British universities during 

1967, 1978 and 1979. He reported that women gained a lower

percentage of both first class degrees and the lower honours degrees

compared to men. After discounting a number of plausible explanations 

as to why this might be the case, he concluded that "the only explanation 

that seems to fit all the facts is that this difference is linked to differences 

in the distribution of ability as measured by the scores gained in 

intelligence tests" (p. 47). Support for this explanation was credited to

Heim's (1970) study, which suggests that women's test scores give a

distribution of measured intelligence which is slightly different to that of 

men, with a smaller percentage at the extreme ends of the scale. 

Rudd (1984) also looked at the differences between the sexes in obtaining 

a "good" degree, that is a first class or upper second honours degree. His 

results showed that women performed better than men in Education, 

Medical subjects, Engineering, Agricultural subjects, Social studies, 



Architecture and other Professional studies groups. Men performed 

better in Arts and Language subjects, even though these are two areas in 

which men are under-represented. 

It is perhaps not surprising that Rudd's (1984) research was controversial,

but this was not due to his results, which generally have been supported 

by other British studies (Jones & Castle, 1986; Kornbrot, 1987; Clarke, 

1988), but rather because of his explanation for the results obtained. 

In 1988 Simon Clarke reevaluated Rudd's (1984) study and suggested

that Rudd overestimated the tendency for men to achieve a 

disproportionate number of first and third class honours degrees, and that 

he failed to pay sufficient attention to the marked differences in 

performance as a function of the subject studied or to the change in 

relative performance over time. Clarke (1988) found that in general,

women did better in Professional subjects and Chemical and Biological 

Sciences. Men did better in the Arts, Mathematics and the Physical 

Sciences, and the sexes performed at the same level in Social Sciences. 

Women still underachieved at the first class level, and men still tended

to get more third class degrees, but Clarke (1988) suggested that these

factors were often linked to the area of study. 

Acknowledging the differences between the sexes that Rudd (1984) had 

reported, Clarke (1988) also questioned why males and females were

disproportionately represented with respect to classes of degrees. He 

rejected Rudd's (1984) explanation on the grounds that IQ tests are not 

a valid measure when assessing differences between the sexes. A crucial 

aspect in the design of intelligence tests is that the test not be biased

in favour of either sex. As IQ tests have developed, items that have 

shown consistent differences between the sexes have been excluded. 



Due to this fact all attempts to show sex differences in ability by use of 

intelligence tests are invalid (Ryan, 1972). 

After reviewing the evidence Clarke (1988) proposed that the differences

in the overall performance of men and women were the result of social 

and institutional pressures. He pointed to sex stereotyping, and biases 

in examining, supported by Bradley's (1985) research, as part of the 

explanation of men obtaining a disproportionate number of first and third 

class honours degrees. Clarke (1988) also suggested that there is a need

to look at differences in the cultural and institutional framework that may 

exist which discriminate differently between men and women in different 

subject areas. In conclusion he stated that there have been positive 

changes over time, evident by the improvement in performance by women 

in all subjects except Arts, relative to that of men. 

One of the major advances within this area that Clarke's (1988) study

appears to advocate, and which Rudd (1984) failed to acknowledge, is

that differences between the sexes cannot be considered in isolation. An 

important factor is an individual's area of study. This has been supported 

by other researchers, for example Kornbrot (1987), who concluded from

her study that gender differences in degree performance tend to depend 

on content area and topic. 

Kornbrot's (1987) study found women significantly more likely than men

to achieve a competent degree of lower second or better in all disciplines,

but, as in other studies, men obtained more first class degrees. In particular

men were substantially more likely than women to achieve first class 

degrees in the Humanities, Social Sciences, and Language and Literature 

areas. Women were more likely than men to achieve first class honours 

degrees in Medicine. The overall pattern suggested that women were 

highly successful in many disciplines which were strongly stereotyped as 



male, and where they were currently under-represented. This raises the 

interesting question of whether a person's assessment is affected by their 

choice of study. 

5.3 Subject Studies 

Regardless of whether a student is majoring in Physics, Accountancy, or 

French, a first class honours degree should require the same amount of 

ability, sagacity and effort on the student's behalf. Several studies have 

investigated this phenomenon and found discrepancies in the grades 

awarded between different subjects. Ascertaining the reasons for these 

differences however, has not been entirely successful. 

Neuman and Ziderman (1985) investigated the existence of differences in

standards in awarding first degrees with distinction amongst universities 

in Israel. Considerable diversity was found in the tendency to award first 

degrees with distinction between and within universities and faculties, 

and between the major subject departments of the Social Sciences, which 

was selected for more detailed analysis. The results of their research 

may be particularly pertinent to New Zealand research as Israel's

university system parallels New Zealand's university system in several

ways. Israel is a small country with six independent universities, all

operating within the framework of a central Universities Grants Committee,

modelled on the British pattern. (Note that the Universities Grants Committee

ceased to exist in New Zealand from 1st July 1990 (Hall, 1990); however,

the research conducted in the present study extends only to 1989.)

An analysis of variance of first degrees awarded with distinction, by 

university and faculty in Israeli universities between 1979 and 1983

revealed that both the main effects and the interaction effect were highly 

significant. Neuman and Ziderman (1985) reported that Natural Science 

faculties tended to award more degrees with distinction than average 



(coefficient of +0.21), whereas Social Science faculties awarded fewer (-0.19), and

Arts faculties were on a par with the overall average tendency to grant 

degrees with distinction. In conclusion Neuman and Ziderman (1985) 

stated that "there is a pressing need for universities in Israel, as in England 

(and possible in other countries too), to set their houses in order through 

the framing of procedures for the maintenance of common standards in 

the granting of degrees with distinction, both between as well as within 

universities" (pp. 458-459).

The need to develop a method to ensure equivalence in standards is 

echoed by others, not least by Bourner and Bourner (1985), whose

research compared the pattern of honours degree results in Accounting

with those of other subject areas. Their results, based on British data, were

in agreement with Neuman and Ziderman (1985). Individuals in the 

Science and Engineering/Technology subject groups were awarded, on 

average, a higher proportion of first class degrees than any other subject 

group. Specifically, the proportion of first class degrees awarded by both 

of these groups exceeded that of Accounting by a factor of seven. Note 

that Accounting was placed in the subject group Social, Administrative 

and Business studies, which in total received the smallest number of first 

class degrees. 

An older, yet frequently quoted piece of research that addresses the 

variation from department to department and from year to year in the 

standard of degree classes is that of Dale (1959). Dale's (1959) results 

also showed a greater proportion of first class honours degrees awarded 

to Science students compared to both Commerce and Arts students. It

would be naive to expect that class percentages for different faculties

should be equal; however, most researchers would be even more

astonished if these results were a true reflection of the comparative ability 

of students from different faculties. Doubtless, individuals of different 



major fields do differ in several ways. It has been shown that they differ 

in their personality traits (Elton & Rose, 1967), and in their scholastic 

strategies (Goldman & Warren, 1973). Nevertheless, Dale's (1959) study 

found no evidence from psychological testing that students from different

faculties or departments differed in ability in a way that corresponded with the

differences in degree awards obtained. Other studies have also failed to

support the idea that variation in grades is due to the ability composition 

of students studying different subjects (Nevin, 1972; Rudd, 1984; Clarke, 

1988). 

Although several studies have obtained similar results in the awarding of 

first class honours degrees between faculties, little discussion has been 

offered as to why this might be the case. Yet all researchers are adamant 

that standards should be more equivalent, and that methods to achieve 

this should be developed. Generally Dale's (1959) explanation of these 

results is accepted as addressing some of the discrepancies. Dale (1959) 

reasoned that the wide variation in degree standards from one faculty to 

another lay in the nature of the subject matter. Those subjects in which 

the mathematical content is high yield a much greater spread of marks 

than subjects such as English and History in which the essay type of 

answer predominates. Therefore Mathematics will award more firsts than 

English. Using this argument Mathematics should also award more thirds. 

5.4 The Student Population

Throughout the world it is evident that employment in certain sectors of

society is decreasing while in other sectors it is increasing. Specifically,

the number of people employed in the agricultural and industrial sectors

is declining, while there is an increased need for individuals in the

commercial and service industries (Yearbook, 1990). Therefore it is not 

surprising that these trends are reflected in the enrolment figures of 



students in university courses (Blume, 1986; Fenner, 1989). However, 

other trends also exist, so that the changing structure of the student 

population, with regard to their choice of subject, is not a simple linear

equation between area of greatest employment and increased numbers

enrolled in the appropriate faculty area.

The population of students in New Zealand universities reflects that of 

most overseas universities in that there has been an annual increase in 

the total number of students attending university, and that there has also 

been an increase in the number of students furthering their education by 

undertaking postgraduate education (Sub-Committee on Graduate 

Employment, 1988). The proportion of females attending university 

comes closer to approaching fifty percent of the total student population 

each year (Pool, 1987). 

American studies show that the general pattern of change for women 

students is that they have increased their presence across the board in

all fields of study (Roemer, 1983). Women have made decisive 

movements into fields in which they have previously not been well 

represented. At the same time women have accepted the basic patterns 

that were established in the 1970's, and continued to pursue studies in 

areas that are regarded as appropriate choices for women. In Britain the 

representation of women in Sciences and Engineering at all levels has 

shown a steady increase over the past two decades, although most 

women are still at the low levels, both in terms of academic achievement

and employment (Ferry, 1982). In the United States the same situation 

prevails (Fenner, 1989). 

Women's participation at the postgraduate level of education is still 

substantially less than that of men in the United States (Roemer, 1983),

Britain (Jones & Castle, 1986), and New Zealand (Taiaroa, 1985). The 



reasons for this are complex, but a major contributing factor is that 

postgraduate degree enrolments are largely determined by the quality of 

the first degree and men still achieve more and better honours degrees 

than women (Jones & Castle, 1986). A further result of this is that men, 

due to their better grades, are more likely to be recipients of scholarships, 

and therefore to have greater access to postgraduate education (Jones & 

Castle, 1986). 

A further restriction on the entry of females to postgraduate studies is 

their predominance in the traditionally "acceptable" areas of study for 

women, that is the Arts and Education. For this reason, there are large 

numbers of women competing against one another for the limited number 

of positions, scholarships and grants available (Jones & Castle, 1986). 

So a closer look at higher education reveals that females have made 

definite inroads in relation to their participation in universities; however,

the rule of "the higher the fewer" still applies to women in almost every 

field of study (Ferry, 1982). 

This fact is further emphasised by a glance at the composition of 

university staff. In Britain, of the full time teaching staff in universities, 

only 10% are women. At the higher positions of readers, senior lecturers, 

and professors, 40% of men hold these positions compared to 18% of 

women, and they are usually represented in the faculties of Arts and 

Social Sciences (Ferry, 1982). Similarly, in Australian universities only 

17% of the senior academic staff are women (Buckridge & Barham, 1984). 

5.5 Institutional Differences 

In Britain several recent studies have considered whether graduates of 

one institution are comparable with graduates of other institutions. The 

studies of Bee and Dolton (1985), Connolly and Smith (1986), Johnes

and Taylor (1987), and Smith (1990) have all shown that there is a



significant, and frequently large, variation in the degree classes awarded 

to students as a function of the university they attended. Several 

explanations for this variation have been presented. 

The most comprehensive of these studies was conducted by Johnes and 

Taylor (1987). Three significant relationships between the variation in

degree results of universities and several student and institutional 

characteristics researched were found. They were A-level scores, 

proportion of students living at home, and library expenditure as a 

percentage of total spending. 

The mean A-level score of a university's students was significantly

related to degree results. A one point increase in A-level scores was

associated with an increase of between three and four percentage points

in the proportion of graduates with a first or upper second class honours 

degree. This finding differs from previous research (Wilson, 1981; Sear, 

1983; Foy & Waller, 1987) which has found only a weak relationship 

between A-level scores and the prediction of class of degree. Further, 

the studies of Connolly and Smith (1986) and Smith (1990), which 

investigated the variation in degree classes in Psychology, also found that 

A-level scores were not able to predict class of degree. 

Universities with a high proportion of students living at home during the 

terms, were more likely to produce poor results than universities in which 

the proportion of students living at home was low. As Johnes and Taylor

(1987) pointed out, the interpretation of this result is unclear since it

cannot be determined whether the proportion of students living at home

describes the type of students a university acquires or whether it is 

indicative of characteristics which relate to the universities themselves. 

Similarly it is difficult to interpret why high expenditure on the



library was positively related with universities that awarded a higher than 

average number of good degree results. 

All studies concerned with variations in universities have attempted to 

measure whether these differences are a function of the quality of 

different universities, in particular the quality of teaching. However, it 

has been difficult to obtain a true measure of this factor. Connolly and 

Smith (1986) considered the accessible statistic of staff-student ratio

as a crude operationalised measure of quality of teaching. This measure 

has since been used by other researchers. Connolly and Smith's (1986) 

results were significant but small, r = 0.40 and 0.11. All other studies 

(Bee & Dolton, 1985; Johnes & Taylor, 1987; Smith, 1990) found a 

non-significant relationship between staff-student ratio and the variation 

between universities in the distribution of degree classes. The other 

notable finding observed in all these studies is that the variation in degree

awards across universities was consistent over time. 

Several researchers (Bee & Dolton, 1985; Johnes & Taylor, 1987) have 

suggested that from the point of view of the student seeking a good 

degree result it matters little whether differences in awards across 

universities arise through genuine differences in "value added" by the

institution or simply through arbitrary institutional perceptions. What 

does matter is that the differences do exist, that they can be large, and 

that the pattern is consistent over time. 

The same can also be said about the impact of a student's sex or choice

of subject studied on their resulting degree award. Bee and Dolton (1985)

further suggest that "for all concerned a reappraisal of the award system

is both necessary and long overdue" (p. 49). It is possible that an

appraisal of the New Zealand university award system might also be 

warranted. 



5.6 The Present Study - Part A

The present study is firstly concerned with whether the grading practices 

employed at the postgraduate levels of Bachelor with honours and 

Masters degrees in New Zealand universities are appropriate and fair.

Secondly, it is concerned with whether they are comparable to those of the universities of England and

Wales. The appropriateness of the way in which students are graded is

addressed from a theoretical discussion of the past research in this area 

of interest. The fairness of New Zealand's grading system is researched 

by statistical analysis of the results awarded to New Zealand 

postgraduate students over the past thirty years in relation to several 

other factors, such as gender, course taken, subject studied, and 

university attended. Whether the systems are comparable is considered 

with reference to a comparison of the New Zealand results and English 

and Welsh results generated in the present study, and the results of 

several previous British studies. 

The present study is not a replication of any previous research. There are

however, similarities between the present study and several other recent 

studies. The following studies, unless otherwise stated, all involve 

research using British subjects and/or statistics. The studies of Rudd 

(1984), Kornbrot (1987), and Clarke (1988), compared degree 

performance as a function of gender and discipline studied. Research 

completed by Bee and Dolton (1985) and Johnes and Taylor (1987)

sought to explain the variation in class of honours as a function of several 

student and institutional characteristics. Bourner and Bourner (1985) and

Smith (1990) looked at the equality of standards within specific 

departments, namely, Accounting and Psychology respectively. Neuman 

and Ziderman (1985) considered whether universities maintained common

standards in awarding first degrees with distinction in Israel.



None of these studies have incorporated data that spans three decades, 

or have covered the population of postgraduate students as 

comprehensively as the present study. The present study is the first to 

address the population of New Zealand Bachelor with honours and 

Masters postgraduate students, as a function of grades received and 

several other variables discussed below. 

The present study is exploratory. The first part of the study, Part A, is 

only concerned with New Zealand students. The variables explored in this 

part of the research are as follows: 

1. Sex of student 

2. Course student studied 

3. Major studied 

4. Year completed degree 

5. University attended 

6. Class of honours received 

The major objective of the present study is an analysis of the relationships 

between the class of honours a student receives and the five other 

variables. The null hypothesis for this research is that the differences in 

class of honours awarded are in no way a function of differences between

the sexes, between universities, between degrees, between fields of 

study, or across time. 

A secondary consideration is any significant relationships between the 

independent variables. For example, the sample data is measured over the 

years 1960 to 1989 inclusive. Have there been changes in the proportional

representation of male and female students over this time? Are the 

subjects that were most popular in 1960 the same as those in 1989? 



The number of students going on to further education in New Zealand has 

grown in the last two decades; however, in comparison to other similar

countries the proportion of students continuing their education is low 

(Cabinet Committee on Training and Employment, 1987). For example, in 

1984 only 24% of 18 to 23 year olds in New Zealand were in some form 

of part or full time education compared with 49% of the same aged 

students from North America in 1985, 28% for East Asia, 27% for Latin 

America, and 32% for Europe and the United Kingdom (Population

Monitoring Group, 1986). 

Students who choose to attend university in New Zealand are not

representative of New Zealand's general population. Social and ethnic 

origins have a significant effect on the likelihood of a student entering 

university. For example Maori and Pacific Island students are 

under-represented at the University of Auckland by a factor of four 

(Jones, 1982). 

Women fare better. They now represent close to 50% of the intake of 

undergraduate students, which displays a degree of equivalence between 

the sexes unparalleled by most other Western countries. However,

women are disproportionately represented among the part-time students, 

mature students, and those studying extramurally (Pool, 1987). 

Given the unfavourable situation of university education in New Zealand 

compared to several other countries, comparative research may highlight 

specific problem areas in the New Zealand university system. The analysis 

of performance in New Zealand universities is given added meaning by 

comparing it with the performance of other countries; this is the intention

of Part B of the present study. 



5.7 The Present Study - Part B

The most parsimonious comparison of New Zealand grades with British 

grades seems best. For this reason the New Zealand university grading 

system will only be compared with the universities of England and Wales. 

Scotland, Northern Ireland and Eire have been omitted because they have

different entrance requirements, and different degree and grading 

structures (UNESCO, 1980; Smith, 1990). Further, previous studies have 

stated that the differences between the structures of the British university 

systems have only served to complicate the analysis of results with 

regard to their investigations of degree performance (for example, Bee

& Dolton, 1985; Johnes & Taylor, 1987; Clarke, 1988; Smith, 1990).

In England and Wales, data similar to the variables being considered in 

Part A of the present research, are collected and collated yearly, and 

presented as the Universities Statistical Record. This information for the

years 1974 to 1989 will be used in Part B of the present study. First a 

separate analysis will be done to ensure that results of the present study 

concur with those of past studies that have used this same information. 

Then these results from England and Wales will be compared with the 

New Zealand results previously obtained in Part A of the study. The 

comparisons of results will examine the following variables: 

1. Sex of student

2. Major studied

3. Class of honours received 

The objective of Part B of the present study is to determine whether the 

distribution of grades received by Bachelor with honours and Masters 

students in New Zealand universities differs from the distribution of grades

received by Bachelor with honours students in universities in England and Wales.



The hypotheses of the present study are listed below. The first two 

hypotheses apply to the results researched in both Part A and B of the 

study. The next three hypotheses only address the results of Part A of 

the present study, the New Zealand results. The last two hypotheses refer 

to the comparison of results from Part A and Part B of the present study. 

HYPOTHESES 

1. That male and female students do not receive equivalent proportions of each

class of honours. 

2. That the grade distribution between areas of study is not equal. 

3. That in New Zealand the distribution of grades is different for Bachelor with 

honours and Masters qualifications. 

4. That in New Zealand between 1960 and 1989 male and female

representation in areas of study has changed. 

5. That the proportional distribution of honours grades awarded differs between 

New Zealand universities. 

6. That the areas of study chosen by students in New Zealand and England and 

Wales universities are dissimilar. 

7. That the distribution of grades awarded at New Zealand universities differs from

the distribution of grades awarded in England and Wales universities. 
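
Hypotheses of this kind are conventionally examined with a chi-square test of independence, the statistic used in the present study. The following Python fragment is a minimal sketch of how hypothesis 1 could be examined for a gender by class-of-honours table. The function name, the counts, and the three-class layout are illustrative assumptions; they are not taken from the data or from the SPSS procedures of the present study.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Return (Pearson chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of independence.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts, NOT data from the present study.
# Rows: male, female. Columns: first class, second class, third class.
observed = [
    [30, 50, 20],
    [20, 60, 20],
]

statistic, dof = chi_square_independence(observed)

# Tabulated chi-square critical value for alpha = 0.05 with 2 degrees of freedom.
CRITICAL_05_DF2 = 5.991
reject_null = statistic > CRITICAL_05_DF2
```

With these hypothetical counts the statistic does not exceed the critical value, so the null hypothesis of no association between gender and class of honours would be retained.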


CHAPTER SIX

METHOD - PART A

6.1 Subjects

The sample consisted of all individuals who had completed a Bachelor

with honours or Masters degree at any university in New Zealand between 

the years 1960 to 1989. This complete population of students was 

chosen above any sampling procedures for several reasons. Firstly,

the statistical analyses used were sensitive to low or zero cell

counts (Upton, 1978), which would have eventuated if a sampling

procedure had been used. Secondly, as research into this field has never

been conducted in New Zealand, it was decided to address global issues

before proceeding to more specific areas of investigation. For this purpose 

an extensive sample is therefore advantageous. Finally, this exploratory 

research may assist in highlighting where further research may be 

warranted, unaffected by the problem of inaccurate sampling procedures.
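
The sensitivity of chi-square analyses to sparse cells can be illustrated with a short sketch that flags cells whose expected count is small. The threshold of five is a commonly cited rule of thumb and is an assumption here; it is not necessarily the criterion given by Upton (1978) or applied in the present study, and the counts are hypothetical.

```python
from typing import List, Tuple


def expected_counts(table: List[List[float]]) -> List[List[float]]:
    """Expected cell counts under independence for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def sparse_cells(table: List[List[float]], minimum: float = 5.0) -> List[Tuple[int, int]]:
    """Return (row, column) indices of cells whose expected count is below `minimum`."""
    return [
        (i, j)
        for i, row in enumerate(expected_counts(table))
        for j, expected in enumerate(row)
        if expected < minimum
    ]


# Hypothetical small sample, NOT data from the present study.
# Rows: two universities. Columns: other classes, first class.
small_sample = [
    [40, 3],
    [50, 7],
]

flagged = sparse_cells(small_sample)
```

A sampled subset of graduates would be more likely than the complete population to produce flagged cells of this kind, particularly for small departments or rarely awarded classes.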

There was a total of 34413 students, of whom 21914 were male, 9601

were female and 2898 were of unknown gender. Gender could not

be classified in some cases, as students had first names that were

appropriate for either males or females, or foreign names from which

gender could not be determined. After inspection of individual

cases the sample was reduced to 31072 students. This represents the 

total number of subjects for whom there was complete and useful

information for all variables. Students whose gender could not be

determined, students who had graduated from The University of

New Zealand, and students with no area of study provided or who had

completed a double major were excluded. The sample of 31072 students 

consisted of 21564 (69.4%) males and 9508 (30.6%) females.
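The counts reported above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the male count in the final sample is computed here as the final total minus the female count, which is an assumption of this sketch.

```python
# Counts reported for the initial pool of students.
initial_total = 34413
initial_male, initial_female, initial_unknown = 21914, 9601, 2898

# Counts reported for the final sample after exclusions.
final_total = 31072
final_female = 9508
# Assumption of this sketch: males are the remainder of the final sample.
final_male = final_total - final_female


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The three initial groups should account for every student in the pool.
assert initial_male + initial_female + initial_unknown == initial_total

excluded = initial_total - final_total
print("excluded:", excluded)
print("males:", final_male, percent(final_male, final_total))
print("females:", final_female, percent(final_female, final_total))
```

Under this assumption the male and female shares sum to 100% of the final sample, which agrees with the percentages quoted.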


6.2 Procedure 

The information required about each student was extracted from the

University Graduation Ceremonies booklets of New Zealand's seven

universities: the University of Auckland, University of Waikato, Massey 

University, Victoria University of Wellington, University of Canterbury, 

Lincoln College (now Lincoln University), and the University of Otago. In

addition, the monthly council meetings of Victoria University since 1980 were

used, as this university has not included graduates "in absentia" in its

graduation ceremony handbook since that time.

The computer program Massey University Database (Massey University 

Computer Centre, 1988) was used to record the necessary information 

for each student in the sample. The information recorded was the student's

name, course of study (COURSE) and major (SUBJECT) and, in coded

form, gender (SEX), the university attended

(VARSITY) and the year (YEAR). The class of honours

received for the course undertaken (GRADE) was also recorded. The data were

then double checked and corrected for discrepancies.
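As a rough modern illustration of the record described above, the fields could be held in a small data structure. This is a sketch under stated assumptions, not the layout of the Massey University Database: the class name, the integer codes, and the `discrepancies` helper are hypothetical, and the actual codings are those listed in Appendix 1.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GraduateRecord:
    """One student's entry; all codes here are hypothetical placeholders."""
    name: str
    course: str    # COURSE: course of study
    subject: str   # SUBJECT: major taken
    sex: int       # SEX: coded gender
    varsity: int   # VARSITY: coded university
    year: int      # YEAR
    grade: int     # GRADE: class of honours

    def __post_init__(self):
        # The study covers the years 1960 to 1989 only.
        if not 1960 <= self.year <= 1989:
            raise ValueError(f"year {self.year} is outside 1960-1989")


def discrepancies(first_entry, second_entry):
    """List the field names on which two entries of the same record differ."""
    return [
        f.name
        for f in fields(first_entry)
        if getattr(first_entry, f.name) != getattr(second_entry, f.name)
    ]


first = GraduateRecord("A. Student", "MA", "Psychology", 2, 3, 1985, 1)
second = GraduateRecord("A. Student", "MA", "Psychology", 2, 3, 1985, 2)
print(discrepancies(first, second))
```

Comparing two independent entries of the same record is one possible way to carry out a double check; the source does not state how the checking was actually done.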

The computer program WordPerfect 5.1 (WordPerfect Corporation, 1989)

was used to combine all information into one file, and to code the 

information on course of study and major taken. The codings used for the 

variables are listed in Appendix 1.

6.3 New Zealand Analyses 

The statistical packages SPSS-PC version 3.1 (SPSS Inc., 1988) and 

SPSSX version 10 (SPSS Inc., 1986) were used to analyze the data. As

the majority of the variables were measured on a nominal scale, analyses 

were restricted to frequencies, crosstabulations, and chi-square test 

statistics. The analysis of the data was performed in several steps, in 


answer to the questions that were being addressed and dependent on the 

results obtained from previous analyses:

6.3.1 Step One - Univariate analysis

Univariate information, in the form of frequencies of the variables, was

obtained to determine the characteristics of the population that the

present research addressed. This was a necessary consideration as the

population was not represented in New Zealand annual statistics.
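A frequency count of this kind can be sketched as follows. The study itself used SPSS; this Python version is only an illustration, and the records, value labels, and the `frequencies` function are hypothetical.

```python
from collections import Counter

# Hypothetical records: the variable names follow the thesis, the values do not.
records = [
    {"SEX": "M", "VARSITY": "Massey", "GRADE": "First"},
    {"SEX": "F", "VARSITY": "Otago", "GRADE": "Second"},
    {"SEX": "M", "VARSITY": "Massey", "GRADE": "Second"},
    {"SEX": "F", "VARSITY": "Auckland", "GRADE": "First"},
]


def frequencies(rows, variable):
    """Return {value: (count, percent)} for a single variable."""
    counts = Counter(row[variable] for row in rows)
    total = sum(counts.values())
    return {value: (n, round(100 * n / total, 1)) for value, n in counts.items()}


for variable in ("SEX", "VARSITY", "GRADE"):
    print(variable, frequencies(records, variable))
```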

6.3.2 Step Two - Crosstabulation of degree and gender 

Due to the exploratory nature of the present study, the focus of interest 

was on global rather than specific differences in the population. For this

reason, several of the original variables were reduced to a smaller number

of categories. The variable COURSE was collapsed into two categories,

Bachelor with honours and Masters qualifications. This new

variable was labelled DEGREE. The variables DEGREE and SEX were 

crosstabulated to determine whether males and females undertook both 

Bachelor with honours and Masters qualifications in the same proportions. 
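The collapsing and crosstabulation described in this step can be sketched in a few lines. This is a toy illustration, not the SPSS procedure used in the study: the course labels, the rule inside `to_degree`, and the ten records are all hypothetical, and the Pearson chi-square statistic is computed by hand from observed and expected counts.

```python
from collections import Counter

# Hypothetical (COURSE, SEX) pairs; the real codings are in Appendix 1.
records = [
    ("BA(Hons)", "M"), ("BSc(Hons)", "F"), ("MA", "M"),
    ("MSc", "M"), ("MA", "F"), ("BSc(Hons)", "M"),
    ("MSc", "F"), ("BA(Hons)", "F"), ("MSc", "M"), ("MA", "M"),
]


def to_degree(course):
    """Collapse COURSE into the two-level DEGREE variable (assumed rule)."""
    return "Honours" if "Hons" in course else "Masters"


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    n = sum(table.values())
    row_total = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_total = {c: sum(table[(r, c)] for r in rows) for c in cols}
    statistic = 0.0
    for r in rows:
        for c in cols:
            # Expected count if DEGREE and SEX were independent.
            expected = row_total[r] * col_total[c] / n
            statistic += (table[(r, c)] - expected) ** 2 / expected
    return statistic, (len(rows) - 1) * (len(cols) - 1)


# Crosstabulate DEGREE by SEX, then test for independence.
table = Counter((to_degree(course), sex) for course, sex in records)
statistic, dof = chi_square(table)
print(dict(table), round(statistic, 3), dof)
```

With so few records the expected cell counts are far too small for the test to be trusted; the example only shows the mechanics of the calculation.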

6.3.3 Step Three - Changes in the sample over time 

Similarly, the variable YEAR was collapsed into six levels by

grouping each consecutive five-year period into one value. This

variable was called TIME. Previous research overseas has found that the

gender composition of persons who attend university now differs from

that of those who attended university in the past (Roemer, 1983; Clarke, 1988).

Therefore, step three sought to determine if there had been any changes 

in the representation of both males and females at New Zealand