Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere without the permission of the Author.

Massey University Library
New Zealand & Pacific Collection

AN EXPLORATORY STUDY OF FINAL GRADES AWARDED TO BACHELOR WITH HONOURS AND MASTERS STUDENTS

A thesis presented in partial fulfilment of the requirements for the degree of Master of Arts in Psychology at Massey University.

Patricia Bolger
1990

ABSTRACT

This study explores the final grades awarded to Bachelor with honours and Masters students in New Zealand universities from 1960 to 1989 as a function of students' gender, the university attended, the degree completed, and the subject studied. These grades were also compared with the grades awarded to Bachelor with honours students in England and Wales from 1974 to 1989. Chi-square test statistics were used to measure the significance of these relationships.

In New Zealand women were awarded significantly more first class degrees than men. In England and Wales men were awarded significantly more first class degrees than women. Science students were awarded a higher percentage of first class degrees than other students in both New Zealand and England and Wales. In New Zealand Bachelor with honours students were awarded first class degrees more frequently than Masters students.

Political and historical developments, the nature of the grading procedures used, and institutional and departmental variance provide partial explanation for some of the results. It is clear that no single factor is responsible for these variations in degree performance, but rather a complex interaction of several factors. It is concluded that in New Zealand and England and Wales, gender, university, the degree undertaken, and the subject studied, all have an effect on the final grade a student is awarded.
ACKNOWLEDGEMENTS

I would like to thank Mike Smith, my supervisor, for his encouragement, assistance, and practical research philosophy.

Thanks also to the New Zealand University Students Association for awarding me their Scholarship for Higher Education. Most importantly this strengthened my own belief in the value of this research.

Thanks to Robert Loeffen, Ali Maginness, Joss Tennent, Maria Bolger, and especially Andrew Kibblewhite for their continual support, advice, and friendship.

Lastly, Mum and Dad, thanks for the genes and the environment, without which I could never have come this far. Also thanks for the unending support and friendship.

TABLE OF CONTENTS

Abstract
Acknowledgements

CHAPTER ONE - OVERVIEW

CHAPTER TWO - PERFORMANCE APPRAISAL
2.1 Introduction
2.2 The Criterion
2.3 Assessment Methods
2.4 Types of Data
2.5 Rating Scales
2.6 Rating Error
2.7 Rater Training
2.8 The Process Model
2.9 Performance Appraisal within Education

CHAPTER THREE - ASSESSMENT IN POSTGRADUATE EDUCATION
3.1 To Grade or not to Grade
3.2 Assessment Methods
3.3 Assessment Reliability
3.4 Biases in Assessment
3.5 Sex Bias
3.6 Conclusion

CHAPTER FOUR - THE UNIVERSITY SYSTEMS
4.1 Universities - Their Purpose
4.2 New Zealand Universities - The Beginnings
4.3 The Present New Zealand University System
4.4 The University System of England and Wales
4.5 Standards in the British University System
4.6 The British External Examination System

CHAPTER FIVE - HONOURS STUDIES
5.1 Introduction
5.2 Gender Studies
5.3 Subject Studies
5.4 The Student Population
5.5 Institutional Differences
5.6 The Present Study - Part A
5.7 The Present Study - Part B
HYPOTHESES

CHAPTER SIX - THE METHOD - PART A
6.1 Subjects
6.2 Procedure
6.3 New Zealand Analyses
6.3.1 Step one - Univariate analysis
6.3.2 Step two - Crosstabulation of degree and gender
6.3.3 Step three - Changes in the sample over time
6.3.4 Step four - Changes in subject areas over time
6.3.5 Step five - The distribution of grades
6.3.6 Step six - The effect of gender and subject on grades
6.3.7 Step seven - The distribution of first class honours
6.3.8 Step eight - Institutional difference in grades
THE METHOD - PART B
6.4 Subjects
6.5 Procedure
6.6 England and Wales Analyses
6.6.1 Step one - Gender differences in choice of subject area
6.6.2 Step two - The distribution of grades
6.6.3 Step three - The effect of gender and subject on grades
6.6.4 Step four - A comparison of New Zealand and England and Wales grades
6.6.5 Step five - A comparison of the subject areas studied in New Zealand and England and Wales

CHAPTER SEVEN - RESULTS - PART A
7.1 New Zealand Analyses
7.1.1 Step one - University attended
7.1.2 Step two - Crosstabulation of degree and gender
7.1.3 Step three - Changes in the sample over time
7.1.4 Step four - Changes in subject areas over time
7.1.5 Step five - The distribution of grades
7.1.6 Step six - The effect of gender and subject on grades
7.1.7 Step seven - The distribution of first class honours
7.1.8 Step eight - Institutional differences in grades
THE RESULTS - PART B
7.2 England and Wales Analyses
7.2.1 Step one - Gender differences in choice of subject
7.2.2 Step two - The distribution of grades
7.2.3 Step three - The effect of gender and subject on grades
7.2.4 Step four - A comparison of New Zealand and England and Wales grades
7.2.5 Step five - A comparison of subject areas studied in New Zealand and England

CHAPTER EIGHT - DISCUSSION
8.1 Introduction
8.2 Characteristics of the Postgraduate Population
8.2.1 Gender and Degree
8.2.2 Changes in Subjects Studied
8.3 Grading Issues
8.4 Differences in Gender and Grades
8.5 Grade Differences in Subjects
8.6 Comparison Between Grade Distributions of New Zealand and England and Wales
8.7 Difference in Grades between Bachelor with honours and Masters Degrees
8.8 Institutional Differences

Appendices
One - Variable Codes
Two - Study Categories

References

LIST OF FIGURES

Figure 2.1: Cognitive Components in Rating.
Figure 2.2: The Process Model of Performance Rating.
Figure 2.3: An illustration of the similarities between Education and the Workplace in Judgemental Ratings.
Figure 7.1: The distribution across New Zealand's universities of students who completed a Bachelor with honours or Masters degree during 1960 to 1989.
Figure 7.2: The change in distribution of New Zealand students who have completed a Bachelor with honours or Masters degree between 1960 to 1989.
Figure 7.3: The percentage of New Zealand students who have completed a Bachelor with honours or Masters degree in each subject area during the Sixties, Seventies and Eighties.
Figure 7.4: The percentage of New Zealand Male students who have completed a Bachelor with honours or Masters degree in each subject area during the Sixties, Seventies and Eighties.
Figure 7.5: The percentage of New Zealand Female students who have completed a Bachelor with honours or Masters degree in each subject area during the Sixties, Seventies and Eighties.
Figure 7.6: The proportion of each class of honours awarded to Bachelor with honours and Masters students of New Zealand.
Figure 7.7: The proportion of each class of honours awarded to Masters students of New Zealand.
Figure 7.8: The proportion of each class of honours awarded to Bachelor with honours students of New Zealand.
Figure 7.9: The percentage of Bachelor with honours and Masters students who were awarded a first class honours degree at each New Zealand university.
Figure 7.10: The distribution of students who completed a Bachelor with honours degree in England or Wales in each subject area between 1974 to 1989 by Gender and Total.
Figure 7.11: The proportion of each class of honours awarded to Bachelor with honours students at England or Wales universities.

LIST OF TABLES

Table 7.1: The Gender and Degree composition of the sample.
Table 7.2: The proportion of New Zealand students who studied each subject area as a function of Gender and class of honours received.
Table 7.3: The proportion of New Zealand Masters students who studied each subject area as a function of Gender and class of honours received.
Table 7.4: The proportion of New Zealand Bachelor with honours students who studied each subject area as a function of Gender and class of honours received.
Table 7.5: Chi-square results of the proportion of first class honours degrees awarded to Males and Females who completed a Masters in each subject area.
Table 7.6: Chi-square results of the proportion of first class honours degrees awarded to Males and Females who completed a Bachelor with honours in each subject area.
Table 7.7: The proportion of England and Wales students who studied each subject area as a function of Gender and class of honours received.
CHAPTER ONE - OVERVIEW

The degree class awarded to a student is an important marker of achievement. Yet the reliability of assessment in higher education has been the subject of concern for some years (Hartog & Rhodes, 1935; Dale, 1959; Cox, 1967; Foster, 1985; Johnson, 1988). Research continues to highlight discrepancies in the grades that students receive that are not the result of differences in students' academic ability. Differences have been noted in the awarding of honours degrees between institutions (Bee & Dolton, 1985; Connolly & Smith, 1986; Johnes & Taylor, 1987), between courses of study (Bourner & Bourner, 1985; Smith, 1990), and between males and females (Rudd, 1984; Kornbrot, 1987; Clarke, 1988). Further, there is still no uniform opinion as to why these differences occur.

Answers to these questions are likely to be of interest not only to the universities themselves, but also to potential university students and to employers. Potential students are likely to be interested in discovering the extent to which their chances of obtaining a "good" degree might vary between institutions and departments. Employers may be interested to know where they are most likely to recruit graduates with "good" degrees.

It is the purpose of this research to investigate whether degree results vary between institutions, the subject studied, and between males and females who have completed postgraduate degrees in New Zealand in the last thirty years. New Zealand grades will also be compared with those of England and Wales.

Grading is a form of performance appraisal, and as such a great deal of the research in this area is applicable to grading and assessment within education. Chapter two is an overview of performance appraisal. Nearly [...]

In chapter eight the results are interpreted, and some explanations for the outcomes observed are provided. Contrasts and similarities between the results of New Zealand's universities and those of England and Wales are examined. The implications of these results for postgraduate students from both New Zealand and England and Wales are discussed, along with suggestions for future research.

CHAPTER TWO - PERFORMANCE APPRAISAL

2.1 Introduction

There is no escape from performance appraisal. It is impossible to go through life without being assessed many times, in many different situations, for many different purposes. Assessments are sometimes formal, as in job interviews or teachers' reports, though they are just as often informal, as when meeting new acquaintances or in judgements made by school peers. We are all assessed virtually from birth, and then continually throughout our life, be it by doctors, school teachers, and family, and later, by the bank manager, lecturers, and the sports coach. We assess people who provide us with services, such as solicitors, chefs, and hairdressers, and act on our judgement of their effectiveness to decide whether we will continue to use their services.

To appraise anything is to set a value on it. The purpose is to find out how a person performs when compared with a standard. The most common and formal type of performance appraisal takes place in the work setting.
Formal performance appraisal systems are constructed with the understanding that performance evaluations represent meaningful distinctions among individuals that correspond to actual behavioural differences (Wendelken & Inn, 1981). The overall aim of the appraisal is to remove the influence of extraneous factors from the evaluation process in order to focus solely on aspects of performance that are related to some specific criterion.

Although judgements may be made about an individual's performance on a regular basis, the accuracy and equity of this process is still unresolved. Organizations continue to express disappointment in performance appraisal systems despite advances in technology (Banks & Murphy, 1985). It should be appreciated that even with the best intentions, it is unlikely that performance appraisals can ever be made completely objective and accurate. Issues such as validity, reliability, and bias remain major and persistent problems which often hinder or nullify the value of many performance appraisal systems.

2.2 The Criterion

Before a performance appraisal can be conducted it is essential that an organization determines the nature of the dimensions on which distinctions about performance are to be made. This is referred to as criterion development. The criterion is a way of describing success. For example, the criterion for a shop retailer might be the monetary value of sales in a one-month period. A criterion for measuring a student's success in a school subject might be the course grade. The criterion for measuring a dieter's success is most likely to be the amount of weight lost.

However, defining "the criterion" is not always a simple matter. It has been a problematic area of Industrial/Organizational Psychology for many years (Landy & Farr, 1983). For this reason, no doubt, a large amount of research has been directed at determining the necessary "criteria for criteria".
Blum and Naylor (1968), for example, compiled a list of fifteen characteristics they considered necessary and/or desirable for criteria. These undefined characteristics are as follows: reliable, realistic, representative, related to other criteria, acceptable to job analysts, acceptable to management, consistent from one situation to another, predictable, inexpensive, understandable, measurable, relevant, uncontaminated, bias free, and discriminating.

Unfortunately, there have been few attempts to refine these characteristics or develop operational definitions of the criteria for the criterion. As a result, a wide array of variables has been used to study the effectiveness of performance appraisal data. This inconsistency in criterion development places doubt on the use of some performance appraisal measures.

Downey, Lahey and Saal (1982) have shown empirically that the operational definitions adopted for criteria will significantly affect the conclusions drawn in the assessment of appraisal data. In a comparison of the psychometric characteristics of ratings from graphic and mixed-standard scales, it was found that the use of one set of operational definitions for rating error produced results that differed from the results obtained when another set of operational definitions was adopted. Their study illustrates the need for researchers and practitioners to thoroughly scrutinize the criteria they select for assessing appraisal data. This is best achieved by considering several of the essential requirements for a criterion.

The first requirement of a criterion is that it be relevant to some important goal of the individual, the organisation, or society (Smith, 1983). Determination of relevance is, however, a matter of judgement. Some group or person must decide which activities are most relevant to success. Once these activities have been identified, effort must then be directed towards developing psychometrically sound measures of these activities.
The measure of a criterion should be neither contaminated with irrelevant variance, nor deficient in terms of measuring the important objectives of the organisation and of the people in it. As well, neither the criterion nor the measure of it should be biased or trivial.

Relevance consists of two parts. One is the validity of the goal which is judged to be important. The second is the validity of the measure(s) of goal achievement. This requirement is parallel to the requirement that a test be valid.

Reliability is the second requirement of a criterion. The estimates of reliability may be grouped into three general classes: (a) measures of stability; (b) measures of equivalence; and (c) measures of internal consistency (Landy & Farr, 1983). A criterion measure must in addition be practical, available, plausible, and acceptable to those who use it (Smith, 1983).

Once an appropriate criterion has been determined, a method of measurement needs to be chosen. This is frequently referred to as a performance measure or assessment method.

2.3 Assessment Methods

The techniques used to assess and measure performance can be grouped into three general categories: comparative, absolute, and outcome or results-orientated (Landy & Farr, 1983; Long, 1986).

Comparative techniques evaluate the performance of employees in a work group relative to each other, using paired comparison, a ranking procedure, or forced distribution. All procedures are highly subjective as the rater is given a great deal of latitude to infer what distinguishes levels of effective performance.

Absolute or criterion-referenced methods attempt to describe or evaluate the performance of an individual by reference to some standard or standards of performance, not in relation to other individuals.
Techniques include the essay or narrative-type approach, graphic or trait rating scales, mixed standard rating scales (Blanz & Ghiselli, 1972), checklists, critical incidents (Flanagan, 1954), and behaviourally anchored rating scales (Smith & Kendall, 1963). All these procedures have limitations, but they may be appropriate depending on the purpose for which the appraisal is conducted.

The final group of methods is those that are results-orientated. These methods concentrate on specific accomplishments and outcomes achieved as a result of job performance, rather than job behaviours. Central to this approach is employee participation, objectives being jointly agreed between superiors and subordinates, and standards established in advance as the result of discussion and negotiation (Long, 1986). One problem with this approach is that a high degree of inferential skill, management time, and effort is required for the method to work effectively. This method has also been found to be unsatisfactory for complex positions (Gruenfield, 1981).

2.4 Types of Data

In conjunction with the method to be used in a performance appraisal, the type of data to be collected needs to be determined. There are several kinds of data that can be used to provide the necessary information. Guion (1965) identified at least three different types of measures of job behaviour: objective data, personnel data, and judgmental data. Landy (1989) states that, ideally, a complete performance measurement should include a combination of all three of these indices of performance, as the multi-dimensionality of job performance only becomes apparent when these categories are considered simultaneously. This advice is rarely put into practice.

Both objective data and personnel data can be problematic. Generally the recording of this information is either not done correctly, or is unable to be done adequately enough for the resulting information to be useful, valid, or reliable.
This does not imply that objective or personnel data have no value as criteria, but rather, that if they are to be useful, a careful analysis of the relationship between the elements of the job as identified by the job analysis and elements of the behaviour as related to performance appraisal is necessary.

Judgmental data is the most frequently used form of measurement. Landy (1989) reported that a literature review of validation studies in the Journal of Applied Psychology between 1965 and 1975 revealed that ratings were used as the primary criterion in 72% of the cases. These judgments can take several forms. They may be a simple comparison of one employee with another, a list of statements which are applied to each employee, or some form of rating by which the employee is placed on a continuum depending on their level of proficiency.

2.5 Rating Scales

The most widely used performance appraisal method is the judgemental rating scale (Long, 1986; Baker, 1988; Leap & Crino, 1989). Rating scales can be distinguished from one another on three different dimensions (Guion, 1965).

The first dimension is the degree to which the meaning of the response category is defined. This deals with how the rating scale is marked off into units. Here a number of important decisions need to be made; the first is how many points the scale should contain. Previous research on the use of rating scales indicates that the optimal scale should include four or five points. Reliability drops with three categories or fewer, and there is little increase in reliability when there are more than five points (Lissitz & Green, 1975). Further, when deciding on the number of scale points, the organisation must decide whether it wishes to permit central, uncertain, or undecided responses, which can occur with an odd-numbered scale (Jacobs, 1986).

The second dimension is the degree to which the person interpreting the scale can tell what response was intended by the rater.
This is referred to as response clarity and is largely determined by the structure of the scale.

The third dimension is the degree to which the performance dimension being rated is defined for the rater. Whenever possible, verbal descriptions should accompany the numerical scale (Jacobs, 1986). Scale anchors that are defined precisely are less open to misinterpretation and therefore give the rater a reasonable idea of what performance dimensions are being considered. It has also been suggested that points on a rating scale should be reviewed between raters to ensure that there is agreement concerning what each point means in terms of actual performance behaviour (Leap & Crino, 1989).

2.6 Rating Error

In spite of the different forms and widespread use of judgmental indices of performance there has been a consistent dissatisfaction with these measures on the part of the researcher and practitioner. This dissatisfaction can be largely attributed to three types of rating errors - halo, central tendency, and leniency (Anastasi, 1982). Other errors also contribute to the contamination of performance ratings; they include contrast, first impression, and spillover effects.

Halo error, named by Thorndike (1920), occurs when a rater has a generally favourable or unfavourable impression of the person rated. This influences the rater in such a way that ratings are assigned which are consistent with that impression. The effect of this psychometric error is most exaggerated when multi-factor ratings are required. No method has been devised that effectively eliminates halo errors, and research on alternative solutions still continues (King, Hunter & Schmitt, 1980; Landy, Vance, Barnes-Farrell & Steele, 1980).

The tendency to overweight in an appraisal any information and/or observations made on a person early in the appraisal period is labelled first impression error (Latham, Wexley & Pursell, 1975).
This judgemental error is thought to be related to halo in that first impressions may facilitate or in fact be synonymous with the development of a positive or negative halo impression about a person.

Other rating errors are the result of inappropriate rating patterns, due to the rater's failure to make necessary and appropriate distinctions among the performance levels of different individuals (Leap & Crino, 1989). They include central tendency error, which is generally defined as the "bunching up" of ratings at or near the middle of the scale owing to raters' unwillingness to assign extreme ratings. Since many individuals do perform somewhere around an average, it is an easily rationalised escape from making a valid appraisal (Henderson, 1984).

Leniency error refers to raters who are unusually harsh or unusually easy in their ratings. This results in ratings being bunched up either towards the lower or upper end of the scale. Both leniency and central tendency errors reduce the effective width of the scale and make ratings less discriminatory (Anastasi, 1982). One suggestion that has been offered to eliminate these errors is a forced distribution in which the rater is required to allocate a given percentage of the ratees to each category (Landy & Farr, 1983). However, the forced distribution assumes that there is some knowledge of what the distribution should look like; in most circumstances this assumption is probably untenable.

Better scale development has also been suggested as one means of reducing leniency error, in particular reducing the ambiguity of the scale by improving the definitions of the dimensions of the scale. Finally, attempts to eliminate rater errors have often focused on training raters to be aware of these tendencies.

2.7 Rater Training

Skill is required to appraise performance, therefore it makes sense to train raters.
As Allinson (1977) observed, if an appraisal system is to function effectively, all members of an organisation should be educated about how to use the particular rating form, and its purpose.

There has been a considerable amount of research on rater training (see reviews by Spool, 1978; Bernardin & Buckley, 1981). Traditionally, an important part of rater training has involved a description of the traditional rating errors of leniency and halo and suggestions on how to avoid them (Landy, 1989). However, Bernardin and Pence's (1980) study found that raters who were only trained in avoiding rating errors adopted a "set" or a cognitive control mechanism that was geared towards producing ratings that had certain statistical properties rather than ratings that described actual behaviour.

A more effective rater training method, proposed by Murphy, Martin and Garcia (1982), is to train raters in observation skills rather than in the ways of avoiding rater error. Their proposal is based on the assumption that raters who are more accurate observers are also more accurate evaluators.

Figure 2.1: Cognitive Components in Rating. Source: Landy & Farr (1980).

Accurate evaluation depends on accurate perception. However, even with extremely competent raters, it is unlikely that performance appraisals can ever be completely accurate. A major reason for this is that humans have limited information processing capabilities (Feldman, 1981). Consequently research has focused on why some, as opposed to other, appraisal information is attended to; whether information is stored in long or short term memory; how information is organised in memory; and how it is retrieved and combined for decision making.
A model of the relationship between these cognitive components is presented in Figure 2.1. It is an elaboration of the cognitive section (i.e., observation/storage and retrieval/judgement) of Landy and Farr's (1980) process model. The hope is that as the sequence of events that occur in performance appraisal becomes better understood, fair and unbiased appraisal systems will become commonplace (Feldman, 1981). Until then, rater accuracy may be best achieved using some type of regular refresher training to sustain appraisal skills (Ivancevich, 1979).

2.8 The Process Model

The process of performance rating is incredibly complex. There are many opportunities for ratings to be influenced by factors other than the performance of the person rated. Bernardin and Beatty (1984) have suggested that the complexity of performance appraisal is best represented by Landy and Farr's (1980) process model of performance rating. This model, shown here in Figure 2.2, purports to describe the task of performance rating from a process perspective.

When considering this model it is important to remember that the goal of performance rating is to provide an accurate performance description of the person in question. In Landy and Farr's (1980) model this is represented by the box labelled "Performance Description". Most of the other boxes might be thought of as "potential obstacles to accurate performance description" (Landy, 1989, p. 147). The model attempts to define the specific subsystems and their interactive effects that form the larger rating system. Each component of the model has a research history which may be said to justify its inclusion in the model.

Figure 2.2: The Process Model of Performance Rating. Source: Landy & Farr (1980). Boxes in the model: Position Characteristics; Organization Characteristics; Purpose for Rating; Scale Development; Rater Characteristics; Ratee Characteristics; Rating Process; Rating Instrument; Observation/Storage; Retrieval/Judgement; Data Analysis; Performance Description; Personnel Action.

Although the model does not offer much in the way of explanation concerning why these elements may have adverse effects on the accuracy of performance description, it does present a reasonable view of the complexity of the process. Note, however, that the rating process is not developed in isolation; it will inevitably be influenced by the purpose of the rating and the instruments used for the rating.

2.9 Performance Appraisal within Education

The development of a fairer, more appropriate, and consequently better performance appraisal system has generally been conducted within the realms of the business sector. Nonetheless, performance appraisals are conducted in many other facets of life.

Figure 2.3: An illustration of the similarities between Education and the Workplace in Judgemental Ratings.
Goal - Workplace: Work performance. Education: Acquisition of knowledge. (Conceptual problem of whether goal is accurately reflected by the criterion.)
Criterion - Workplace: Attitude, Quality, Output, etc. Education: Essays, Exams, Labs, etc. (Performance is assessed. Problems with assessment method, procedure, and/or assessors may occur.)
Measurement - Workplace: Rating Scale. Education: Grade.

In education, performance judgements of students are a regular occurrence. They are included under a multitude of other names, such as oral examinations, laboratory tests, internal assessment, and final exams. It is therefore disappointing that the methods developed to remedy the faults highlighted in performance appraisals conducted within business are often not used or acknowledged in education (Bee & Dolton, 1985; Foster, 1985; Heywood, 1989).
The relationship between the concepts presented previously with respect to performance appraisal in the workplace and performance appraisal conducted within education is presented graphically in Figure 2.3. In postgraduate education an absolute method of performance appraisal, the rating scale, is used. For honours degrees the final assessment is the class of honours received. Problems within the rating scales used to measure work performance are just as evident in measures of student assessment. The raters are just as susceptible to halo, leniency, central tendency, and other forms of rater error and bias. Consequently, all of these factors can affect the reliability, validity, and accuracy of the appraisal of student performance. Chapter three looks at issues relevant to performance appraisal in postgraduate education.

CHAPTER THREE

ASSESSMENT IN POSTGRADUATE EDUCATION

3.1 To Grade or not to Grade

Society usually expects that the education system will select, sift, and categorize individuals (Heywood, 1989). Although it is possible for a person without a tertiary education to be successful, education is usually important for success. Education also aids social development, teaches society's values, and strives to instill a quest for knowledge into students (Birt, 1985). It is, however, the grades that individuals obtain that directly affect their future.

In education grades are used to make judgements about students. They are an important standard for admission to higher education. They can determine the university attended, and the subject studied at that institution. In New Zealand this occurs as a direct result of restrictions placed on the number of students that are accepted into some courses. Grades can also limit the level to which a person is allowed to progress in their chosen subject (Heywood, 1989). It is not only within education that grades are important.
In the workforce employers generally believe that grades provide valuable information. Consequently, a graduate's final grade can be important both for obtaining employment and for the remuneration package received (Dolton & Makepeace, 1990).

Grades may also be considered in relation to the performance of universities. In line with today's imperatives of relevance, efficiency and accountability, universities are now required to look more closely at what they are doing and how much it costs to do it. In Britain this is highlighted in the recent Green Paper on "The Development of Higher Education into the 1990's" (Department of Education and Science, 1985), and in New Zealand via several reports on education (Boston, 1988). With the need to develop appropriate performance indicators, Smith (1990) noted that grades awarded to students are being used to judge performance between institutions.

Some people, nevertheless, do not favour the use of grades. They maintain that grades are ineffective in assessing the value of the educational process (Powell & Butterworth, 1971; Wainwright, 1977). Fawthrop (1968), for example, believes that examinations are a constraint on education: "From the educational viewpoint examinations are a supreme form of alienation in the modern world. This is also relevant to the teaching sphere, in which the genuine aims of the tutor are periodically subverted by the exigencies of the system, which emphasize that the first obligation is to get them through the examination at all costs rather than to stimulate a relevant contribution to the advancement of learning. One might well ask what does a society profit if it gains a whole world of degrees and yet loses its own educational soul? In a world of ignorance what can we give in exchange for true knowledge - a million scraps of paper certifying student degree status?" (p. 24). Most would agree that what Fawthrop believes has some truth.
When students aim for high grades and teachers try to cover the curriculum, original thought can be dismissed. It is generally acknowledged that assessment is not infallible (Cox, 1967; Klug, 1976; Johnson, 1988), but most people believe that some form of assessment of students' abilities must be made. Grades and examinations are as much a part of life within education as are the judgements and assessments made by people in all other parts of life. It is therefore of the utmost importance that examinations and the resulting grades should be both valid and reliable.

3.2 Assessment Methods

In recent years the methods and timing of assessment of students' performance have become more diverse. The final exam is still a component of most courses. However, the majority of university courses do not rely on the final exam alone. Internal assessment now forms a large proportion of a student's final grade (McKay, 1984). The methods by which students are assessed before the final exam include essays, tests, oral examinations, seminar presentations, laboratory reports, and mastery of practical skills. The final exam has also incorporated several new methods of measuring acquisition of knowledge. In the United States multiple choice questions are now the predominant means of assessing students in their final exams (Foster, 1985; Heywood, 1989). Short answers and paragraph answers are also in use, as well as the traditional essay questions. The aim is to cover a greater range of the curriculum.

Deciding which method(s) best measure whether students have acquired a satisfactory level of knowledge in a given course is a complex task. One wants a device that samples the whole range of educational aims, provides grades which are fair and stable over time, and which different examiners can use consistently.
One does not want a technique likely to be contaminated by extraneous factors such as a student's sex or name, which assesses a limited subset of the educational aims, or which is unfairly biased toward some students and against others (Foster, 1985).

This research considers grading and assessment methods, their value in the educational process and their impact on the individual. Teaching knowledge alone is no measure of the value of the education an individual acquires. "If we are to improve learning, we will have to improve the methods of testing and learning we use. They will have to become intimately related" (Heywood, 1989, p. 2).

3.3 Assessment Reliability

The reliability of assessment in higher education has been the subject of concern for some years (see Dale, 1959; Foster, 1985; Silver & Silver, 1986; Johnson, 1988). It is complicated by the fact that assessment procedures differ widely, not only between countries, universities, and faculties, but also between departments, and within departments. As a result there have been pleas for greater consensus in course objectives, and for them to be more explicitly presented (Johnson, 1988).

For many years psychometric experts have commented that the unseen essay examination, the most frequently used assessment technique in British higher education (Hewton, 1987), and one of several techniques used in New Zealand (McKay, 1984), possesses many disadvantages. Students' grades are affected by factors such as the quality of their handwriting (Marshall & Powers, 1969), inconsistent marking (Cresswell, 1986), the exam sampling a very limited section of the student's knowledge (Johnson, 1988), and grades being unstable over time (Foster, 1985).

In postgraduate education there are two areas where reliability is important: the reliability of the measurement device, and the reliability between markers in their assessment of an individual.
Both have been extensively researched because they are fundamental to assessment practice (see Cox, 1967; Miller & Parlett, 1974; Bell, 1980; Heywood, 1989).

Assessment reliability is also contingent on the particular postgraduate qualification undertaken. A Bachelor with honours candidate is partially protected against a marked fluctuation in the standard of the papers by the number of papers they take, provided that compensation is allowed from one paper to another. Similarly, the number of markers is some protection against a change in the standard of the marking, and against any unreliability of marking (Dale, 1959). It has recently been proposed, however, that the averaging of students' performance over a number of papers is no guarantee of a reliable assessment outcome (Johnson, 1988). This may be the case particularly where the component with the lowest reliability carries a relatively high weight in the aggregation of marks. For example, the thesis in some Masters programmes (for example Masters of Arts or Masters of Science) can be a large proportion of the final grade.

It is now over thirty years since Dale (1959) addressed the reliability of university grading, and although his work has been cited many times it appears that very little has changed. Dale (1959) suggested that the biggest obstacle to the reform of unreliable university examinations was the ignorance of university staff with regard to the pitfalls for the examiner. He stated that "the calm assurance with which lecturers and professors alike believe that they can carry around in their heads an unfailingly correct conception of an absolute standard of the pass line is incomprehensible to anyone who has studied the research on the reliability of examinations" (p. 186).

Research has continually shown that examiners not only differ from one another, but that any one examiner will disagree with their own assessment of a particular piece of work on a different occasion (Cresswell, 1986).
Further, it is incorrect to assume that examiners share implicit notions about standards, and that they consistently allocate grades with the same degree of severity or leniency (Johnson, 1988).

Human error also compounds the problem of marker reliability. Marker errors are more easily reduced when the work being assessed is either right or wrong, and no subjective judgement needs to enter the decision about how to mark a question or examination script. Nonetheless, it is well known (Brooks, 1980; Heywood, 1989) that one way to reduce errors in marking is to employ more than one marker and take the mean of their marks. Another method of improving marker reliability is to have marking schemes (Foster, 1985; Johnson, 1988). At the postgraduate level, however, marking schemes are not always appropriate because of the nature of the work assessed. Markers can also be unintentionally biased in their marking, just as bias can affect other aspects of higher education.

3.4 Biases in Assessment

Higher education is exclusive. In any one year, although thousands of eighteen-year-olds enter universities to begin courses in higher education, many more do not. A small number wish to go but are positively excluded, either because they do not reach the minimum level for entry or because places are offered to better qualified applicants. A far greater number are excluded because their previous education denied them the chance or even the ambition to consider higher education (Burgess, 1981).

Most people believe that in admitting people to higher education there is no desire to exclude on any but objective academic grounds. If there are too few places, then the better qualified will be admitted. On examination, however, the population in higher education is not representative of the adult population as a whole. In New Zealand those who are successful are overwhelmingly young, and from this group they are predominantly white middle-class men (Jones, 1982).
Within higher education there are effective biases against age (Woodley, 1984), class (Williamson, 1981), disability (Sturt, 1981), religion (Gay, 1981), race (Little & Robbins, 1981) and sex (Spender, 1981; Acker & Piper, 1984). These biases prevent people from entering higher education. In some cases they prevent individuals from obtaining an adequate education at all. Once a person has entered higher education, there is no guarantee that the biases that hampered them from entering these institutions will not subsequently affect their grades.

3.5 Sex Bias

The evaluation of students' work is supposed to be objective and merit based. However, the evaluation criteria for assessing students' written work are highly ambiguous and the marking process is known to be unreliable (Hartog & Rhodes, 1935; Dale, 1959; Robbins, 1963; Cox, 1967). A high level of inference is required to evaluate students' written work, and it is therefore often stated that biases, including sex bias, would be expected to occur under such conditions.

Most of the studies of sex bias in evaluation have examined the hypothesis that when both sexes have identical qualifications or performance, men are evaluated more favourably than women. Although many studies have demonstrated this pro-male evaluation bias (for example, Lao, Upchurch, Corwin & Grossnickle, 1975; Gutlek & Stevens, 1979; Sharp & Post, 1980), some studies have found no sex bias (Hall & Hall, 1976; Dipboyd & Wiley, 1977; Frank & Drucker, 1977), and others have demonstrated a pro-female evaluation bias (Jacobson & Effertz, 1974; Bigoness, 1976). Nieva and Gutlek (1980) reviewed the literature on sex bias in a variety of situations and suggested that the degree and pattern of bias depends on three factors:

1. Level of Inference: sex bias tends to operate where there is ambiguity concerning evaluation criteria.
2. Sex Role Incongruence: sex bias tends to occur when the tasks undertaken are deemed to be more appropriate for one sex than the other.
3. Level of Performance: the operation of sex bias appears to be affected by the level of qualification or performance involved.

These factors suggest that grading in universities could be sex biased, particularly in postgraduate education where evaluation criteria are often ambiguous, and in subjects where the essay exam format is prevalent. These assumptions can be supported by experimental studies which have shown that identical essays get higher marks when a male rather than a female name is attached (Wallston & O'Leary, 1981). Women tend to be evaluated less favourably than men when both men and women are highly qualified or perform well (Bradley, 1984). Generally, subjects offered at universities are classified as either male-orientated or female-orientated, and the level of qualification assessed is generally considered high. Therefore, if women were assessed unfavourably at university this would lend support to Nieva and Gutlek's (1980) study.

So, in practice, do results indicate that sex bias operates in universities? A study by Bradley (1984) addressed this issue. The study was designed to exclude the possibility that differences in marks arose from differences in ability between men and women, which is the reason generally given for any differences in the distribution of examination marks between the sexes (Dale, 1959; Murphy, 1982; Rudd, 1984). Results indicated that markers who were familiar with the student being marked were not biased in their marking, but markers unfamiliar with a student were biased. In discussion it was noted that the risk of sex bias may therefore be greater in large departments, due to the small amount of staff-student contact.
When determining the occurrence of sex bias, the sex of the examiner is of less importance than the traditionality of the examiner, as both men and women examiners are exposed to the same cultural stereotypes and expectations of sex-role appropriate behaviours (Bradley, 1984). Thus both male and female markers can be influenced by the sex of the individual being evaluated (Nieva & Gutlek, 1980).

Determining whether sex bias, or indeed any form of bias, exists is possible. However, the detection of bias is not a matter of simple observation. There is no support for the opinion that examiners are aware of any biases they themselves contribute, nor is there any reason to expect that examiners are aware of the biases contributed by their colleagues or that they will be able to take steps to make them ineffective (Bradley, 1985).

3.6 Conclusion

Like all appraisals, appraisals of postgraduate students' acquisition of knowledge and skill are fallible. As this chapter has shown, questions have continually been asked about the appropriateness of the methods of assessment used, and it is well known that these methods are not always reliable. Further, their validity is complicated by the biases that operate both within education and among markers. How severely these inaccuracies of the education process affect the final grades awarded to students is the subject of the research reported here.

CHAPTER FOUR

THE UNIVERSITY SYSTEMS

4.1 Universities - Their Purpose

Universities are among the oldest institutions in Western society. Their long history shows how they have developed and changed in response to people's insatiable desire for knowledge, and society's need for advanced thinking and skilled workers. Originally the word "universitas" meant a whole body of masters and students in a community, working together to seek truth through instruction, debate and research (Gibson, 1978).
Today, universities are structured very differently, and the purposes of universities are more diverse. Consequently, academics are frequently drawn into discourse as to what the purposes of universities are, and whether present systems are successful in fulfilling these aims.

On the one hand governments, industry and students urge a "vocationalism" upon the universities which finds expression in labour market trends (Birt, 1985), as highlighted by the demand for courses in Accountancy, Technology, and Computer Science. Yet others believe university students should also be encouraged to pursue truth, knowledge and understanding, and to develop intellectual exploration and the free exchange of ideas (Ball, 1985). On this view, teaching in universities should focus on old ideals and the notion that postgraduate study is a preparation for a life of scholarship and admission to an academic community (Blume, 1986).

No doubt the debate as to universities' actual priorities will continue, but at present perhaps it is best to concede that, regardless of its specific purpose, a university education is becoming more and more common. The situation in New Zealand is no exception.

4.2 New Zealand Universities - The Beginnings

On the thirteenth day of September 1870 the Act of the General Assembly was passed, signalling the beginning of university education in New Zealand (Parton, 1979). Since that day, over one hundred and twenty years ago, New Zealand's university system has undergone many changes in the structure, operation, and funding of universities. Changes in the composition of the students, the courses they pursue, and the way they are assessed are also evident.

The first university in New Zealand was established in Dunedin, a year prior to the Act of the General Assembly, by an ordinance of the Provincial Council of Otago. However, this university, later to become known as Otago University, did not have the authority to grant degrees (Bell, 1981).
As a result of the 1870 Act the University of New Zealand was founded, and granted the right to confer degrees. In an effort to ensure the acceptance and international recognition of the degrees awarded, an early decision of the University of New Zealand was that examinations should be set and marked by eminent academics in the United Kingdom (Yearbook, 1990). Once established, this policy proved hard to alter and continued to have a significant impact on university teaching, restricting initiative and change for many years. Finally, in 1939, it was agreed that professors in New Zealand should be the examiners for a stage three subject. The commencement of World War Two, and the possible risk of examination papers being lost or delayed on their way to the United Kingdom, then ensured that this reform continued and an increasing number of New Zealand examiners were appointed.

In 1961, the federal University of New Zealand was abolished and the universities in operation at that time, Auckland, Victoria, Canterbury, and Otago, became autonomous entities. A link between the universities and government was established by the introduction of the University Grants Committee. Later, in 1963, separate Acts of Parliament established the universities of Waikato and Massey. Since then university education within New Zealand has expanded to include seven universities, the last of which, Lincoln University (previously known as Lincoln College and associated with Canterbury University), obtained university status in 1990 (Yearbook, 1990). On July 1st 1990, the University Grants Committee was abolished under the provisions of the Education Amendment Act 1989 (Hall, 1990).

4.3 The Present New Zealand University System

All the universities in New Zealand are divided into faculties and departments except for the University of Waikato, which is divided into schools. Students may undertake a course of study either on a full-time or part-time basis.
Additionally, Massey University offers many courses of study through distance education.

Prior to 1988, to be eligible for entry into any university course of study, an applicant was required to have passed the University Entrance exam in at least four subjects including English. Since then, entry to university has been determined with reference to a student's Sixth Form Certificate points classification. It is required that 12 or fewer points are accrued over four subjects. However, most students complete a seventh form year, and entry is determined on Bursary examination results. Provisional entrance may also be granted to students over the age of twenty-one years who do not have the minimum qualification.

Some courses, however, have restricted entry due to there being more candidates than places. Preference is usually given to students with the best examination results in specified subjects after their seventh form year of study at secondary school, or at the end of their first year of university study (intermediate year).

The main, and usually first, stage of university education in New Zealand universities leads to the Bachelors degree. The length of this course of study differs from faculty to faculty, but typically a Bachelor degree requires three years of study for Arts, Science, Horticulture, Agriculture and Commerce; four years for Engineering, Horticultural and Agricultural Science; five years for Architecture, Veterinary, Dentistry and Law; and six for Medicine. A Bachelor with honours degree usually requires an additional year of study.

The second stage of university education leads to the Masters degree. This is usually obtained in one to three years, and can be awarded with honours or distinction. The Masters programme usually entails course work, a thesis, or most commonly, a combination of the two.
Typically the third stage of university education is a Doctor of Philosophy, obtained after a minimum of two years of supervised research and the presentation of a thesis. Doctorates of Literature, Science, and Law are the most advanced degrees of the university system, and they are awarded for exceptional advanced research, or as honorary degrees to those in the community deemed to deserve them by the universities.

4.4 The University System of England and Wales

The university system with which New Zealand is most frequently compared is that of Britain. This is generally because up-to-date British statistics are available, because New Zealand universities are staffed by some academics with first-hand experience of British universities (New Zealand University Conference, 1969), and because, with few exceptions, the British system of education more closely resembles our own than any other (Pool, 1987). Therefore the present study will research whether there are similarities in the distribution of grades for honours students from these two university systems.

A considerable amount of research has already been conducted in Britain regarding the equivalence of grading standards between the sexes, institutions, and faculties of British universities. These studies will be discussed in Chapter five. A comparison of this nature lacks validity unless it takes into account the different circumstances operating within the university systems of the countries compared.

Entry to university in both England and Wales normally takes place after a minimum of 13 years of primary and secondary education. To be eligible, an individual requires a certain number of passes in the General Certificate of Education examinations (GCE) at both the "O" level and "A" level. The first degree of higher education, the Bachelor degree, is usually awarded after three years of study, but this varies between faculties and can be as long as six years. There are two types of Bachelor degrees.
The first is the honours or special degree; the second is the ordinary or pass degree, generally awarded to those candidates who have studied for an honours degree but whose results do not justify the award of honours (UNESCO, 1980).

For the final examinations, universities not only appoint examiners from their own teaching staff, but also call in the services of a number of external examiners from other universities (Williams, 1979; Piper, 1985). In this way, whilst preserving the autonomy and character of each individual university, the universities also try to maintain an equivalent standard of achievement throughout the country. However, comparability of standards in England and Wales has not always been maintained by the external examiner system.

4.5 Standards in the British University System

The problem of addressing fairness, that is, the maintenance of equivalent standards both between and within universities, has been a recurring one within the British university system. In Britain, standards were first maintained by controlling the institutions which were empowered to award degrees. At the beginning of the nineteenth century Oxford and Cambridge were exclusive with regard to social class and religious denomination. Then, with the creation of London University and later the provincial colleges, the notion of standards was more implicit in the discussion of institutional hierarchies (Silver & Silver, 1986). Later the various roles of London University in particular addressed the issues of standards, their definitions, and guardianship.

In 1858 The Charter permitted colleges to prepare students to sit the London University external exam (Silver & Silver, 1986). However, the external degree soon raised questions about the appropriateness and justice of examinations divorced from teaching.
While the separation looked attractive as a guarantee of objectivity, students faced examinations whose standards were based on criteria often unrelated to or in conflict with those of the teaching colleges.

At the beginning of the twentieth century, the external examiner system became crucial to the concept of examination standards. By the 1960s further development meant that new meanings were being sought for the concepts of standards, quality and excellence, concepts which had once seemed absolute and measurable. Christopherson (1967) suggested that the maintenance of standards meant ensuring that students on completion of their course had some familiarity with the basic ideas in a particular field of study, some experience of living and working with people of similar ability in other fields of study, and were at least equal to others who had done the same course in earlier years. However, this was becoming more difficult to achieve as higher education continued to expand.

In addition, the new generation of lecturers in the free speech society of the 1960s were instrumental in introducing a number of significant changes. Examinations were now being set by those who taught the candidate, with the external examiners continuing to play a supervisory role to ensure that standards were met. This ensured that papers reflected the material covered in the course (though not necessarily the syllabus) and thus removed some chance effects. However, it increased the chance of poor quality questions, and reduced the level of consistency of standards between colleges and between years of any course (Gaskell, 1979).

As previously discussed, the reliability of examinations and marks had already been challenged, most notably in Britain by Hartog and Rhodes (1935). Various techniques proposed to improve marking reliability did not silence anxieties about differences amongst subjects, and within subjects (Cox, 1967).
Dale in 1959 castigated staff for their ignorance of the pitfalls of examining, and their belief that they carried in their heads an absolute standard of 40 percent. He pointed to the wide disparities of first class honours awards in different subjects, ranging from 1/4 in Applied Science to 1/70 in Arts (Dale, 1959). What was being discovered was "the complexity of the assessment task" (Miller & Parlett, 1973).

By the 1980s the same reservations were appearing with regard to the role of the external examiner, whose presence did not appear to guard against arbitrary differences, and whose experience of "comparability" was questionable (Silver & Silver, 1986). Yet the external examination system is still held up as one of the major guarantees of quality and equity within British higher education (Williams, 1979; Piper, 1985; Connolly & Smith, 1986; Johnson, 1988).

4.6 The British External Examination System

Very little has been published on the role of external examiners, and until recently no systematic survey of their work had been undertaken. Unfortunately, the results are not encouraging: the external examination system is not very effective in guaranteeing equivalence of standards between universities.

Piper (1985) asked external examiners to outline their role in this capacity. The most commonly reported role was that of being an additional marker for borderline candidates (86%). Being an additional scrutineer for exceptionally good or exceptionally poor work was reported in 70% of the cases. Similar figures were found for arbitration when internal examiners failed to agree on a mark. The role of ombudsman was not common (10%), as it was thought that other resources were open to students who felt they had been unfairly treated. In comparison, institutions saw their external examiners as having the function of checking standards.
They did not perceive the recommendations of their external examiners as moving them towards the centre; rather, most institutions saw their external examiners as either sanctioning the present state of affairs, or else encouraging them to award more top grades (Smith, 1990).

Williams (1979) states that the purpose of the external examiner "is generally understood to be the maintenance of similar standards between different universities" (p. 162). Yet the question of standards is not straightforward. There are at least four forms of consistency or equality which need distinguishing:

1. The maintenance of standards from year to year in a given course.
2. The monitoring of equivalence between course options.
3. The parity of standards between universities within subjects.
4. Parity between different subjects for nationally recognised levels of accreditation.

It is apparent that a clear understanding of the role of external examiners is neither manifest nor practised. Institutions' reliance on external examiners to ensure fairness and comparability seems naive when the external examiners themselves fail to see this as one of their major tasks in the external examining system. This would indicate that the system needs to be rethought and objectives need to be defined more carefully. Failing that, other means of addressing equity may need to be considered.

In England and Wales comparability is too serious an issue to be dismissed by complacent references to the external examining system. In New Zealand, although there is no "appointed" national body to ensure equivalent standards, comparability is equally important. Degree class has too great an impact on the future lives of students for scant attention to be paid to this matter (Klinov-Malul, 1974; Johnson, 1988; Dolton & Makepeace, 1990). It has been argued that unless standards can be maintained, the ability to compare students, courses, and institutions becomes highly questionable.
It can equally be proposed, however, that it is only through comparative studies of this nature that questions concerning the validity and reliability of the existing standards can be raised. Chapter five discusses several studies that have addressed these questions.

CHAPTER FIVE - HONOURS STUDIES

5.1 Introduction

Degrees with distinction provide their holder with opportunities for further advancement within higher education and in the labour market generally. Therefore any factors other than ability and knowledge that might improve opportunities to obtain a top degree are extremely important. Several studies have addressed the impact of subject studied, institutional characteristics, and gender on students' degree performance.

5.2 Gender Studies

A study by Rudd (1984) sparked a great deal of debate in Britain about the pattern of honours degrees awarded. Rudd's research examined honours degrees awarded to men and women in British universities during 1967, 1978 and 1979. He reported that women gained a lower percentage of both first class degrees and the lower honours degrees compared to men. After discounting a number of plausible explanations as to why this might be the case, he concluded that "the only explanation that seems to fit all the facts is that this difference is linked to differences in the distribution of ability as measured by the scores gained in intelligence tests" (p. 47). Support for this explanation was credited to Heim's (1970) study, which suggests that women's test scores give a distribution of measured intelligence which is slightly different to that of men, with a smaller percentage at the extreme ends of the scale. Rudd (1984) also looked at the differences between the sexes in obtaining a "good" degree, that is a first class or upper second honours degree.
His results showed that women performed better than men in Education, Medical subjects, Engineering, Agricultural subjects, Social studies, Architecture and other Professional studies groups. Men performed better in Arts and Language subjects, even though these are two areas in which men are under-represented. It is perhaps not surprising that Rudd's (1984) research was controversial, but this was not due to his results, which generally have been supported by other British studies (Jones & Castle, 1986; Kornbrot, 1987; Clarke, 1988), but rather because of his explanation for the results obtained. In 1988 Simon Clarke re-evaluated Rudd's (1984) study and suggested that Rudd overestimated the tendency for men to achieve a disproportionate number of first and third class honours degrees, and that he failed to pay sufficient attention to the marked differences in performance as a function of the subject studied or to the change in relative performance over time. Clarke (1988) found that in general, women did better in Professional subjects and Chemical and Biological Sciences. Men did better in the Arts, Mathematics and the Physical Sciences, and the sexes performed at the same level in Social Sciences. Women still underachieved at the first class level, and men still tended to get more third class degrees, but Clarke (1988) suggested that these factors were often linked to the area of study. Acknowledging the differences between the sexes that Rudd (1984) had reported, Clarke (1988) also questioned why males and females were disproportionately represented with respect to classes of degrees. He rejected Rudd's (1984) explanation on the grounds that IQ tests are not a valid measure when assessing differences between the sexes. A crucial aspect in the design of intelligence tests is that the test not be biased in favour of either sex. As IQ tests have developed, items that have shown consistent differences between the sexes have been excluded.
Due to this fact all attempts to show sex differences in ability by use of intelligence tests are invalid (Ryan, 1972). After reviewing the evidence Clarke (1988) proposed that the differences in the overall performance of men and women were the result of social and institutional pressures. He pointed to sex stereotyping, and biases in examining, supported by Bradley's (1985) research, as part of the explanation of men obtaining a disproportionate number of first and third class honours degrees. Clarke (1988) also suggested that there is a need to look at differences in the cultural and institutional framework that may exist which discriminate differently between men and women in different subject areas. In conclusion he stated that there have been positive changes over time, evidenced by the improvement in performance by women in all subjects except Arts, relative to that of men. One of the major advances within this area that Clarke's (1988) study appears to advocate, and which Rudd (1984) failed to acknowledge, is that differences between the sexes cannot be considered in isolation. An important factor is an individual's area of study. This has been supported by other researchers, for example Kornbrot (1987), who concluded from her study that gender differences in degree performance tend to depend on content area and topic. Kornbrot's (1987) study found women significantly more likely than men to achieve a competent degree of lower second or better in all disciplines but, as in other studies, men obtained more first class degrees. In particular men were substantially more likely than women to achieve first class degrees in the Humanities, Social Sciences, and Language and Literature areas. Women were more likely than men to achieve first class honours degrees in Medicine. The overall pattern suggested that women were highly successful in many disciplines which were strongly stereotyped as male, and where they were currently under-represented.
This raises the interesting question of whether a person's assessment is affected by their choice of study.

5.3 Subject Studies

Regardless of whether a student is majoring in Physics, Accountancy, or French, a first class honours degree should require the same amount of ability, sagacity and effort on the student's behalf. Several studies have investigated this question and found discrepancies in the grades awarded between different subjects. Attempts to ascertain the reasons for these differences, however, have not been entirely successful. Neuman and Ziderman (1985) investigated the existence of differences in standards in awarding first degrees with distinction amongst universities in Israel. Considerable diversity was found in the tendency to award first degrees with distinction between and within universities and faculties, and between the major subject departments of the Social Sciences, which was selected for more detailed analysis. The results of their research may be particularly pertinent to New Zealand research as Israel's university system parallels New Zealand's university system in several ways. Israel is a small country with six independent universities, all operating within the framework of a central Universities Grant Committee, modelled on the British pattern. (Note: the Universities Grants Committee ceased to exist in New Zealand from 1st July 1990 (Hall, 1990); however, the research conducted in the present study extends only to 1989.) An analysis of variance of first degrees awarded with distinction, by university and faculty in Israeli universities between 1979 and 1983, revealed that both the main effects and the interaction effect were highly significant.
Neuman and Ziderman (1985) reported that Natural Science faculties tended to award more degrees with distinction than average (coefficient of +0.21), whereas Social Science faculties awarded fewer (-0.19), and Arts faculties were on a par with the overall average tendency to grant degrees with distinction. In conclusion Neuman and Ziderman (1985) stated that "there is a pressing need for universities in Israel, as in England (and possible in other countries too), to set their houses in order through the framing of procedures for the maintenance of common standards in the granting of degrees with distinction, both between as well as within universities" (pp. 458-459). The need to develop a method to ensure equivalence in standards is echoed by others, not least by Bourner and Bourner (1985), whose research compared the pattern of honours degree results in Accounting with those of other subject areas. Their results, based on British data, were in agreement with those of Neuman and Ziderman (1985). Individuals in the Science and Engineering/Technology subject groups were awarded, on average, a higher proportion of first class degrees than any other subject group. Specifically, the proportion of first class degrees awarded by both of these groups exceeded that of Accounting by a factor of seven. Note that Accounting was placed in the subject group Social, Administrative and Business studies, which in total received the smallest number of first class degrees. An older, yet frequently quoted piece of research that addresses the variation from department to department and from year to year in the standard of degree classes is that of Dale (1959). Dale's (1959) results also showed a greater proportion of first class honours degrees awarded to Science students compared to both Commerce and Arts students.
It would be naive to expect that class percentages for different faculties should be equal; however, most researchers would be even more astonished if these results were a true reflection of the comparative ability of students from different faculties. Doubtless, individuals of different major fields do differ in several ways. It has been shown that they differ in their personality traits (Elton & Rose, 1967), and in their scholastic strategies (Goldman & Warren, 1973). Nevertheless, Dale's (1959) study found no evidence from psychological testing that students from different faculties or departments differed in ability in a way that corresponded with the differences in degree awards obtained. Other studies have also failed to support the idea that variation in grades is due to the ability composition of students studying different subjects (Nevin, 1972; Rudd, 1984; Clarke, 1988). Although several studies have obtained similar results in the awarding of first class honours degrees between faculties, little discussion has been offered as to why this might be the case. Yet all researchers are adamant that standards should be more equivalent, and that methods to achieve this should be developed. Generally Dale's (1959) explanation of these results is accepted as addressing some of the discrepancies. Dale (1959) reasoned that the wide variation in degree standards from one faculty to another lay in the nature of the subject matter. Those subjects in which the mathematical content is high yield a much greater spread of marks than subjects such as English and History in which the essay type of answer predominates. Therefore Mathematics will award more firsts than English. Using this argument Mathematics should also award more thirds.

5.4 The Student Population

Throughout the world it is evident that employment in certain sectors of society is decreasing while in other sectors it is increasing.
Specifically, the number of people employed in the agricultural and industrial sectors is declining, while there is an increased need for individuals in the commercial and service industries (Yearbook, 1990). Therefore it is not surprising that these trends are reflected in the enrolment figures of students in university courses (Blume, 1986; Fenner, 1989). However, other trends also exist, so that the changing structure of the student population, with regard to their choice of subject, is not a simple linear equation between area of greatest employment and increased numbers enrolled in the appropriate faculty area. The population of students in New Zealand universities reflects that of most overseas universities in that there has been an annual increase in the total number of students attending university, and that there has also been an increase in the number of students furthering their education by undertaking postgraduate education (Sub-Committee on Graduate Employment, 1988). The proportion of females attending university comes closer to fifty percent of the total student population each year (Pool, 1987). American studies show that the general pattern of change for women students is that they have increased their presence across the board in all fields of study (Roemer, 1983). Women have made decisive movements into fields in which they have previously not been well represented. At the same time women have accepted the basic patterns that were established in the 1970s, and continued to pursue studies in areas that are regarded as appropriate choices for women. In Britain the representation of women in Sciences and Engineering at all levels has shown a steady increase over the past two decades, although most women are still at the low levels, both in terms of academic achievement and employment (Ferry, 1982). In the United States the same situation prevails (Fenner, 1989).
Women's participation at the postgraduate level of education is still substantially less than that of men in the United States (Roemer, 1983), Britain (Jones & Castle, 1986), and New Zealand (Taiaroa, 1985). The reasons for this are complex, but a major contributing factor is that postgraduate degree enrolments are largely determined by the quality of the first degree and men still achieve more and better honours degrees than women (Jones & Castle, 1986). A further result of this is that men, due to their better grades, are more likely to be recipients of scholarships, and therefore to have greater access to postgraduate education (Jones & Castle, 1986). A further restriction on the entry of females to postgraduate studies is their predominance in the traditionally "acceptable" areas of study for women, that is the Arts and Education. For this reason, there are large numbers of women competing against one another for the limited number of positions, scholarships and grants available (Jones & Castle, 1986). So a closer look at higher education reveals that females have made definite inroads in relation to their participation in universities; however, the rule of "the higher the fewer" still applies to women in almost every field of study (Ferry, 1982). This fact is further emphasised by a glance at the composition of university staff. In Britain, of the full time teaching staff in universities, only 10% are women. At the higher positions of readers, senior lecturers, and professors, 40% of men hold these positions compared to 18% of women, and these women are usually represented in the faculties of Arts and Social Sciences (Ferry, 1982). Similarly, in Australian universities only 17% of the senior academic staff are women (Buckridge & Barham, 1984).

5.5 Institutional Differences

In Britain several recent studies have considered whether graduates of one institution are comparable with graduates of other institutions.
The studies of Bee and Dolton (1985), Connolly and Smith (1986), Johnes and Taylor (1987), and Smith (1990) have all shown that there is a significant, and frequently large, variation in the degree classes awarded to students as a function of the university they attended. Several explanations for this variation have been presented. The most comprehensive of these studies was conducted by Johnes and Taylor (1987). Three of the student and institutional characteristics they examined were significantly related to the variation in degree results between universities: A-level scores, the proportion of students living at home, and library expenditure as a percentage of total spending. The mean A-level score of a university's students was significantly related to degree results. A one point increase in A-level scores was associated with an increase of between three and four percentage points in the proportion of graduates with a first or upper second class honours degree. This finding differs from previous research (Wilson, 1981; Sear, 1983; Foy & Waller, 1987) which has found only a weak relationship between A-level scores and the prediction of class of degree. Further, the studies of Connolly and Smith (1986) and Smith (1990), which investigated the variation in degree classes in Psychology, also found that A-level scores were not able to predict class of degree. Universities with a high proportion of students living at home during the terms were more likely to produce poor results than universities in which the proportion of students living at home was low. As Johnes and Taylor (1987) pointed out, the interpretation of this result is unclear since it cannot be determined whether the proportion of students living at home describes the type of students a university acquires or whether it is indicative of characteristics which relate to the universities themselves.
Similarly, it is difficult to interpret why high library expenditure was positively related to universities awarding a higher than average number of good degree results. All studies concerned with variations in universities have attempted to measure whether these differences are a function of the quality of different universities, in particular the quality of teaching. However, it has been difficult to obtain a true measure of this factor. Connolly and Smith (1986) used the readily accessible statistic of staff-student ratio as a crude operationalised measure of quality of teaching. This measure has since been used by other researchers. Connolly and Smith's (1986) results were significant but small, r = 0.40 and 0.11. All other studies (Bee & Dolton, 1985; Johnes & Taylor, 1987; Smith, 1990) found a non-significant relationship between staff-student ratio and the variation between universities in the distribution of degree classes. The other notable finding observed in all these studies is that the variation in degree awards across universities was consistent over time. Several researchers (Bee & Dolton, 1985; Johnes & Taylor, 1987) have suggested that from the point of view of the student seeking a good degree result it matters little whether differences in awards across universities arise through genuine differences in "value added" by the institution or simply through arbitrary institutional perceptions. What does matter is that the differences do exist, that they can be large, and that the pattern is consistent over time. The same can also be said about the impact of a student's sex or choice of subject studied on their resulting degree award. Bee and Dolton (1985) further suggest that "for all concerned a reappraisal of the award system is both necessary and long overdue" (p. 49). It is possible that an appraisal of the New Zealand university award system might also be warranted.

5.6 The Present Study - Part A

The present study is concerned firstly with whether the grading practices employed at the postgraduate levels of Bachelor with honours and Masters degrees in New Zealand universities are appropriate and fair, and secondly with whether they are comparable to those of the universities of England and Wales. The appropriateness of the way in which students are graded is addressed through a theoretical discussion of the past research in this area of interest. The fairness of New Zealand's grading system is researched by statistical analysis of the results awarded to New Zealand postgraduate students over the past thirty years in relation to several other factors, such as gender, course taken, subject studied, and university attended. Whether the systems are comparable is considered with reference to a comparison of the New Zealand results and English and Welsh results generated in the present study, and the results of several previous British studies. The present study is not a replication of any previous research. There are, however, similarities between the present study and several other recent studies. The following studies, unless otherwise stated, all involve research using British subjects and/or statistics. The studies of Rudd (1984), Kornbrot (1987), and Clarke (1988) compared degree performance as a function of gender and discipline studied. Research completed by Bee and Dolton (1985) and Johnes and Taylor (1987) sought to explain the variation in class of honours as a function of several student and institutional characteristics. Bourner and Bourner (1985) and Smith (1990) looked at the equality of standards within specific departments, namely Accounting and Psychology respectively. Neuman and Ziderman (1985) considered whether universities maintained common standards in awarding first degrees with distinction in Israel.
None of these studies has incorporated data that spans three decades, or has covered the population of postgraduate students as comprehensively as the present study. The present study is the first to address the population of New Zealand Bachelor with honours and Masters postgraduate students, as a function of grades received and several other variables discussed below. The present study is exploratory. The first part of the study, Part A, is only concerned with New Zealand students. The variables explored in this part of the research are as follows:

1. Sex of student
2. Course student studied
3. Major studied
4. Year completed degree
5. University attended
6. Class of honours received

The major objective of the present study is an analysis of the relationships between the class of honours a student receives and the five other variables. The null hypothesis for this research is that differences in the class of honours awarded are in no way a function of differences between the sexes, between universities, between degrees, between fields of study, or across time. A secondary consideration is whether there are any significant relationships between the independent variables. For example, the sample data is measured over the years 1960 to 1989 inclusive. Have there been changes in the proportional representation of male and female students over this time? Are the subjects that were most popular in 1960 the same as those in 1989?

The number of students going on to further education in New Zealand has grown in the last two decades; however, in comparison to other similar countries the proportion of students continuing their education is low (Cabinet Committee on Training and Employment, 1987).
For example, in 1984 only 24% of 18 to 23 year olds in New Zealand were in some form of part or full time education compared with 49% of the same age group in North America in 1985, 28% for East Asia, 27% for Latin America, and 32% for Europe and the United Kingdom (Population Monitoring Group, 1986). Students who choose to attend university in New Zealand are not representative of New Zealand's general population. Social and ethnic origins have a significant effect on the likelihood of a student entering university. For example, Maori and Pacific Island students are under-represented at the University of Auckland by a factor of four (Jones, 1982). Women fare better. They now represent close to 50% of the intake of undergraduate students, which displays a degree of equivalence between the sexes unparalleled by most other Western countries. However, women are disproportionately represented among the part-time students, mature students, and those studying extramurally (Pool, 1987). Given the unfavourable situation of university education in New Zealand compared to several other countries, comparative research may highlight specific problem areas in the New Zealand university system. The analysis of performance in New Zealand universities is given added meaning by comparing it with the performance of other countries; this is the intention of Part B of the present study.

5.7 The Present Study - Part B

The most parsimonious comparison of New Zealand grades with British grades seems best. For this reason the New Zealand university grading system will only be compared with the universities of England and Wales. Scotland, Northern Ireland and Eire have been omitted because they have different entrance requirements, and different degree and grading structures (UNESCO, 1980; Smith, 1990).
Further, previous studies have stated that the differences between the structures of the British university systems have only served to complicate the analysis of results with regard to their investigations of degree performance (for example, Bee & Dolton, 1985; Johnes & Taylor, 1987; Clarke, 1988; Smith, 1990). In England and Wales, data on variables similar to those considered in Part A of the present research are collected and collated yearly, and presented as the Universities Statistical Record. This information for the years 1974 to 1989 will be used in Part B of the present study. First a separate analysis will be done to ensure that results of the present study concur with those of past studies that have used this same information. Then these results from England and Wales will be compared with the New Zealand results previously obtained in Part A of the study. The comparisons of results will examine the following variables:

1. Sex of student
2. Major studied
3. Class of honours received

The objective of Part B of the present study is to determine whether the distribution of grades received by Bachelor with honours and Masters students in New Zealand universities differs from the distribution of grades received by Bachelor with honours students in universities in England and Wales.

The hypotheses of the present study are listed below. The first two hypotheses apply to the results researched in both Part A and B of the study. The next three hypotheses only address the results of Part A of the present study, the New Zealand results. The last two hypotheses refer to the comparison of results from Part A and Part B of the present study.

HYPOTHESES

1. That male and female students do not receive equivalent proportions of each class of honours.
2. That the grade distribution between areas of study is not equal.
3. That in New Zealand the distribution of grades is different for Bachelor with honours and Masters qualifications.
4. That in New Zealand between the years 1960 to 1989 male and female representation in areas of study has changed.
5. That the proportional distribution of honours grades awarded differs between New Zealand universities.
6. That the areas of study chosen by students in New Zealand and England and Wales universities are dissimilar.
7. That the distribution of grades awarded at New Zealand universities differs from the distribution of grades awarded in England and Wales universities.

CHAPTER SIX - METHOD - PART A

6.1 Subjects

The sample consisted of all individuals who had completed a Bachelor with honours or Masters degree at any university in New Zealand between the years 1960 to 1989. This complete population of students was chosen in preference to any sampling procedure for several reasons. Firstly, the statistical analyses used were sensitive to low or zero cell counts (Upton, 1978), which would have eventuated if a sampling procedure had been used. Secondly, as research into this field has never been conducted in New Zealand, it was decided to address global issues before proceeding to more specific areas of investigation. For this purpose an extensive sample is advantageous. Finally, this exploratory research may assist in highlighting where further research may be warranted, unaffected by the problem of inaccurate sampling procedures. There was a total of 34413 students, of which 21914 were male, 9601 were female and 2898 were of unknown gender. Gender could not be classified in some cases as students had first names that were appropriate for either males or females, or they had foreign names for which gender could not be correctly determined. After inspection of individual cases the sample was reduced to 31072 students. This represents the total number of subjects for which there was complete and useful information for all variables.
Students whose gender could not be determined, and/or students who had graduated from The University of New Zealand, and/or students with no area of study provided or who had completed a double major were excluded. The sample of 31072 students consisted of 21364 (69.4%) males and 9508 (30.6%) females.

6.2 Procedure

The information required about each student was extracted from the University Graduation Ceremonies booklets of New Zealand's seven universities: the University of Auckland, University of Waikato, Massey University, Victoria University of Wellington, University of Canterbury, Lincoln College (now Lincoln University), and the University of Otago. In addition, the monthly council meetings of Victoria University since 1980 were used, as this university has not included graduates "in absentia" in its graduation ceremony handbook since that time. The computer program Massey University Database (Massey University Computer Centre, 1988) was used to record the necessary information for each student in the sample. The information recorded was the student's name, course of study (COURSE), and major (SUBJECT) and, in coded form, gender (SEX), the university attended (VARSITY), and the year of completion (YEAR). The class of honours received for the course undertaken (GRADE) was also recorded. The data was then double checked and corrected for discrepancies. The computer program Word Perfect 5.1 (WordPerfect Corporation, 1989) was used to combine all information into one file, and to code the information on course of study and major taken. The codings used for the variables are listed in Appendix 1.

6.3 New Zealand Analyses

The statistical packages SPSS-PC version 3.1 (SPSS Inc., 1988) and SPSSX version 10 (SPSS Inc., 1986) were used to analyze the data. As the majority of the variables were measured on a nominal scale, analyses were restricted to frequencies, crosstabulations, and chi-square test statistics.
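As an illustration of the kind of analysis described above, the following minimal Python sketch builds a crosstabulation of two nominal variables and computes Pearson's chi-square statistic for it. It is not the SPSS procedure used in the study: the function names `crosstab` and `chi_square` are hypothetical, the category labels "first" and "other" are assumed for illustration, and the counts are invented rather than taken from the thesis data. Only the variable names SEX and GRADE follow the codings described in the Procedure.

```python
from collections import Counter


def crosstab(records, row_var, col_var):
    """Count how many records fall in each (row, column) cell."""
    return Counter((rec[row_var], rec[col_var]) for rec in records)


def chi_square(table):
    """Return Pearson's chi-square statistic and its degrees of freedom."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    n = sum(table.values())
    row_totals = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_totals = {c: sum(table[(r, c)] for r in rows) for c in cols}
    statistic = 0.0
    for r in rows:
        for c in cols:
            # Expected count under independence of the two variables.
            expected = row_totals[r] * col_totals[c] / n
            statistic += (table[(r, c)] - expected) ** 2 / expected
    return statistic, (len(rows) - 1) * (len(cols) - 1)


# Hypothetical counts for illustration only; these are NOT the thesis data.
hypothetical_counts = {
    ("male", "first"): 30, ("male", "other"): 70,
    ("female", "first"): 20, ("female", "other"): 80,
}
records = [
    {"SEX": sex, "GRADE": grade}
    for (sex, grade), count in hypothetical_counts.items()
    for _ in range(count)
]

table = crosstab(records, "SEX", "GRADE")
stat, dof = chi_square(table)
print(f"chi-square = {stat:.3f}, df = {dof}")
```

With these invented counts the statistic is about 2.667 on one degree of freedom, which falls below the conventional 5% critical value of 3.84, so independence of SEX and GRADE would not be rejected in this toy example. Because every expected count appears in a denominator, the sketch also shows why low or zero cell counts make the statistic unreliable.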
The analysis of the data was performed in several steps, in answer to the questions that were being addressed and dependent on the results obtained from previous analyses.

6.3.1 Step One - Univariate analysis

Univariate information, in the form of frequencies of the variables, was obtained to determine the characteristics of the population that the present research addressed. This was a necessary consideration as the population was not represented in New Zealand annual statistics.

6.3.2 Step Two - Crosstabulation of degree and gender

Due to the exploratory nature of the present study, the focus of interest was on global rather than specific differences in the population. For this reason, several of the original variables were reduced to a smaller number of categories. The variable COURSE was collapsed into two categories: Bachelor with honours and Masters qualifications. This new variable was labelled DEGREE. The variables DEGREE and SEX were crosstabulated to determine whether males and females undertook both Bachelor with honours and Masters qualifications in the same proportions.

6.3.3 Step Three - Changes in the sample over time

Similarly, the variable YEAR was collapsed into six separate levels by grouping each consecutive five years together into one value. This variable was called TIME. Previous research overseas has found that the gender composition of persons who attend university now differs from those who attended university in the past (Roemer, 1983; Clarke, 1988). Therefore, step three sought to determine if there had been any changes in the representation of both males and females at New Zealand