A spoken Chinese corpus: development, description, and application in L2 studies: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Applied Linguistics at Massey University, Manawatū, New Zealand
This thesis introduces a corpus of present-day spoken Chinese, which contains over 440,000 words of orthographically transcribed interactions. The corpus is made up of an L1 corpus and an L2 corpus. It includes data gathered in informal contexts in 2018, and is, to date, the first Chinese corpus resource of its kind investigating non-test/task-oriented dialogical interaction of L2 Chinese. The main part of the thesis is devoted to a detailed account of the compilation of the spoken Chinese corpus, including its design, data collection, and transcription. In doing so, the study attempts to answer the question: what are the key considerations in building a spoken Chinese corpus of informal interaction, especially in building a spoken L2 corpus of L1–L2 interaction? The thesis then compares the L1 corpus and the L2 corpus before using them to carry out corpus studies. Differences between and within the two subcorpora are discussed in some detail. This corpus comparison is essential to any L1–L2 comparative study conducted on the basis of the spoken Chinese corpus, and it addresses the question: to what extent is the L1 corpus comparable to the L2 corpus? Finally, the thesis demonstrates the research potential of the spoken Chinese corpus by presenting an analysis of the L2 use of the discourse marker 就是 jiushi in comparison with the L1 use. The analysis mainly considers the contribution that 就是 jiushi makes, as a reformulation marker, to utterance interpretation within the relevance-theoretic framework. In doing so, it seeks to answer the question: what are the features that characterise the L2 use of the marker 就是 jiushi in informal speech?
The results of this study make several contributions to the academic community. First, the spoken Chinese corpus is available to the academic community through the website, so the corpus itself is expected to be of use to researchers, Chinese teachers, and students who are interested in spoken Chinese. In addition to making the data available, this thesis presents a transparent account of each step in the compilation of both the L1 and L2 corpora. As a result, the decisions made and strategies adopted in designing and constructing the spoken corpus can offer valuable guidance to researchers who want to build their own spoken Chinese corpora. Finally, the findings of the comparative analysis of the L2 use of the marker 就是 jiushi will contribute to research on the teaching and learning of interactive spoken Chinese.