Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere without the permission of the Author.

Real-time Vision-based Hand and Face Tracking and Recognition of Gesture

A PhD dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Ph.D.) in Computer Science

by Farhad Dadgostar
Institute of Information and Mathematical Sciences
College of Science
Massey University
December 2006

Copyright © 2006 Farhad Dadgostar

Dedication

With love, to the one who always encouraged me and provided me indefinite unconditional support... To my wife, Nasim

Acknowledgements

I would like to thank Dr. Abdolhossein Sarrafzadeh, my supervisor, for all of his support, his patience, his guidance and his endless encouragement throughout my PhD study. Without his support, I would never have gained such an interest in research, nor learned as much as I have. I have been lucky, and am proud to have been his student. Working with him was a great pleasure. I would like to thank Dr. Scott P. Overmyer and Dr. Liyanage De Silva for their valuable suggestions and comments on my work. I have also enjoyed working with a number of fellow PhD colleagues over the past years, in particular Andre Barczak and Chao Fan, with whom I shared the authorship of several papers and reports, reflecting the closeness of our cooperation. The financial support provided by Massey University and the Institute of Information and Mathematical Sciences made my research, and the presentation of its results at international conferences, possible. In particular, I would like to thank Professor Robert McKibbin, the head of the institute, for his support during my study.
This achievement would not have been possible without the endless support provided to me by my family. I would like to thank my parents, who have always been supportive and encouraging throughout my studies. I would like to thank my precious daughter Kiana for her company and patience when I was working on weekends. Finally, I would like to express my deep thanks to my lovely wife, Nasim, for all her love, sacrifice, support and encouragement, which can be found in every word of this work.

List of Publications

Journal Articles

• Farhad Dadgostar, and Abdolhossein Sarrafzadeh, "An adaptive real-time skin detector based on Hue thresholding: A comparison on two motion tracking methods", Pattern Recognition Letters, Vol. 27, Issue 12, pp. 1342-1352, March 2006, Elsevier.
• Abdolhossein Sarrafzadeh, Samuel Alexander, Farhad Dadgostar, Chao Fan, and Abbas Bigdeli, "How do you know that I don't understand? A look at the future of intelligent tutoring systems", accepted for publication in Computers in Human Behavior, 2006.

Book Chapters

• Farhad Dadgostar, and Abdolhossein Sarrafzadeh, "A Fast Skin Detection Algorithm for Video Sequences", in Mohamed Kamel, Aurelio Campilho (Eds.), Lecture Notes in Computer Science, ICIAR 2005, Vol. 3656, pp. 804-811, 2005, ISBN 3-540-29069-9, Springer-Verlag, Berlin, Heidelberg.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, and Scott P. Overmyer, "Face Tracking Using Mean-Shift Algorithm: A Fuzzy Approach for Boundary Detection", in J. Tao, T. Tan and R. W. Picard (Eds.), Lecture Notes in Computer Science: Affective Computing and Intelligent User Interfaces, Vol. 3784, pp. 56-63, 2005, ISBN 3-540-29621-2, Springer-Verlag, Berlin, Heidelberg.
• Farhad Dadgostar, Hokyoung Ryu, Abdolhossein Sarrafzadeh, and Scott P.
Overmyer, "Making sense of student use of nonverbal cues for intelligent tutoring systems", in the ACM International Conference Proceeding Series: 19th conference of the computer-human interaction special interest group (CHISIG), Vol. 122, pp. 1-4, 2005, ISBN 1-59593-222-4, Canberra, Australia.

International Refereed Conferences

• Farhad Dadgostar, Abdolhossein Sarrafzadeh, Chao Fan, Liyanage De Silva, and Chris Messom, "Modeling and Recognition of Gesture Signals in 2D Space: A comparison of NN and SVM approaches", The 18th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2006, Washington D.C., USA.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, Scott P. Overmyer, and Liyanage De Silva, "Is the Hand really quicker than the Eye? Variances of the Mean-Shift algorithm for real-time hand and face tracking", IEEE International Conference on Computational Intelligence for Modelling, Control and Automation, CIMCA 2006, Sydney, Australia.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, and Scott P. Overmyer, "Genetic Algorithms and Long-Haar features: A Method for Object Detection", The 3rd International Conference on Cybernetics and Information Technologies, Systems and Applications, CITSA 2006, Orlando, Florida, USA.
• Abdolhossein Sarrafzadeh, Samuel Alexander, Farhad Dadgostar, Chao Fan, and Abbas Bigdeli, "See Me, Teach Me: Facial Expression and Gesture Recognition for Intelligent Tutoring Systems", IEEE International Conference on Innovation in Information Technology, IIT 2006, Dubai, U.A.E.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, Hokyoung Ryu, "A Macro Model of Human Emotional Response for Intelligent Agents Applications", 1st Korea-New Zealand Joint Workshop on Advances of Computational Intelligent Methods and Applications, Feb 2006, Auckland, New Zealand.
•
Farhad Dadgostar, Abdolhossein Sarrafzadeh, "A Component-based Architecture for Vision-based Gesture Recognition", in the proceedings of the International Image and Vision Computing Conference, IVCNZ 2005, pp. 322-328, Dunedin, New Zealand.
• Chao Fan, Abdolhossein Sarrafzadeh, Farhad Dadgostar, and Hamid Gholamhosseini, "Facial Expression Analysis by Support Vector Regression", in the proceedings of the International Image and Vision Computing Conference, IVCNZ 2005, pp. 311-317, Dunedin, New Zealand.
• Farhad Dadgostar, Chao Fan, and Abdolhossein Sarrafzadeh, "A Hybrid Approach for Robust Real-time Face Tracking in Video Sequences", in the proceedings of the Institute of Information and Mathematical Sciences Postgraduate Conference, pp. 35-42, 2005, Auckland, New Zealand.
• Chao Fan, Abdolhossein Sarrafzadeh, Farhad Dadgostar, and Hamid Gholamhosseini, "Face and Eye Detection Using Support Vector Machines", The 3rd International Conference on Computational Intelligence, Robotics and Autonomous Systems, CIRAS 2005, Singapore.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, and Martin Johnson, "An Adaptive Skin Detector for Video Sequences Based on Optical Flow Motion Features", in the proceedings of the 7th International IEEE Conference on Image and Signal Processing, IASTED-SIP 2005, M.W. Marcellin (Ed.), Hawaii, USA.
• Andre L. C. Barczak, Farhad Dadgostar, and Martin Johnson, "Real-time Hand Tracking Using the Viola and Jones Method", in the proceedings of the 7th International IEEE Conference on Image and Signal Processing, IASTED-SIP 2005, M.W. Marcellin (Ed.), Hawaii, USA.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, and Scott P. Overmyer, "An Adaptive Real-time Skin Detector for Video Sequences", in the Proceedings of the 2005 International Conference on Computer Vision, Vision 2005, H. R. Arabnia (Ed.), CSREA Press 2005, ISBN 1-932415-65-3, pp. 65-70, Las Vegas, Nevada, USA.
• Andre L. C.
Barczak, Farhad Dadgostar, and Chris Messom, "Real-time Hand Tracking Based on Non-invariant Features", Proceedings of the International IEEE Conference on Measurement and Instrumentation, IMTC 2005, Vol. 3, pp. 2192-2192, Ottawa, Canada.
• Chao Fan, Farhad Dadgostar, Abdolhossein Sarrafzadeh, Hamid Gholamhosseini, and Martin Johnson, "Facial Expression Reconstruction Using Polygon Approximation", in the proceedings of the 7th International IEEE Conference on Image and Signal Processing, IASTED-SIP 2005, M.W. Marcellin (Ed.), Hawaii, USA.
• Abdolhossein Sarrafzadeh, Chao Fan, Farhad Dadgostar, Sam Alexander, and Chris Messom, "Frown Gives Game Away: Affect Sensitive Tutoring Systems for Elementary Mathematics", International IEEE Conference on Systems, Man and Cybernetics, SMC 2004, The Hague, The Netherlands.

Other Publications

• Farhad Dadgostar, Abdolhossein Sarrafzadeh, "Gesture recognition through angle space", Research Letters in the Information and Mathematical Sciences, 2006, Vol. 9, pp. 112-119.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, "A formal model of emotional response, inspired from human cognition and emotion systems", Research Letters in the Information and Mathematical Sciences, 2006, Vol. 9, pp. 89-97.
• Farhad Dadgostar, Andre L. C. Barczak, Abdolhossein Sarrafzadeh, "A Color Hand Gesture Database for Evaluating and Improving Algorithms on Hand Gesture and Posture Recognition", Research Letters in the Information and Mathematical Sciences, ISSN 1175-2777, 2005, Vol. 7, pp. 127-134.
• Andre L. C. Barczak, Farhad Dadgostar, "Real-time Hand Tracking Using a Set of Cooperative Classifiers and Haar-Like Features", Research Letters in the Information and Mathematical Sciences, ISSN 1175-2777, 2005, Vol. 7, pp. 29-42.

Abstract

In this dissertation, we present the research pathway to the design and implementation of a real-time vision-based gesture recognition system.
This system was built from three components, representing three layers of abstraction: i) detection of skin and localization of the hand and face, ii) tracking of multiple skin blobs in video sequences, and finally iii) recognition of gesture movement trajectories. The first component, adaptive skin detection, was implemented based on our novel adaptive skin detection algorithm for video sequences. This algorithm has two main sub-components: i) the static skin detector, a skin detection method based on the hue factor of the skin color, and ii) the adaptive skin detector, which retrains itself based on new data gathered from the movement of the user. The results of our experiments show that the algorithm improves the quality of skin detection within video sequences. For tracking, a new approach to boundary detection in blob tracking based on the Mean-shift algorithm was proposed. Our approach is based on continuous sampling of the boundaries of the kernel and changing the size of the kernel using our novel Fuzzy-based algorithm. We compared our approach to the kernel density-based approach, known as the CAM-Shift algorithm, under a range of noise levels and conditions. The results show that the proposed approach is superior in stability against white noise, and also provides correct boundary detection for arbitrary hand postures, which is not achievable by the CAM-Shift algorithm. Finally, we present a novel approach for gesture recognition. This approach includes two main parts: i) gesture modeling, and ii) gesture recognition. The gesture modeling technique is based on sampling the gradient of the gesture movement trajectory and representing the gesture trajectory as a sequence of numbers. This technique has some important features for gesture recognition, including robustness against slight rotation, a small number of required samples, invariance to the start position, and device independence.
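The trajectory-to-signal idea described above can be sketched in a few lines: each pair of consecutive trajectory points yields a movement direction, which is quantized into a fixed number of angular bins. This is only an illustrative sketch, not the thesis implementation; the function name `gesture_signal` and the bin count of 32 are assumptions made here for the example.

```python
import math

def gesture_signal(points, bins=32):
    """Encode a 2D movement trajectory as a sequence of quantized
    direction codes (an illustrative sketch of the gesture-signal idea;
    the bin count is an assumption, not the thesis's parameter)."""
    signal = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue  # no movement between samples, no direction to encode
        # atan2 gives the movement direction; normalize to [0, 2*pi)
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        # map the angle onto one of `bins` equal angular sectors
        signal.append(int(angle / (2 * math.pi) * bins) % bins)
    return signal

# A straight rightward stroke maps to a constant code, and a translated
# copy of the same stroke maps to the same codes: the encoding depends
# only on direction changes, which illustrates start-position invariance.
print(gesture_signal([(0, 0), (1, 0), (2, 0)]))  # [0, 0]
print(gesture_signal([(5, 5), (6, 5), (7, 5)]))  # [0, 0]
```

Because only relative directions are encoded, the same sequence of codes is produced wherever the gesture is drawn on the screen, which matches the invariance properties claimed for the technique.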
For gesture recognition, we used a multi-layer feed-forward neural network. The results of our experiments show that this approach provides 98.71% accuracy for gesture recognition, a higher accuracy rate than other methods reported in the literature. These components form the required framework for vision-based real-time gesture recognition and hand and face tracking. The components, individually or as a framework, can be applied in scientific and commercial extensions of either vision-based or hybrid gesture recognition systems.

Table of Contents

Acknowledgements ..... IV
List of Publications ..... V
Abstract ..... VIII
Table of Contents ..... IX
List of Figures ..... XIII
List of Tables ..... XVII
List of Acronyms ..... XVIII

Chapter 1. Introduction ..... 1
1.1 Approaches ..... 2
1.2 Motivation ..... 3
1.3 Thesis overview ..... 5
1.3.1 Contribution of this thesis ..... 5
1.3.2 Thesis outline ..... 6

Chapter 2. Literature review ..... 8
2.1 Vision-based gesture recognition ..... 8
2.2 Detection ..... 9
2.2.1 Shape and contour ..... 11
2.2.2 Skin color as a supportive cue ..... 15
2.2.3 Accuracy enhancement - Boosting ..... 17
2.2.4 Appearance, model and texture ..... 20
2.3 Vision-based object tracking ..... 23
2.3.1 The Kalman filter ..... 24
2.3.2 Tracking using multiple cues ..... 27
2.4 Gesture recognition ..... 29
2.4.1 Hidden Markov Model ..... 29
2.4.2 Temporal gesture recognition ..... 33
2.5 Applications of vision-based gesture recognition systems ..... 34
2.5.1 Gesture enabled applications ..... 34
2.5.2 Perceptual user interfaces ..... 36
2.6 Chapter summary ..... 38

Chapter 3. Understanding gesture in daily communications: An empirical study ..... 40
3.1 Introduction ..... 40
3.2 Research background ..... 41
3.2.1 Why humans use gestures? An account with common ground theory ..... 44
3.3 A study on gesture-enabled applications ..... 46
3.3.1 Experimentation method ..... 46
3.3.2 Participants of the experiment ..... 47
3.4 Results of the experiment ..... 48
3.5 Conclusions ..... 51

Chapter 4. Adaptive skin detection ..... 53
4.1 Introduction ..... 53
4.2 Research background ..... 53
4.2.1 Popular color spaces used for skin color segmentation ..... 54
4.2.2 Is there an optimum color space for skin color detection? ..... 55
4.2.3 Skin color segmentation techniques ..... 57
4.2.4 Adaptive learning of the skin color ..... 66
4.3 The global skin detection algorithm ..... 68
4.4 The adaptive skin detection algorithm ..... 70
4.4.1 The algorithm in summary ..... 72
4.5 Motion detection for adaptive skin detection ..... 73
4.5.1 Underlying assumptions ..... 73
4.5.2 Frame subtraction motion tracking ..... 74
4.6 The experiment ..... 75
4.6.1 Specifying the thresholds of the Global Skin Detector ..... 75
4.6.2 Ground-truth data ..... 76
4.6.3 Measured parameters ..... 78
4.6.4 Choosing the merging factor ..... 79
4.6.5 Results ..... 80
4.6.6 Sparseness of the correctly and falsely detected pixels ..... 82
4.6.7 The comparison of the behavior of the algorithm using optical-flow motion tracking ..... 82
4.7 Chapter summary ..... 86

Chapter 5. A novel approach for robust tracking, based on the Mean-shift algorithm ..... 93
5.1 Introduction ..... 93
5.1.1 The Mean-shift algorithm ..... 94
5.1.2 Automatic resizing of the kernel ..... 95
5.2 Research background ..... 98
5.2.1 The Mean-shift algorithm and feature tracking ..... 99
5.2.2 The Mean-shift algorithm and variable sized kernel ..... 101
5.3 Fuzzy-based kernel resizing ..... 104
5.3.1 Initialization and boundary detection of the kernel ..... 104
5.3.2 Fuzzy boundary detector ..... 107
5.4 Experiments and results ..... 108
5.4.1 Experiment 1: Detecting a blob with no movement ..... 110
5.4.2 Experiment 2: Tracking the blob of a moving hand ..... 112
5.4.3 Experiment 3: Tracking an object moving away from the camera ..... 113
5.4.4 Experiment 4: Tracking a moving hand in occluded situation ..... 114
5.4.5 Accuracy of tracking ..... 115
5.5 Implementation issues for hand and face tracking ..... 117
5.5.1 The multi-tracker implementation ..... 118
5.5.2 Tracker-tracker implementation ..... 121
5.5.3 Using depth information for hand and face tracking ..... 122
5.6 Chapter summary ..... 127

Chapter 6. Modeling and recognition of gesture signals in 2D space ..... 128
6.1 Introduction ..... 128
6.2 Research background ..... 130
6.2.1 Time series analysis ..... 130
6.2.2 Gesture identification through pattern recognition ..... 132
6.3 The gesture trajectory recognition technique ..... 135
6.3.1 Feature selection ..... 135
6.3.2 Gesture classification ..... 138
6.3.3 Recording the training data ..... 139
6.3.4 Normalizing the data ..... 140
6.4 The experiments ..... 140
6.4.1 The first experiment - Using one ANN ..... 140
6.4.2 The second experiment ..... 142
6.4.3 Comparing the ANN classifier with a SVM classifier ..... 147
6.4.4 Gesture classification using one ANN with 14 outputs ..... 149
6.5 Implementation of the gesture tracking system ..... 152
6.5.1 System inputs ..... 152
6.5.2 Gesture movement trajectory recognition ..... 153
6.6 Chapter summary ..... 156

Chapter 7. Summary, conclusion and future research ..... 158
7.1 Extending the work ..... 161
7.2 The future of gesture recognition systems ..... 162

References ..... 163

List of Figures

Figure 2.1. The first set of features for: a) 0° rotation, b) -15°
rotation, and c) -84 o rotation ...................................................................................................................... 19 Figure 3.1. (a) A metaphoric gesture: indicating an imaginary point accompanying speech, (b) A metaphoric gesture: waving hands accompanying speech, (c) A deictic gesture: counting, and (d) A deictic gesture: pointing . ................................. 50 Figure 3.2. Distribution of gestures over development stages .......................................... 51 Figure 4.1. In some conditions, the probability density of the skin color of the user is lower than some of the unwanted regions like wood color. ..................................... 62 Figure 4.2. (a, c) Original image, (b, d) Filtered image using Hue thresholding . ............. 68 Figure 4.3. Overview of the global skin detector .............................................................. 69 Figure 4.4. (a, c, e) Original Images, (b, d, f) Filtered image based on thresholding Hue factor of skin col or extracted from training data (Hue factor of the col or of the some background objects are similar to skin color) . .......................................................... 69 Figure 4.5. Overview of the adaptive skin detector ........................................................... 72 Figure 4.6. Changes in the lower threshold and upper threshold between frames 0 and 2791 . ......................................................................................................................... 72 Figure 4.7. Testing environment, (a) Camera view, (b) Configuration settings ................ 73 Figure 4.8. A moving hand: (a) Original image, (b) In-motion pixels of the frame, filtered using Hue thresholding (c) Mapping the result to the original image . ..................... 75 Figure 4.9. Samples of the input images and their ground-truth data ................................ 77 Figure 4.1 0. 
The accuracy of the algorithm based on merging factor ............................... 80 Figure 4.11. The changes in the output over the time . .... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 Figure 4.12. Track points based on traditional approach (good features to track) . ........... 84 Figure 4.13. Initializing track points, based on primary skin color filter and equal distances ..................................... ............................................................................... 84 Figure 4.14. Tracking skin color segments using LK optical -flow motion tracking and selected features to track ........................................................................................... 85 Figure 4.15. Nodding head: (a) Original image, (b) In-motion pixels of the frame, filtered using Global Skin Detector (c) Mapping the result to the original image . ............... 85 XIII Figure 4.16. A short video sequence presenting the behavior of the adaptive skin detection algorithm over time. On the top-right corner of each image, the detected silhouette of the skin is presented. The falsely detected areas were gradually eliminated over time . . . . . ....... . . . .............. . . . . . . ... . . ......... . . ...................... . . . . . . . . ....... . . . ... . . 87 Figure 4.17. The behavior of the adaptive skin detector using frame subtraction motion detection ....................... ........ ............... ...... . . .............. . . .............. ........... . . . . . . . . .... . . . .... . . 88 Figure 4.18. The results on using morphological operators on dataset 1. ...... . ..... ...... . . ..... 89 Figure 4.19. The results on using morphological operators on dataset 2 . ....... ............... . . . 90 Figure 4.20. The results on using morphological operators on dataset 3 . ......................... 91 Figure 4.21. 
Comparison of measured parameters of the algorithm, using different motion tracking methods (PS: Frame Subtraction, OF: Optical Flow). The values are the average of the values measured for all three datasets. However, calculating the average for a small number of datasets (3 in this case) may bias the results toward one of the datasets (e.g. data set 2 dominates the results shown in graph d) . ............ 92 Figure 5 .1. Iterations, to find the highest dense kernel using the Mean-shift algorithm. The initial position of the kernel is [1] which has a small overlap with the face blob. 2, 3, 4, 5, are the new positions of the kernel to match the center of gravity of the kernel to the geometrical center of the kernel... .................. ................................... . . . 95 Figure 5.2. Choosing the size and initial placement of the kernel. (a) incorrect placement of the kernel, (b) choosing a kernel that is too large, (c) choosing a too small kernel. ................ . ........................................................................................................... . . .... 96 Figure 5.3. Inhomogeneous detection of the skin pixels in the image. The density of the pixels are varying in the face area which makes the detection and tracking of the face blob more difficult. .. ....................................................................................... 105 Figure 5.4. Fuzzy values for the inputs of the fuzzy boundary detector ......................... 107 Figure 5.5. Fuzzy outputs for the fuzzy boundary detector. .................... . ....................... 108 Figure 5.6. Detected boundary by the two algorithms, surrounding a static image silhouette ... . . . . ... . . ............. . . . . . . . . . ........ .................... ...... ... ....... ......... ....... ................... 110 Figure 5.7. a) The correct detection determined by Edge density-fuzzy. The smaller rectangle is the result of kernel density-based- sqrt(mOO)- method. 
b) Behavior of the algorithms with a noise level of 15% . . .... ... . ............. ......................................... 112 Figure 5.8. Tracking the hand in a "grabbing" hand gesture with white noise 20%, a) Original image sequence, b) Centre of gravity, c) Error of displacement in comparison to an ideal tracker (Xc-Xc idea1) . ...... . . ................. ........................ ........... 113 Figure 5.9. Tracking the centre of gravity- zoom out with 25% white noise, a) Original image sequence, b) Xc centre of gravity, c) Area of the kernel, and d) Error of placement of the kernel in comparison to an ideal tracker (X-Xictea1) ?????????????????????? 114 XIV Figure 5.10. Moving hand, in occluded situation, a) the original image sequence, b) Xc of the centre of gravity in noise 20%, d) Error in Xc of the kernel in comparison to an ideal tracker (X-XicteaJ) ............................................................................................. 115 Figure 5.11. The average Mean-distance between the trackers and the ideal tracker in noise level 0% to 30%. a) "Grabbing hand gesture dataset" (experiment 2) , b) "Face zoom out" dataset (experiment 3) , and c) "Occluded hands" dataset (experiment 4) . ........................... ......................................................................... . ........................... 117 Figure 5.12. Face tracking using the proposed algorithm for skin detection, and the Mean- shift algorithm for blob tracking . ............................................................................ 119 Figure 5.13. Running multiple trackers on the image sequence of hand movement based on the algorithm described in Section 5.5.1.. .......................................................... 120 Figure 5.14. Robust hand and face tracking using the tracker-tracker algorithm on an image sequence of different bi-hand movements . .................................................. 122 Figure 5.15. 
a) A sample image, b) the depth information of the sample image. The light area is the object closer to the camera; the black patches are of unknown depth.
Figure 5.16. Background elimination using the depth thresholding technique.
Figure 5.17. Some of the scenarios in which depth information can be useful for occlusion resolution.
Figure 5.18. Occlusion resolution using the depth information and the Mean-shift algorithm. The tracked blobs are displayed on the disparity image (right).
Figure 6.1. a) Quantized input vectors, b) gesture vector (0, 13, 0), c) gesture vector (17, 31, 17).
Figure 6.2. a) Original gesture trajectory, b) sampling from the gesture trajectory, c) gesture signal of the collected data over time, d) reconstructed gesture using [c].
Figure 6.3. Left: gesture signal; right: gesture movement trajectories.
Figure 6.4. Gesture recognition based on the input gesture signal to a classifier.
Figure 6.5. The gesture signal recording software.
Figure 6.6. The structure of the ANN for gesture classification.
Figure 6.7. Performance of the ANN over time while training (goal was 0).
Figure 6.8. Continuous gesture detection using one ANN.
Figure 6.9. The histogram of the output values of the ANNs for the test data.
Figure 6.10. The effect of changing the threshold on the average correct detection ratio (middle vertical axis) and false detection ratio (right vertical axis).
Figure 6.11.
The evaluation of a support vector machine in classifying the 13 gesture signals in comparison to the ANN classifiers.
Figure 6.12. The structure of the ANN for detecting multiple gesture signals.
Figure 6.13. Comparison of correct positive and false positive recognition accuracy of a single ANN in gesture signal classification, employing different numbers of hidden layers.
Figure 6.14. Recording the hand movement trajectory using the Fuzzy-based Mean-shift tracker. The gesture movement trajectory signal is presented in the top-right corner of each frame.
Figure 6.15. Implementation of the vision-based gesture recognition using one ANN and the Fuzzy-based Mean-shift tracker. The diagram in the bottom-right corner of the image represents the detected gestures over time.
Figure 6.16. Recognition of gesture signals number 13 and 9 using the vision-based gesture recognition system.

List of Tables

Table 3.1. Different stages of mathematical skills development based on the Numeracy Project.
Table 3.2. Gesture use in the experiment.
Table 3.3. Hand or body gesture use over developmental stage.
Table 4.1. An overview of color spaces and their applications in skin detection.
Table 4.2.
Comparative evaluation of different skin detectors.
Table 4.3. The characteristics of the image datasets used for the experiments.
Table 5.1. Fuzzy controller for the fuzzy boundary detector.
Table 5.2. Comparison of the accuracy of the algorithms.
Table 6.1. First experiment: detecting multiple gestures using a single ANN.
Table 6.2. The data recorded for the second experiment, using the gesture recorder and a digital tablet.
Table 6.3. The results concluded from experiment 2 (one ANN for each gesture signal).
Table 6.4. Analyzing the accuracy/inaccuracy of multiple ANNs for gesture classification using a confusion matrix (all values are in percent).
Table 6.5. The evaluation of a support vector machine in classifying the 13 gesture signals.
Table 6.6. The evaluation of a single ANN in classifying 14 gesture classes using 2 hidden layers.
Table 6.7. Comparison of the accuracy of a single ANN containing 0 to 7 hidden layers in recognizing the 14 gesture signal classes.
Table 6.8. Comparison of the accuracy of the proposed method with other gesture recognition systems in the literature.
List of Acronyms

ANN - Artificial Neural Network
AP - Attention Point
ASD - Adaptive Skin Detector
ASL - American Sign Language
BBN - Bayesian Belief Network
CAM-Shift - Continuously-Adaptive Mean-Shift
CCD - Charge-Coupled Device
CIELAB, CIEXYZ - Both refer to perceptually linear color spaces, known as CIE L-A-B and CIE X-Y-Z
CP - Color Predicate
CPU - Central Processing Unit
HRCNN - Hyper Rectangular Composite Neural Network
DoG - Difference of Gaussian
DTW - Dynamic Time Warping
GHz - Gigahertz
GMM - Gaussian Mixture Model
GSD - Global Skin Detector
HCE - Histogram Error
HCI - Human-Computer Interaction
HI - Histogram Intersection
HSI - Hue-Saturation-Intensity (color model)
HMM - Hidden Markov Model
HS - Hue-Saturation (color model)
HSV - Hue-Saturation-Value (color model)
ICrCb - Intensity-Chrominance red-Chrominance blue (color model)
ICT - Information-Communication Technology
IUV - Intensity-Luminance-Chrominance (color model)
LCS - Localized Contour Sequence
MSEPF - Mean-Shift Embedded Particle Filter
NN - Neural Network
PC - Personal Computer
PCA - Principal Component Analysis
PDA - Personal Digital Assistant
PDF - Probability Density Function
PUI - Perceptual User Interface
RBF - Radial Basis Function
RGB - Red-Green-Blue (color model)
SASOM - Structure Adaptive Self-Organized Map
SMA - Specialized Mapping Architecture
SOM - Self-Organized Map
STFT - Short-Time Fourier Transform
SVM - Support Vector Machine
tMHI - Time-Motion History Image
TMS - Transcranial Magnetic Stimulation
UAV - Unmanned Aerial Vehicle
UI - User Interface
UR - Unreliability Rate
USB - Universal Serial Bus
YCrCb - An encoded nonlinear RGB signal, commonly used by European television studios and for image compression
YIQ - Color model inspired by the human vision system, formerly used in NTSC television broadcasting
YUV - Luminance-Chrominance (color model, mainly used in PAL analog television broadcasting)