Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere without the permission of the Author.

Real-time Vision-based Hand and Face Tracking and Recognition of Gesture

A PhD dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Ph.D.) in Computer Science

by Farhad Dadgostar
Institute of Information and Mathematical Sciences
College of Science
Massey University
December 2006

Copyright © 2006 Farhad Dadgostar

Dedication

With love, to the one who always encouraged me and provided me indefinite unconditional support... To my wife, Nasim

Acknowledgements

I would like to thank Dr. Abdolhossein Sarrafzadeh, my supervisor, for all of his support, his patience, his guidance and his endless encouragement throughout my PhD study. Without his support, I would never have gained such an interest in research, nor learned as much as I have. I have been lucky, and am proud to have been his student. Working with him was a great pleasure. I would like to thank Dr. Scott P. Overmyer and Dr. Liyanage De Silva for their valuable suggestions and comments on my work. I have also enjoyed working with a number of fellow PhD colleagues over the past years, in particular Andre Barczak and Chao Fan, with whom I shared the authorship of several papers and reports, reflecting the closeness of our cooperation. The financial support provided by Massey University and the Institute of Information and Mathematical Sciences made my research, and the presentation of its results at international conferences, possible. In particular, I would like to thank Professor Robert McKibbin, the head of the institute, for his support during my study.
This achievement would not have been possible without the endless support provided to me by my family. I would like to thank my parents, who have always been supportive and encouraging throughout my studies. I would like to thank my precious daughter Kiana for her company and patience when I was working on weekends. Finally, I would like to express my deep thanks to my lovely wife, Nasim, for all her love, sacrifice, support and encouragement, which can be found in every word of this work.

List of Publications

Journal Articles

• Farhad Dadgostar, and Abdolhossein Sarrafzadeh, "An adaptive real-time skin detector based on Hue thresholding: A comparison on two motion tracking methods", Pattern Recognition Letters, Vol. 27, Issue 12, pp. 1342-1352, March 2006, Elsevier.
• Abdolhossein Sarrafzadeh, Samuel Alexander, Farhad Dadgostar, Chao Fan, and Abbas Bigdeli, "How do you know that I don't understand? A look at the future of intelligent tutoring systems", accepted for publication in Computers in Human Behavior, 2006.

Book Chapters

• Farhad Dadgostar, and Abdolhossein Sarrafzadeh, "A Fast Skin Detection Algorithm for Video Sequences", in Mohamed Kamel, Aurelio Campilho (Eds.), Lecture Notes in Computer Science, ICIAR 2005, Vol. 3656, pp. 804-811, 2005, ISBN 3-540-29069-9, Springer-Verlag, Berlin, Heidelberg.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, and Scott P. Overmyer, "Face Tracking Using Mean-Shift Algorithm: A Fuzzy Approach for Boundary Detection", in J. Tao, T. Tan and R. W. Picard (Eds.), Lecture Notes in Computer Science: Affective Computing and Intelligent User Interfaces, Vol. 3784, pp. 56-63, 2005, ISBN 3-540-29621-2, Springer-Verlag, Berlin, Heidelberg.
• Farhad Dadgostar, Hokyoung Ryu, Abdolhossein Sarrafzadeh, and Scott P.
Overmyer, "Making sense of student use of nonverbal cues for intelligent tutoring systems", in the ACM International Conference Proceeding Series: 19th conference of the computer-human interaction special interest group (CHISIG), Vol. 122, pp. 1-4, 2005, ISBN 1-59593-222-4, Canberra, Australia.

International Refereed Conferences

• Farhad Dadgostar, Abdolhossein Sarrafzadeh, Chao Fan, Liyanage De Silva, and Chris Messom, "Modeling and Recognition of Gesture Signals in 2D Space: A comparison of NN and SVM approaches", The 18th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2006, Washington D.C., USA.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, Scott P. Overmyer, and Liyanage De Silva, "Is the Hand really quicker than the Eye? Variances of the Mean-Shift algorithm for real-time hand and face tracking", IEEE International Conference on Computational Intelligence for Modelling, Control and Automation, CIMCA 2006, Sydney, Australia.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, and Scott P. Overmyer, "Genetic Algorithms and Long-Haar features: A Method for Object Detection", The 3rd International Conference on Cybernetics and Information Technologies, Systems and Applications, CITSA 2006, Orlando, Florida, USA.
• Abdolhossein Sarrafzadeh, Samuel Alexander, Farhad Dadgostar, Chao Fan, and Abbas Bigdeli, "See Me, Teach Me: Facial Expression and Gesture Recognition for Intelligent Tutoring Systems", IEEE International Conference on Innovation in Information Technology, IIT 2006, Dubai, U.A.E.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, Hokyoung Ryu, "A Macro Model of Human Emotional Response for Intelligent Agents Applications", 1st Korea-New Zealand Joint Workshop on Advances of Computational Intelligent Methods and Applications, Feb 2006, Auckland, New Zealand.
•
Farhad Dadgostar, Abdolhossein Sarrafzadeh, "A Component-based Architecture for Vision-based Gesture Recognition", in the proceedings of the International Image and Vision Computing Conference, IVCNZ 2005, pp. 322-328, Dunedin, New Zealand.
• Chao Fan, Abdolhossein Sarrafzadeh, Farhad Dadgostar, and Hamid Gholamhosseini, "Facial Expression Analysis by Support Vector Regression", in the proceedings of the International Image and Vision Computing Conference, IVCNZ 2005, pp. 311-317, Dunedin, New Zealand.
• Farhad Dadgostar, Chao Fan, and Abdolhossein Sarrafzadeh, "A Hybrid Approach for Robust Real-time Face Tracking in Video Sequences", in the proceedings of the Institute of Information and Mathematical Sciences Postgraduate Conference, pp. 35-42, 2005, Auckland, New Zealand.
• Chao Fan, Abdolhossein Sarrafzadeh, Farhad Dadgostar, and Hamid Gholamhosseini, "Face and Eye Detection Using Support Vector Machines", The 3rd International Conference on Computational Intelligence, Robotics and Autonomous Systems, CIRAS 2005, Singapore.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, and Martin Johnson, "An Adaptive Skin Detector for Video Sequences Based on Optical Flow Motion Features", in the proceedings of the 7th International IEEE Conference on Image and Signal Processing, IASTED-SIP 2005, M.W. Marcellin (Ed.), Hawaii, USA.
• Andre L. C. Barczak, Farhad Dadgostar, and Martin Johnson, "Real-time Hand Tracking Using the Viola and Jones Method", in the proceedings of the 7th International IEEE Conference on Image and Signal Processing, IASTED-SIP 2005, M.W. Marcellin (Ed.), Hawaii, USA.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, and Scott P. Overmyer, "An Adaptive Real-time Skin Detector for Video Sequences", in the Proceedings of the 2005 International Conference on Computer Vision, Vision 2005, H. R. Arabnia (Ed.), CSREA Press 2005, ISBN 1-932415-65-3, pp. 65-70, Las Vegas, Nevada, USA.
• Andre L. C.
Barczak, Farhad Dadgostar, and Chris Messom, "Real-time Hand Tracking Based on Non-invariant Features", Proceedings of the International IEEE Conference on Measurement and Instrumentation, IMTC 2005, Vol. 3, pp. 2192-2192, Ottawa, Canada.
• Chao Fan, Farhad Dadgostar, Abdolhossein Sarrafzadeh, Hamid Gholamhosseini, and Martin Johnson, "Facial Expression Reconstruction Using Polygon Approximation", in the proceedings of the 7th International IEEE Conference on Image and Signal Processing, IASTED-SIP 2005, M.W. Marcellin (Ed.), Hawaii, USA.
• Abdolhossein Sarrafzadeh, Chao Fan, Farhad Dadgostar, Sam Alexander, and Chris Messom, "Frown Gives Game Away: Affect Sensitive Tutoring Systems for Elementary Mathematics", International IEEE Conference on Systems, Man and Cybernetics, SMC 2004, The Hague, The Netherlands.

Other Publications

• Farhad Dadgostar, Abdolhossein Sarrafzadeh, "Gesture recognition through angle space", Research Letters in the Information and Mathematical Sciences, 2006, Vol. 9, pp. 112-119.
• Farhad Dadgostar, Abdolhossein Sarrafzadeh, "A formal model of emotional response, inspired from human cognition and emotion systems", Research Letters in the Information and Mathematical Sciences, 2006, Vol. 9, pp. 89-97.
• Farhad Dadgostar, Andre L. C. Barczak, Abdolhossein Sarrafzadeh, "A Color Hand Gesture Database for Evaluating and Improving Algorithms on Hand Gesture and Posture Recognition", Research Letters in the Information and Mathematical Sciences, ISSN 1175-2777, 2005, Vol. 7, pp. 127-134.
• Andre L. C. Barczak, Farhad Dadgostar, "Real-time Hand Tracking Using a Set of Cooperative Classifiers and Haar-Like Features", Research Letters in the Information and Mathematical Sciences, ISSN 1175-2777, 2005, Vol. 7, pp. 29-42.

Abstract

In this dissertation, we present the research pathway to the design and implementation of a real-time vision-based gesture recognition system.
This system was built from three components, representing three layers of abstraction: i) detection of skin and localization of the hand and face, ii) tracking of multiple skin blobs in video sequences, and finally iii) recognition of gesture movement trajectories. The first component, adaptive skin detection, was implemented based on our novel adaptive skin detection algorithm for video sequences. This algorithm has two main sub-components: i) the static skin detector, a skin detection method based on the hue factor of the skin color, and ii) the adaptive skin detector, which retrains itself based on new data gathered from the movement of the user. The results of our experiments show that the algorithm improves the quality of skin detection within video sequences. For tracking, a new approach to boundary detection in blob tracking based on the Mean-shift algorithm was proposed. Our approach is based on continuous sampling of the boundaries of the kernel and changing the size of the kernel using our novel Fuzzy-based algorithm. We compared our approach to the kernel density-based approach, known as the CAM-Shift algorithm, under a range of noise levels and conditions. The results show that the proposed approach is superior in stability against white noise, and also provides correct boundary detection for arbitrary hand postures, which is not achievable by the CAM-Shift algorithm. Finally, we present a novel approach for gesture recognition. This approach includes two main parts: i) gesture modeling, and ii) gesture recognition. The gesture modeling technique is based on sampling the gradient of the gesture movement trajectory and representing the gesture trajectory as a sequence of numbers. This technique has some important features for gesture recognition, including robustness against slight rotation, a small number of required samples, invariance to the start position, and device independence.
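The trajectory-to-signal idea described above can be sketched in a few lines: each pair of consecutive trajectory points yields a movement direction, which is quantized into a fixed number of angular bins. This is only an illustrative sketch, not the thesis implementation; the function name `gesture_signal` and the bin count of 32 are assumptions made here for the example.

```python
import math

def gesture_signal(points, bins=32):
    """Encode a 2D movement trajectory as a sequence of quantized
    direction codes (an illustrative sketch of the gesture-signal idea;
    the bin count is an assumption, not the thesis's parameter)."""
    signal = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue  # no movement between samples, no direction to encode
        # atan2 gives the movement direction; normalize to [0, 2*pi)
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        # map the angle onto one of `bins` equal angular sectors
        signal.append(int(angle / (2 * math.pi) * bins) % bins)
    return signal

# A straight rightward stroke maps to a constant code, and a translated
# copy of the same stroke maps to the same codes: the encoding depends
# only on direction changes, which illustrates start-position invariance.
print(gesture_signal([(0, 0), (1, 0), (2, 0)]))  # [0, 0]
print(gesture_signal([(5, 5), (6, 5), (7, 5)]))  # [0, 0]
```

Because only relative directions are encoded, the same sequence of codes is produced wherever the gesture is drawn on the screen, which matches the invariance properties claimed for the technique.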
For gesture recognition, we used a multi-layer feed-forward neural network. The results of our experiments show that this approach provides 98.71% accuracy for gesture recognition, a higher accuracy rate than other methods reported in the literature. These components form the required framework for vision-based real-time gesture recognition and hand and face tracking. The components, individually or as a framework, can be applied in scientific and commercial extensions of either vision-based or hybrid gesture recognition systems.

Table of Contents

Acknowledgements ..... IV
List of Publications ..... V
Abstract ..... VIII
Table of Contents ..... IX
List of Figures ..... XIII
List of Tables ..... XVII
List of Acronyms ..... XVIII

Chapter 1. Introduction ..... 1
1.1 Approaches ..... 2
1.2 Motivation ..... 3
1.3 Thesis overview ..... 5
1.3.1 Contribution of this thesis ..... 5
1.3.2 Thesis outline ..... 6

Chapter 2. Literature review ..... 8
2.1 Vision-based gesture recognition ..... 8
2.2 Detection ..... 9
2.2.1 Shape and contour ..... 11
2.2.2 Skin color as a supportive cue ..... 15
2.2.3 Accuracy enhancement - Boosting ..... 17
2.2.4 Appearance, model and texture ..... 20
2.3 Vision-based object tracking ..... 23
2.3.1 The Kalman filter ..... 24
2.3.2 Tracking using multiple cues ..... 27
2.4 Gesture recognition ..... 29
2.4.1 Hidden Markov Model ..... 29
2.4.2 Temporal gesture recognition ..... 33
2.5 Applications of vision-based gesture recognition systems ..... 34
2.5.1 Gesture enabled applications ..... 34
2.5.2 Perceptual user interfaces ..... 36
2.6 Chapter summary ..... 38

Chapter 3. Understanding gesture in daily communications: An empirical study ..... 40
3.1 Introduction ..... 40
3.2 Research background ..... 41
3.2.1 Why humans use gestures? An account with common ground theory ..... 44
3.3 A study on gesture-enabled applications ..... 46
3.3.1 Experimentation method ..... 46
3.3.2 Participants of the experiment ..... 47
3.4 Results of the experiment ..... 48
3.5 Conclusions ..... 51

Chapter 4. Adaptive skin detection ..... 53
4.1 Introduction ..... 53
4.2 Research background ..... 53
4.2.1 Popular color spaces used for skin color segmentation ..... 54
4.2.2 Is there an optimum color space for skin color detection? ..... 55
4.2.3 Skin color segmentation techniques ..... 57
4.2.4 Adaptive learning of the skin color ..... 66
4.3 The global skin detection algorithm ..... 68
4.4 The adaptive skin detection algorithm ..... 70
4.4.1 The algorithm in summary ..... 72
4.5 Motion detection for adaptive skin detection ..... 73
4.5.1 Underlying assumptions ..... 73
4.5.2 Frame subtraction motion tracking ..... 74
4.6 The experiment ..... 75
4.6.1 Specifying the thresholds of the Global Skin Detector ..... 75
4.6.2 Ground-truth data ..... 76
4.6.3 Measured parameters ..... 78
4.6.4 Choosing the merging factor ..... 79
4.6.5 Results ..... 80
4.6.6 Sparseness of the correctly and falsely detected pixels ..... 82
4.6.7 The comparison of the behavior of the algorithm using optical-flow motion tracking ..... 82
4.7 Chapter summary ..... 86

Chapter 5. A novel approach for robust tracking, based on the Mean-shift algorithm ..... 93
5.1 Introduction ..... 93
5.1.1 The Mean-shift algorithm ..... 94
5.1.2 Automatic resizing of the kernel ..... 95
5.2 Research background ..... 98
5.2.1 The Mean-shift algorithm and feature tracking ..... 99
5.2.2 The Mean-shift algorithm and variable sized kernel ..... 101
5.3 Fuzzy-based kernel resizing ..... 104
5.3.1 Initialization and boundary detection of the kernel ..... 104
5.3.2 Fuzzy boundary detector ..... 107
5.4 Experiments and results ..... 108
5.4.1 Experiment 1: Detecting a blob with no movement ..... 110
5.4.2 Experiment 2: Tracking the blob of a moving hand ..... 112
5.4.3 Experiment 3: Tracking an object moving away from the camera ..... 113
5.4.4 Experiment 4: Tracking a moving hand in occluded situation ..... 114
5.4.5 Accuracy of tracking ..... 115
5.5 Implementation issues for hand and face tracking ..... 117
5.5.1 The multi-tracker implementation ..... 118
5.5.2 Tracker-tracker implementation ..... 121
5.5.3 Using depth information for hand and face tracking ..... 122
5.6 Chapter summary ..... 127

Chapter 6. Modeling and recognition of gesture signals in 2D space ..... 128
6.1 Introduction ..... 128
6.2 Research background ..... 130
6.2.1 Time series analysis ..... 130
6.2.2 Gesture identification through pattern recognition ..... 132
6.3 The gesture trajectory recognition technique ..... 135
6.3.1 Feature selection ..... 135
6.3.2 Gesture classification ..... 138
6.3.3 Recording the training data ..... 139
6.3.4 Normalizing the data ..... 140
6.4 The experiments ..... 140
6.4.1 The first experiment - Using one ANN ..... 140
6.4.2 The second experiment ..... 142
6.4.3 Comparing the ANN classifier with a SVM classifier ..... 147
6.4.4 Gesture classification using one ANN with 14 outputs ..... 149
6.5 Implementation of the gesture tracking system ..... 152
6.5.1 System inputs ..... 152
6.5.2 Gesture movement trajectory recognition ..... 153
6.6 Chapter summary ..... 156

Chapter 7. Summary, conclusion and future research ..... 158
7.1 Extending the work ..... 161
7.2 The future of gesture recognition systems ..... 162

References ..... 163

List of Figures

Figure 2.1. The first set of features for: a) 0° rotation, b) -15°
rotation, and c) -84 o rotation ...................................................................................................................... 19 Figure 3.1. (a) A metaphoric gesture: indicating an imaginary point accompanying speech, (b) A metaphoric gesture: waving hands accompanying speech, (c) A deictic gesture: counting, and (d) A deictic gesture: pointing . ................................. 50 Figure 3.2. Distribution of gestures over development stages .......................................... 51 Figure 4.1. In some conditions, the probability density of the skin color of the user is lower than some of the unwanted regions like wood color. ..................................... 62 Figure 4.2. (a, c) Original image, (b, d) Filtered image using Hue thresholding . ............. 68 Figure 4.3. Overview of the global skin detector .............................................................. 69 Figure 4.4. (a, c, e) Original Images, (b, d, f) Filtered image based on thresholding Hue factor of skin col or extracted from training data (Hue factor of the col or of the some background objects are similar to skin color) . .......................................................... 69 Figure 4.5. Overview of the adaptive skin detector ........................................................... 72 Figure 4.6. Changes in the lower threshold and upper threshold between frames 0 and 2791 . ......................................................................................................................... 72 Figure 4.7. Testing environment, (a) Camera view, (b) Configuration settings ................ 73 Figure 4.8. A moving hand: (a) Original image, (b) In-motion pixels of the frame, filtered using Hue thresholding (c) Mapping the result to the original image . ..................... 75 Figure 4.9. Samples of the input images and their ground-truth data ................................ 77 Figure 4.1 0. 
The accuracy of the algorithm based on merging factor ............................... 80 Figure 4.11. The changes in the output over the time . .... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 Figure 4.12. Track points based on traditional approach (good features to track) . ........... 84 Figure 4.13. Initializing track points, based on primary skin color filter and equal distances ..................................... ............................................................................... 84 Figure 4.14. Tracking skin color segments using LK optical -flow motion tracking and selected features to track ........................................................................................... 85 Figure 4.15. Nodding head: (a) Original image, (b) In-motion pixels of the frame, filtered using Global Skin Detector (c) Mapping the result to the original image . ............... 85 XIII Figure 4.16. A short video sequence presenting the behavior of the adaptive skin detection algorithm over time. On the top-right corner of each image, the detected silhouette of the skin is presented. The falsely detected areas were gradually eliminated over time . . . . . ....... . . . .............. . . . . . . ... . . ......... . . ...................... . . . . . . . . ....... . . . ... . . 87 Figure 4.17. The behavior of the adaptive skin detector using frame subtraction motion detection ....................... ........ ............... ...... . . .............. . . .............. ........... . . . . . . . . .... . . . .... . . 88 Figure 4.18. The results on using morphological operators on dataset 1. ...... . ..... ...... . . ..... 89 Figure 4.19. The results on using morphological operators on dataset 2 . ....... ............... . . . 90 Figure 4.20. The results on using morphological operators on dataset 3 . ......................... 91 Figure 4.21. 
Comparison of measured parameters of the algorithm, using different motion tracking methods (PS: Frame Subtraction, OF: Optical Flow). The values are the average of the values measured for all three datasets. However, calculating the average for a small number of datasets (3 in this case) may bias the results toward one of the datasets (e.g. data set 2 dominates the results shown in graph d) . ............ 92 Figure 5 .1. Iterations, to find the highest dense kernel using the Mean-shift algorithm. The initial position of the kernel is [1] which has a small overlap with the face blob. 2, 3, 4, 5, are the new positions of the kernel to match the center of gravity of the kernel to the geometrical center of the kernel... .................. ................................... . . . 95 Figure 5.2. Choosing the size and initial placement of the kernel. (a) incorrect placement of the kernel, (b) choosing a kernel that is too large, (c) choosing a too small kernel. ................ . ........................................................................................................... . . .... 96 Figure 5.3. Inhomogeneous detection of the skin pixels in the image. The density of the pixels are varying in the face area which makes the detection and tracking of the face blob more difficult. .. ....................................................................................... 105 Figure 5.4. Fuzzy values for the inputs of the fuzzy boundary detector ......................... 107 Figure 5.5. Fuzzy outputs for the fuzzy boundary detector. .................... . ....................... 108 Figure 5.6. Detected boundary by the two algorithms, surrounding a static image silhouette ... . . . . ... . . ............. . . . . . . . . . ........ .................... ...... ... ....... ......... ....... ................... 110 Figure 5.7. a) The correct detection determined by Edge density-fuzzy. The smaller rectangle is the result of kernel density-based- sqrt(mOO)- method. 
b) Behavior of the algorithms with a noise level of 15% . . .... ... . ............. ......................................... 112 Figure 5.8. Tracking the hand in a "grabbing" hand gesture with white noise 20%, a) Original image sequence, b) Centre of gravity, c) Error of displacement in comparison to an ideal tracker (Xc-Xc idea1) . ...... . . ................. ........................ ........... 113 Figure 5.9. Tracking the centre of gravity- zoom out with 25% white noise, a) Original image sequence, b) Xc centre of gravity, c) Area of the kernel, and d) Error of placement of the kernel in comparison to an ideal tracker (X-Xictea1) ?????????????????????? 114 XIV Figure 5.10. Moving hand, in occluded situation, a) the original image sequence, b) Xc of the centre of gravity in noise 20%, d) Error in Xc of the kernel in comparison to an ideal tracker (X-XicteaJ) ............................................................................................. 115 Figure 5.11. The average Mean-distance between the trackers and the ideal tracker in noise level 0% to 30%. a) "Grabbing hand gesture dataset" (experiment 2) , b) "Face zoom out" dataset (experiment 3) , and c) "Occluded hands" dataset (experiment 4) . ........................... ......................................................................... . ........................... 117 Figure 5.12. Face tracking using the proposed algorithm for skin detection, and the Mean- shift algorithm for blob tracking . ............................................................................ 119 Figure 5.13. Running multiple trackers on the image sequence of hand movement based on the algorithm described in Section 5.5.1.. .......................................................... 120 Figure 5.14. Robust hand and face tracking using the tracker-tracker algorithm on an image sequence of different bi-hand movements . .................................................. 122 Figure 5.15. 
a) A sample image, b) the depth information of the sample image. The light area is the object closer to the camera; the black patches are of unknown depth.
Figure 5.16. Background elimination using the depth thresholding technique.
Figure 5.17. Some of the scenarios in which depth information can be useful for occlusion resolution.
Figure 5.18. Occlusion resolution using the depth information and the Mean-shift algorithm. The tracked blobs are displayed on the disparity image (right).
Figure 6.1. a) Quantized input vectors, b) gesture vector (0, 13, 0), c) gesture vector (17, 31, 17).
Figure 6.2. a) Original gesture trajectory, b) sampling from the gesture trajectory, c) gesture signal of the collected data over time, d) reconstructed gesture using [c].
Figure 6.3. Left: gesture signal; right: gesture movement trajectories.
Figure 6.4. Gesture recognition based on the input gesture signal to a classifier.
Figure 6.5. The gesture signal recording software.
Figure 6.6. The structure of the ANN for gesture classification.
Figure 6.7. Performance of the ANN over time while training (goal was 0).
Figure 6.8. Continuous gesture detection using one ANN.
Figure 6.9. The histogram of the output values of the ANNs for the test data.
Figure 6.10. The effect of changing the threshold on the average correct detection ratio (middle vertical axis) and false detection ratio (right vertical axis).
Figure 6.11.
The evaluation of a support vector machine in classifying the 13 gesture signals in comparison to the ANN classifiers.
Figure 6.12. The structure of the ANN for detecting multiple gesture signals.
Figure 6.13. Comparison of correct positive and false positive recognition accuracy of a single ANN in gesture signal classification, employing different numbers of hidden layers.
Figure 6.14. Recording the hand movement trajectory using the Fuzzy-based Mean-shift tracker. The gesture movement trajectory signal is presented in the top-right corner of each frame.
Figure 6.15. Implementation of the vision-based gesture recognition using one ANN and the Fuzzy-based Mean-shift tracker. The diagram in the bottom-right corner of the image represents the detected gestures over time.
Figure 6.16. Recognition of gesture signals number 13 and 9 using the vision-based gesture recognition system.

List of Tables

Table 3.1. Different stages of mathematical skills development based on the Numeracy Project.
Table 3.2. Gesture use in the experiment.
Table 3.3. Hand or body gesture use over developmental stage.
Table 4.1. An overview of color spaces and their applications in skin detection.
Table 4.2.
Comparative evaluation of different skin detectors.
Table 4.3. The characteristics of the image datasets used for the experiments.
Table 5.1. Fuzzy controller for the fuzzy boundary detector.
Table 5.2. Comparison of the accuracy of the algorithms.
Table 6.1. First experiment: detecting multiple gestures using a single ANN.
Table 6.2. The data recorded for the second experiment, using the gesture recorder and a digital tablet.
Table 6.3. The results concluded from experiment 2 (one ANN for each gesture signal).
Table 6.4. Analyzing the accuracy/inaccuracy of multiple ANNs for gesture classification using a confusion matrix (all values are in percent).
Table 6.5. The evaluation of a support vector machine in classifying the 13 gesture signals.
Table 6.6. The evaluation of a single ANN in classifying 14 gesture classes using 2 hidden layers.
Table 6.7. Comparison of the accuracy of a single ANN containing 0 to 7 hidden layers in recognizing the 14 gesture signal classes.
Table 6.8. Comparison of the accuracy of the proposed method with other gesture recognition systems in the literature.
List of Acronyms

ANN - Artificial Neural Network
AP - Attention Point
ASD - Adaptive Skin Detector
ASL - American Sign Language
BBN - Bayesian Belief Network
CAM-Shift - Continuously-Adaptive Mean-Shift
CCD - Charge-Coupled Device
CIELAB, CIEXYZ - Both refer to perceptually linear color spaces, known as CIE L-A-B and CIE X-Y-Z
CP - Color Predicate
CPU - Central Processing Unit
HRCNN - Hyper Rectangular Composite Neural Network
DoG - Difference of Gaussian
DTW - Dynamic Time Warping
GHz - Gigahertz
GMM - Gaussian Mixture Model
GSD - Global Skin Detector
HCE - Histogram Error
HCI - Human-Computer Interaction
HI - Histogram Intersection
HSI - Hue-Saturation-Intensity (color model)
HMM - Hidden Markov Model
HS - Hue-Saturation (color model)
HSV - Hue-Saturation-Value (color model)
ICrCb - Intensity-Chrominance red-Chrominance blue (color model)
ICT - Information-Communication Technology
IUV - Intensity-Luminance-Chrominance (color model)
LCS - Localized Contour Sequence
MSEPF - Mean-Shift Embedded Particle Filter
NN - Neural Network
PC - Personal Computer
PCA - Principal Component Analysis
PDA - Personal Digital Assistant
PDF - Probability Density Function
PUI - Perceptual User Interface
RBF - Radial Basis Function
RGB - Red-Green-Blue (color model)
SASOM - Structure Adaptive Self-Organized Map
SMA - Specialized Mapping Architecture
SOM - Self-Organized Map
STFT - Short-Time Fourier Transform
SVM - Support Vector Machine
tMHI - Time-Motion History Image
TMS - Transcranial Magnetic Stimulation
UAV - Unmanned Aerial Vehicle
UI - User Interface
UR - Unreliability Rate
USB - Universal Serial Bus
YCrCb - An encoded nonlinear RGB signal, commonly used by European television studios and for image compression
YIQ - Color model inspired by the human vision system, formerly used in NTSC television broadcasting
YUV - Luminance-Chrominance (color model, mainly used in PAL analog television broadcasting)