Real-time vision-based hand and face tracking and recognition of gesture : a PhD dissertation submitted in partial fulfillment of the requirement for the degree of Doctor of Philosophy (Ph.D.) in Computer Science
In this dissertation, we present the research pathway to the design and implementation of a real-time vision-based gesture recognition system. The system is built from three components, representing three layers of abstraction: i) detection of skin and localization of the hand and face, ii) tracking of multiple skin blobs in video sequences, and iii) recognition of gesture movement trajectories. The first component, adaptive skin detection, is implemented using our novel adaptive skin detection algorithm for video sequences. This algorithm has two main sub-components: i) the static skin detector, a skin detection method based on the hue factor of the skin color, and ii) the adaptive skin detector, which retrains itself on new data gathered from the movement of the user. The results of our experiments show that the algorithm improves the quality of skin detection within video sequences. For tracking, we propose a new approach to boundary detection in blob tracking based on the Mean-shift algorithm. Our approach is based on continuous sampling of the boundaries of the kernel and resizing the kernel using our novel fuzzy-based algorithm. We compared our approach to the kernel density-based approach known as the CAM-Shift algorithm under a range of noise levels and conditions. The results show that the proposed approach is more stable under white noise and also detects boundaries correctly for arbitrary hand postures, which the CAM-Shift algorithm cannot achieve. Finally, we present a novel approach to gesture recognition. This approach has two main parts: i) gesture modeling and ii) gesture recognition. The gesture modeling technique is based on sampling the gradient of the gesture movement trajectory and representing the trajectory as a sequence of numbers. 
This technique has several features that are important for gesture recognition, including robustness against slight rotation, a small number of required samples, invariance to the start position, and device independence. For gesture recognition, we used a multi-layer feed-forward neural network. The results of our experiments show that this approach achieves 98.71% accuracy in gesture recognition, a higher accuracy rate than other methods reported in the literature. Together, these components form the framework required for vision-based real-time gesture recognition and hand and face tracking. The components, individually or as a framework, can be applied in scientific and commercial extensions of either vision-based or hybrid gesture recognition systems.
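To make the gesture modeling idea concrete, the following is a minimal sketch of one way a trajectory could be turned into a sequence of numbers by sampling its direction at evenly spaced points. It is an illustration under stated assumptions, not the dissertation's implementation: the function names `resample` and `encode_trajectory`, the arc-length resampling, and the parameters `n_samples` and `n_bins` are all hypothetical choices made here.

```python
import math

def resample(points, n):
    """Return n points evenly spaced along the arc length of a polyline."""
    # Cumulative arc length at each input point.
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    if total == 0:
        return [points[0]] * n  # degenerate trajectory: no movement
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while j < len(dists) - 2 and dists[j + 1] < target:
            j += 1
        seg = dists[j + 1] - dists[j]
        t = 0.0 if seg == 0 else (target - dists[j]) / seg
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def encode_trajectory(points, n_samples=16, n_bins=8):
    """Encode a trajectory as n_samples quantized direction codes.

    Assumption: directions are quantized into n_bins equal angular
    sectors centred on 0, 2*pi/n_bins, and so on.
    """
    pts = resample(points, n_samples + 1)
    bin_width = 2 * math.pi / n_bins
    codes = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        codes.append(int((angle + bin_width / 2) // bin_width) % n_bins)
    return codes
```

Because only differences between consecutive points are used, shifting the whole trajectory leaves the code sequence unchanged, which mirrors the invariance to start position described above. The coarse angular bins also absorb small rotations, and the fixed sample count keeps the input to a classifier at a constant length.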