Learning-based robotic manipulation for dynamic object handling : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Mechatronic Engineering at the School of Food and Advanced Technology, Massey University, Turitea Campus, Palmerston North, New Zealand

Massey University
Recent trends have shown that the lifecycles of modern products are shortening and their production volumes are shrinking. Consequently, many manufacturers subject to frequent change prefer flexible and reconfigurable production systems. Such schemes are often achieved by means of manual assembly, as conventional automated systems are perceived as lacking flexibility. Production lines that incorporate human workers are particularly common within consumer electronics and small appliances. Artificial intelligence (AI) is a possible avenue to achieve smart robotic automation in this context. In this research it is argued that a robust, autonomous object handling process plays a crucial role in future manufacturing systems that incorporate robotics, and is key to further closing the gap between manual and fully automated production. Novel object grasping is a difficult task, confounded by many factors including object geometry, weight distribution, friction coefficients and deformation characteristics. Sensing and actuation accuracy can also significantly impact manipulation quality. Another challenge is understanding the relationship between these factors, a specific grasping strategy, the robotic arm and the employed end-effector. Manipulation has been a central research topic within robotics for many years. Some works focus on design, i.e. specifying a gripper-object interface such that the effects of imprecise gripper placement and other confounding control-related factors are mitigated. Many universal robotic gripper designs have been considered, including three-fingered grippers, anthropomorphic grippers, granular jamming end-effectors and underactuated mechanisms. While such approaches have maintained some interest, contemporary works predominantly utilise machine learning in conjunction with imaging technologies and generic force-closure end-effectors.
Neural networks that utilise supervised and unsupervised learning schemes with an RGB or RGB-D input make up the bulk of publications within this field. Though many solutions have been studied, automatically generating a robust grasp configuration for objects not known a priori remains an open-ended problem. An element of this issue relates to a lack of objective performance metrics to quantify the effectiveness of a solution; such metrics have traditionally driven the direction of community focus by highlighting gaps in the state-of-the-art. This research employs monocular vision and deep learning to generate, and select from, a set of hypothesis grasps. A significant portion of this research relates to the process by which a final grasp is selected. Grasp synthesis is achieved by sampling the workspace using convolutional neural networks trained to recognise prospective grasp areas. Each potential pose is evaluated by the proposed method in conjunction with other input modalities, such as load cells and an alternate perspective. To overcome human bias and build upon traditional metrics, scores are established to objectively quantify the quality of an executed grasp trial. Learning frameworks that aim to maximise these scores are employed in the selection process to improve performance. The proposed methodology and associated metrics are empirically evaluated. A physical prototype system was constructed, employing a Dobot Magician robotic manipulator, vision enclosure, imaging system, conveyor, sensing unit and control system. Over 4,000 trials were conducted utilising 100 objects. Experimentation showed that robotic manipulation quality, quantified by a metric related to translational error, could be improved by 10.3% when grasps were selected to optimise the proposed metrics. Trials further demonstrated a grasp success rate of 99.3% for known objects and 98.9% for objects for which a priori information is unavailable.
For unknown objects, this equated to an improvement of approximately 10% relative to other similar methodologies in the literature. A 5.3% reduction in grasp rate was observed when removing the metrics as selection criteria for the prototype system. The system operated at approximately 1 Hz when contemporary hardware was employed. Experimentation demonstrated that selecting a grasp pose based on the proposed metrics improved grasp rates by up to 4.6% for known objects and 2.5% for unknown objects, compared with selecting for grasp rate alone. This project was sponsored by the Richard and Mary Earle Technology Trust, the Ken and Elizabeth Powell Bursary and the Massey University Foundation. Without the financial support provided by these entities, it would not have been possible to construct the physical robotic system used for testing and experimentation. This research adds to the field of robotic manipulation, contributing to topics on grasp-induced error analysis, post-grasp error minimisation, grasp synthesis framework design and general grasp synthesis. Three journal publications and one IEEE Xplore paper have been published as a result of this research.
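The selection step summarised in the abstract, choosing one grasp from a set of hypotheses by optimising a quality score rather than grasp rate alone, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `GraspCandidate` fields, the linear `score` function and the `error_weight` value are all hypothetical, chosen only to show how a predicted success probability might be traded off against a predicted post-grasp translational error.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GraspCandidate:
    """One hypothesis grasp (all fields are illustrative assumptions)."""
    x_mm: float                # planar gripper position in the workspace
    y_mm: float
    angle_deg: float           # gripper rotation about the vertical axis
    success_prob: float        # assumed predicted probability of a successful grasp
    predicted_error_mm: float  # assumed predicted post-grasp translational error


def score(candidate: GraspCandidate, error_weight: float = 0.05) -> float:
    """Hypothetical score: reward likely success, penalise predicted error."""
    return candidate.success_prob - error_weight * candidate.predicted_error_mm


def select_grasp(candidates: Sequence[GraspCandidate],
                 error_weight: float = 0.05) -> GraspCandidate:
    """Return the candidate with the highest score."""
    if not candidates:
        raise ValueError("at least one grasp candidate is required")
    return max(candidates, key=lambda c: score(c, error_weight))
```

With `error_weight=0.0` the sketch reduces to selecting for predicted grasp success alone, which corresponds to the baseline the abstract compares against; a positive weight favours candidates expected to displace the object less.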
Figures are re-used in this thesis with permission of their respective publishers or under a Creative Commons licence.
Manipulators (Mechanism), Robots, Industrial, Design and construction, Machine learning, Computer vision, 461105 Reinforcement learning