Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere without the permission of the Author. TOWARDS EXPLAINING BLACKBOX MODELS USING GENETIC NETWORK PROGRAMMING A thesis presented in partial fulfilment of the requirements for the degree of Master of Information Science in Computer Science at Massey University, Albany, New Zealand. Haotian Zhu 2024 Contents Abstract ix Acknowledgements x 1 Introduction 1 1.1 Overview of the Current State of Technology . . . . . . . . . . . . . . . 1 1.1.1 Explainable AI . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.2 Evolutionary Algorithm . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Scope of Research and Problem Domain . . . . . . . . . . . . . . . . . . 3 1.2.1 Scope of research . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.2 Problem Domain: The Taxi Problem . . . . . . . . . . . . . . . . 4 1.3 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 Significance of the Research . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.5 Research Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.6 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2 Background and Literature Review 10 2.1 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.1.1 Life-Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.1.2 Chromosome Structure . . . . . . . . . . . . . . . . . . . . . . . 11 2.1.3 Evaluation and selection . . . . . . . . . . . . . . . . . . . . . . . 12 2.1.4 Evolution Operation . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2 Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2.1 Life-Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2.2 Chromosome Structure . . . . . . . . . . . . . . . . . . . . . . . 14 2.2.3 Evaluation and Selection . . . . . . . . . . . . . . . . . . . . . . . 14 2.2.4 Evolution Operations . . . . . . . . . . . . . . . . . . . . . . . . 14 2.3 Genetic Network Programming . . . . . . . . . . . . . . . . . . . . . . . 16 2.3.1 Chromosome Structure . . . . . . . . . . . . . . . . . . . . . . . 16 2.3.2 Function Library . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 ii 2.3.3 Chromosome Transaction . . . . . . . . . . . . . . . . . . . . . . 17 2.3.4 Evaluation and Selection . . . . . . . . . . . . . . . . . . . . . . . 17 2.3.5 Evolution Operator . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.3.6 Evaluation and Selection . . . . . . . . . . . . . . . . . . . . . . . 17 2.3.7 Evolution Operator . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.4 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.4.1 Improved GNP Operators and Structure . . . . . . . . . . . . . . 20 2.4.2 GNP Combined with Other Algorithms . . . . . . . . . . . . . . 22 2.4.2.1 GNP with EDAs . . . . . . . . . . . . . . . . . . . . . . 22 2.4.2.2 GNP with Reinforcement Learning . . . . . . . . . . . . 24 2.4.2.3 GNP with Ant Colony Optimization . . . . . . . . . . . 25 2.4.2.4 GNP with ACO Life Cycle . . . . . . . . . . . . . . . . 26 2.4.3 Multi-goal Path-finding problem with evolution algorithm . . . . 27 2.4.3.1 Travelling Salesman Problem . . . . . . . . . . . . . . . 28 2.4.3.2 Robotic path planing . . . . . . . . . . . . . . . . . . . 29 2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3 General Strategies 33 3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.2 Top-Bottom Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.2.1 ”Black Box” Conception . . . . . . . . . . . . . . . . . . . . . . . 35 3.2.2 Bridging ”Black Box” and Computational Graphs . . . . . . . . 36 3.2.3 Benefits in GNP Research . . . . . . . . . . . . . . . . . . . . . . 37 3.3 Sources of Ground Truth for the Problem . . . . . . . . . . . . . . . . . 37 3.3.1 Approach 1: Handcrafted Solution Dataset . . . . . . . . . . . . 38 3.3.2 Approach 2: GNP Solution Dataset . . . . . . . . . . . . . . . . 38 3.4 Divide and Conquer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.4.1 Introduction to the Customized Strategy . . . . . . . . . . . . . 39 3.4.2 Complementing the Top-Bottom Approach . . . . . . . . . . . . 39 3.4.3 Implementation in GNP . . . . . . . . . . . . . . . . . . . . . . . 39 3.5 Hand-crafting a GNP Solution . . . . . . . . . . . . . . . . . . . . . . . 40 3.5.1 Design Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.5.2 Decision-Making Process Outline: . . . . . . . . . . . . . . . . . . 41 3.5.3 Graph representation . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.5.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 4 Using a Black Box as a Fitness Function in a GNP 47 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.2 GNP Life-Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 iii 4.3 Individual Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.3.1 Phenotype Structure . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.3.2 Genotype Structure . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.3.3 Function Library: . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.4 Fitness function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.4.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.4.3 Structure of the fitness function . . . . . . . . . . . . . . . . . . . 62 4.4.4 Running Individual Structure . . . . . . . . . . . . . . . . . . . . 68 4.4.5 Fitness equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.5 Parent Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.6 Evolution Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.6.1 Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.6.2 Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 4.6.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 5 Black Box System to GNP Conversion 82 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 5.2 Dataset Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.2.1 Hand-crafted solution dataset . . . . . . . . . . . . . . . . . . . . 85 5.2.2 Pattern Identification . . . . . . . . . . . . . . . . . . . . . . . . 86 5.3 Sub-path Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 5.3.1 Function find subsequences of length . . . . . . . . . . . . . 86 5.3.2 Function findUniqueSubsequences . . . . . . . . . . . . . . . . 88 5.4 GNP solution construction . . . . . . . . . . . . . . . . . . . . . . . . . . 92 6 Experiments and Analysis of Results 97 6.1 Hand-crafting a GNP Solution . . . . . . . . . . . . . . . . . . . . . . . 97 6.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 6.1.2 Experiment Design . . . . . . . . . . . . . . . . . . . . . . . . . . 97 6.1.3 Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 6.2 Using a Black Box System as Fitness Function in a GNP . . . . . . . . 99 6.2.1 Learning Ability of the Proposed GNP variant . . . . . . . . . . 99 6.3 Experiment on Proposed GNP vs Original GNP . . . . . . . . . . . . . 104 6.3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.3.2 Experiment Design . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.3.3 Target: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.3.4 Training Results: . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 iv 6.4 GNP solution refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 6.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 6.4.2 Experiment Design . . . . . . . . . . . . . . . . . . . . . . . . . . 106 6.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.5 Experiment on Finding the Minimum Training Patterns . . . . . . . . . 108 6.5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.5.2 Experiment Design . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.5.3 Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6.5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6.6 Limitation of this research . . . . . . . . . . . . . . . . . . . . . . . . . . 111 7 Conclusion and Future Work 112 7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 7.2.1 Scalability Studies . . . . . . . . . . . . . . . . . . . . . . . . . . 114 7.2.2 Error Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 7.2.3 Reduce the Need for a Priori Knowledge . . . . . . . . . . . . . . 114 7.2.4 Interpretability and Transparency Enhancements . . . . . . . . . 114 Bibliography 115 v List of Figures 1.1 Classification of metaheuristic Algorithms . . . . . . . . . . . . . . . . . 2 1.2 Taxi Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1 GA evolution life cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 GA chromosome structure . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3 GA Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.4 GA Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.5 GP Chromosome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.6 GP evolution operations . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.7 GNP Chromosome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.8 GNP Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.9 GNP Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.10 GNP-RL chromosome structure . . . . . . . . . . . . . . . . . . . . . . . 25 2.11 GNP-RL transection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.12 GNP vs ACO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.13 GNP-ACO life-cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.14 Chromosome structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.15 RTP-GA life cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.1 The continuous decision-making cycle of an agent in the Taxi Problem. . 41 3.2 The visual representation of a computer program as a directed graph. . 44 3.3 Function library. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.4 Function library. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.1 Proposed GNP variant life cycle. . . . . . . . . . . . . . . . . . . . . . . 48 4.2 GNP graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4.3 Graph structure for ”pick-up” sub-problem. . . . . . . . . . . . . . . . . 51 4.4 Graph structure for ”drop-off” sub-problem. . . . . . . . . . . . . . . . . 52 4.5 Genotype Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.6 Taxi Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.7 Runing the individual . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 vi 4.8 Runing the individual with black box . . . . . . . . . . . . . . . . . . . . 75 5.1 GNP solution Refinement process. . . . . . . . . . . . . . . . . . . . . . 82 5.2 Dataset Generation process. . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.3 Dataset sample. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 5.4 Example of a single path in the handcraft solution dataset . . . . . . . . 87 5.5 Unique sub-paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 5.6 Final constructed graph. . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 5.7 Graph visualization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 6.1 Hand-crafted solution test result . . . . . . . . . . . . . . . . . . . . . . 99 6.2 GNP parameter settings . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 6.3 Proposed GNP variant test result . . . . . . . . . . . . . . . . . . . . . . 101 6.4 Average Population Fitness vs. Environment Configuration . . . . . . . 101 6.5 Environment 1: pick-up . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 6.6 Environment 1: drop off . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 6.7 Environment 2: pick-up . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 6.8 Environment 2: drop off . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 6.9 Environment 3: pick-up . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 6.10 Environment 3: drop off . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 6.11 Environment 4: pick-up . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.12 Environment 4: drop off . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.13 Environment 5: pick-up . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.14 Environment 5: drop off . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.15 Environment 6: pick-up . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.16 Environment 6: drop off . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.17 Original GNP vs Proposed GNP for pick-up sub-problem . . . . . . . . 105 6.18 Original GNP vs Proposed GNP for drop-off sub-problem . . . . . . . . 105 6.19 Sample Original GNP vs Proposed GNP graph solution for drop-off sub- problem (Environment 11) . . . . . . . . . . . . . . . . . . . . . . . . . . 105 6.20 Proposed Methodology test result . . . . . . . . . . . . . . . . . . . . . . 109 6.21 Incorrect computer graph . . . . . . . . . . . . . . . . . . . . . . . . . . 110 vii List of Algorithms 1 Hand-crafted Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 2 Fitness evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3 A* Pathfinding for distance calculation . . . . . . . . . . . . . . . . . . . 67 4 Graph Traversal Function . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5 Tournament Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 6 Mutate Node Connections . . . . . . . . . . . . . . . . . . . . . . . . . . 80 7 Handcraft Dataset Generation . . . . . . . . . . . . . . . . . . . . . . . . 85 8 Reconstruct GNP solution from Paths . . . . . . . . . . . . . . . . . . . 93 9 Find Unique Paths in a Graph . . . . . . . . . . . . . . . . . . . . . . . 107 10 Find Positions of sub-paths in a complete path . . . . . . . . . . . . . . 107 11 Extract and Concatenate Sub-arrays . . . . . . . . . . . . . . . . . . . . 107 viii Abstract With the emergence of deep learning systems capable of learning intricate robot ma- noeuvres, control, team coordination, and planning both from data and through inter- action with the environment, numerous complex and non-linear problems have found solutions. However, these systems often function as ’black boxes,’ lacking the ability to provide human-interpretable solutions. This study addresses the interpretability chal- lenge in the field of Explainable AI by employing a ’black box’ system as a target model and subsequently transforming it into a computational graph, equivalent to a Genetic Network Program (GNP). Furthermore, we present a methodology for refining and re- ducing the size of the GNP solution. Lastly, we also test if we could utilize the black box system as a guide to the fitness function in a GNP architecture. To illustrate its efficacy, we use a multi-goal path-finding problem from the OpenAI Gym framework. The experimental results demonstrate the efficacy of the converted and refined GNP solution in successfully addressing the taxi problem across 500 environments, constitut- ing a comprehensive dataset. Notably, the refined GNP solution exhibits no redundant or unnecessary nodes. Despite the research’s focused scope, centering on a single agent with multiple goals, the algorithms introduced in this study lay the groundwork for the development of more sophisticated and interpretable algorithms. These advancements are poised to tackle more intricate challenges in the future. ix Acknowledgements I would like to extend my deepest gratitude to my supervisor, Dr. Napoleon Reyes, for his invaluable guidance, patience, and support throughout this research journey. He put in a lot of effort during the research. He made slides, provided the research direction, and had meetings with me every week, each lasting at least one and a half hours. His insights and expertise have been fundamental to the completion of this thesis. I am profoundly grateful to my family for their unwavering love, understanding, and encouragement. Their support has been my strength. Moreover, I would also like to extend my thanks to my colleague, for their support and encouragement. Balancing full- time study with my responsibilities as a full-time developer was a challenging journey. x Chapter 1 Introduction 1.1 Overview of the Current State of Technology 1.1.1 Explainable AI Explainable AI (XAI) refers to methods and techniques in artificial intelligence (AI) that make the AI systems understandable to humans. These methods aim to let hu- man users understand and trust the results and output created by machine learning algorithms[1]. From the 1970s to the 1990s, symbolic reasoning systems like MYCIN, GUIDON, SOPHIE, and PROTOS could represent, reason about, and explain their reasoning for diagnostic, instructional, or machine-learning (explanation-based learning) purposes[2]. From the 1980s to the early 1990s, truth maintenance systems (TMS) enhanced the capabilities of causal-reasoning, rule-based, and logic-based inference systems. Many researchers started investigating whether it is possible to meaningfully extract the non- hand-coded rules generated by opaque trained neural networks[3] [4]. Nowadays, some approaches have been developed to make complex models such as machine learning and evolutionary algorithms more explainable and interpretable[5]. Other approaches explain some specific prediction made by a (nonlinear) black-box model, a goal referred to as ”local interpretability” [6] [7]. This lack of transparency can be a serious problem in critical applications like healthcare, finance, and legal systems, where understanding the reasoning behind a decision is just as important as the decision itself. 1 CHAPTER 1. INTRODUCTION 2 1.1.2 Evolutionary Algorithm Metaheuristic algorithms are special methods used in various areas like economics, engineering, and management to solve complex problems[8]. Many of these algorithms are inspired by natural selection, swarm behavior, and the laws of physics. There are two main types of metaheuristic algorithms (Fig. 1.1): single-solution and population- based metaheuristics. Figure 1.1: Classification of metaheuristic Algorithms Single-solution metaheuristics: These focus on improving a single solution through local search. However, they can sometimes get trapped in local optima[9]. Examples include simulated annealing and tabu search. Population-based metaheuristics: These work with multiple possible solutions at the beginning of the algorithm. This approach helps maintain chromosome diversity and prevents the evolution from getting stuck in local optima. These methods are generally better for finding the best solution across a complex set of possible solutions. Some well-known examples are Genetic Algorithms, Particle Swarm Optimization, and Ant Colony Optimization. Among the population-based metaheuristics, Holland proposed the Genetic Algo- rithm (GA) in 1975[10]. Inspired by the Darwinian biological evolution process, the GA evolution begins with a population of completely random individuals. The individuals are composed of genes, which are represented in binary or other representations. The fitness of the entire population is evaluated in each generation, and multiple individuals are randomly selected from the current population (based on their fitness) to produce a new population of life through natural selection and mutation, which becomes the CHAPTER 1. INTRODUCTION 3 current population in the next iteration of the algorithm. In 1992, Koza conducted a comprehensive study on Genetic Programming (GP) [11]. Unlike GA, GP uses a tree-type data structure as the chromosome, each chromosome is a computer program, which can represent a solution for a given problem domain. The amount of computation required for GP is so large (dealing with a large number of candidate computer programs) that it could only be used to solve simple problems back in the 1990s. In recent years, with the development of GP and the exponential increase in the computational power of central processors, GP started to produce a large number of remarkable results. In 2002, Genetic Network Programming (GNP) [12] was introduced. As a special type of GP, it uses a graph-type data structure instead of a tree-type data structure. Compared to GP, GNP is more efficient in solving complex problems in a changing environment. Subsequently, many other flavors of GNP have been developed to enhance its efficiency. These include GNP chromosome structure modification[13], features of mutation and crossover modification[14], and combinations of GNP with other machine learning algorithms such as reinforcement learning, Ant Colony algorithms, and fuzzy logic. In recent years, GNP has been widely used in many real-life applications, includ- ing robot navigation, optimization of financial portfolios, image and signal processing techniques, and data routing for network design. The usage of these techniques is grow- ing in many other fields along with the development of GNP, such as environmental modeling, transportation planning, and optimization of energy systems. 1.2 Scope of Research and Problem Domain 1.2.1 Scope of research With the development of A.I., many complex and non-linear problems have been solved. However, these solutions, while very powerful, are not human-interpretable; they op- erate like “black boxes” with millions of neurons and billions of connection weights. The scope of this research is to interpret these ”black boxes” by converting them into a computational graph, which is interpretable and reusable. The proposed approach assumes that we have a set of judgment nodes and processing nodes that can be utilized in a computational graph, tailored to the problem domain. We have only tested the proposed method on the taxi problem, but other similar problems can be tackled as CHAPTER 1. INTRODUCTION 4 well (e.g., tile world, maze problem, etc.). On the other hand, the multi-goal path-finding problem is a classic research problem in GNP. An improved GNP is needed for this kind of question, and the solution should be optimal in real life. The scope of this research includes using GNP and other methodologies to find the optimal solution for a single-agent multi-goal problem in a stochastic environment. The research uses the taxi problem as a case study to conduct a comprehensive analysis. This study defines a methodology for converting a ”black box” system into a computational graph representation and solving the multi-goal path-finding problem with an optimal GNP solution in general. 1.2.2 Problem Domain: The Taxi Problem The Taxi problem [15] is a popular toy problem used to demonstrate and test algo- rithms. It involves an environment where a taxi must pick up a passenger at one location and drop them off at another within a grid world. The goal is for the taxi to find the shortest, collision-free path from its current location to the pick-up/drop-off location. Actions: Basically, The taxi has 6 kind of movements: 1. Move Up (MU) 2. Move Down(MD) 3. Turn Left (TL) 4. Turn Right (TR) 5. Pick Up (PU) 6. Drop Off (DO) • Environment Simulation The research uses the ’Taxi-v3’ environment provided by the OpenAI Gym library as the simulation platform for the algorithm. The environment described is a 5x5 two-dimensional grid world, featuring walls, obstacles, and four distinct locations marked as R (Red), G (Green), Y (Yellow), and B (Blue), as depicted in (Fig. 1.2). CHAPTER 1. INTRODUCTION 5 The taxi must navigate any of the 25 grid squares and solve a total of 500 unique environmental scenarios with six deterministic actions: move south, north, east, west, pick up, and drop off the passenger. Figure 1.2: Taxi Problem[15] . The implementation part of this study is developed using Python with the PY- GAD library. The GNP will guide the taxi to find the optimal solutions for different environments. 1.3 Research Objectives The main objective of this research is to convert a “black box” solution into a com- putational graph solution. This research focuses on providing a general, innovative approach to achieve this. Additionally, this research develops a novel enhanced version of Genetic Network Programming (GNP) based on the “black box” solution to address the challenge of single-agent, multi-goal path planning in stochastic environments. The goal is to establish a generalized approach for effectively solving this specific type of path planning issue with GNP. Furthermore, the research provides unique techniques aimed at ensuring that the produced GNP solution is optimal for this type of problem. The following outlines the specific objectives: • Convert a “black box” solution into a computational graph solution. • Design a solution to simulate the ”black box” system for the specified problem domain. • Test and validate the generalizability of the designed solution across different scenarios. CHAPTER 1. INTRODUCTION 6 • Design the GNP chromosome based on the optimal solution. • Solve the multi-goal path-finding challenges with our proposed Divide-and-Conquer approach. • Accelerate the GNP evolution using the optimal solution. • Develop an improved GNP capable of solving these problems effectively. • Create a technique to refine the result and generate the optimal GNP solution. • Conduct tests to assess and validate the generalizability of the solution. • Define and propose a general comprehensive approach for addressing multi-goal path-finding challenges using GNP. 1.4 Significance of the Research This thesis can be distinctly distinguished from other GNP works based on the main ap- proach that we have taken (from top to bottom): we started by creating a hand-crafted solution as a stand-in for the ”black box” system (top), which guides the evolution of the best GNP individual (bottom). The ”black box” system serves as the perfect model and may come in the form of a deep neural network or other advanced algorithms that do not have any explanation capabilities. On the other hand, the target solution is a computational graph that is elegant, highly interpretable, and reusable. Our proposed method details how we can utilize the ”black box” system in constructing a chromosome structure, fitness function, and modified crossover and mutation operators for our novel GNP variant. Moreover, there are some challenging multi-agent problems, such as the TileWorld problem, that cannot be solved 100 percent using GNP[14], so resorting to using other advanced algorithms and then converting them to interpretable solutions is an interesting avenue of research to pursue. In this research, the GNP solution will be refined to generate the perfect GNP solution. This research has the following innovations: First, this study proposes and defines the ”Top-Bottom” approach to convert an unexplained but very powerful “black box” solution into a computational graph solution. In previous studies, various methods have been explored to address this problem, but there has been no systematic and effec- tive approach specifically using Genetic Network Programming. Second, this research presents a novel Genetic Network Programming variant that employs an innovative fitness evaluation method, customized chromosome structure, and evolution operators CHAPTER 1. INTRODUCTION 7 to accelerate GNP evolution. This enhancement enables quicker decision-making for agents in changing environments. Third, we have developed a new methodology that refines solutions to ensure the generated solution is perfect for the problem domain that we used as a testbed. The methodology can guarantee the generated result is perfect, which is crucial for path planning in real-life scenarios. 1.5 Research Methodology The research was conducted through a combination of literature review and software development. The methodology and procedure are structured into the following steps: • Develop a hand-crafted solution as the ”black box” system for the research. • Study and understand the input and output of the ”black box” system. • Conduct a comparative study of the original GA, GP, and GNP, highlighting the differences among these approaches. • Develop a prototype Genetic Network Programming software to address a simple problem, with a focus on understanding each component of GNP. • Investigate current multi-goal path planning challenges using GA, GP, or GNP, exploring the methods employed in existing research. • Break down the taxi problem into smaller, more manageable sub-problems. • Develop basic GNP software capable of solving these sub-problems in the taxi scenario. • Analyze how the ”black box” system assists the basic GNP in solving the taxi problem. • Conduct research on fitness functions, particularly those relevant to path planning and multi-goal path-planning issues. • Enhance the GNP with a unique fitness function and integrate the ”black box” system to effectively solve the taxi problem. • Design a comprehensive test plan to evaluate the proposed solution. • Observe and analyze the pattern of solutions, verifying them with software im- plementation. CHAPTER 1. INTRODUCTION 8 • Develop an approach specifically designed to refine the solution. • Conduct extensive testing of the developed approach. • Summarize the entire research process, including the methodologies and proce- dures employed. 1.6 Structure of the Thesis This thesis is structured according to nine main parts, including the Introduction. • Chapter 2: Background This chapter introduces the background of GA, GP, and GNP in detail. It sum- marizes the advantages and disadvantages of GA, GP, and GNP. • Chapter 3: Literature Review This chapter provides a comprehensive review of existing literature on different variants of GNP and related works for ”black box” interpretation. • Chapter 4: Methods This section explains the detailed ’Top-Bottom’ approach, which is an approach to convert the ’black box’ to the GNP solution. It also explains the hand-crafted solution, which simulates the ’black box’ system and serves as the ground truth of this research. • Chapter 5: Proposed GNP Variant A detailed exploration of GNP implementation is presented here, including the structure of chromosomes, the development of a fitness function, and the cus- tomization of crossover and mutation processes. • Chapter 6: Solution Refinement This chapter explains the proposed methodology for refining the GNP solution to an optimal solution. • Chapter 7: Experiment This part of the thesis discusses the methods and results of testing and validating our proposed GNP and methodology, ensuring its reliability and accuracy. CHAPTER 1. INTRODUCTION 9 • Chapter 8: Discussion This chapter discusses the results of the experiment and covers any missed details from the previous chapters. • Chapter 9: Summary and Future Work The final chapter provides a summary of the findings and contributions of the thesis. It also outlines potential areas for future research and development in this field. Chapter 2 Background and Literature Review To better understand the context of this research, it is important to understand some knowledge and background related to this research. In this section, GA, GP, and GNP will be explained and compared with each other. This section not only explains how these evolutionary algorithms work but also summarizes the advantages and disadvan- tages of GA, GP, and GNP. This section is the foundation of our study. 2.1 Genetic Algorithm Inspired by the biological evolution process, an evolutionary algorithm called the Ge- netic Algorithm (GA) was invented by Holland in 1975. It is used to find the optimal solution for a given problem domain. Similar to natural selection, the optimal solu- tion can be evolved from a population of candidate solutions. The evolution process is controlled by parent selection, mutation, and crossover. 2.1.1 Life-Cycle The figure (Fig. 2.1) shows the evolution life cycle of the GA. There are five phases in the GA life cycle: initial population, fitness function, parent selection, mutation, and crossover. For a certain problem domain, a population is initialized and the population 10 CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 11 is composed of a set of chromosomes. The fitness function will evaluate every single chromosome, and for each chromosome, it will return a fitness score. In the parent selection phase, a set of chromosomes will be selected based on the fitness score for the crossover phase. In the crossover phase, a pair of chromosomes will be paired for swapping their genes to generate new offspring. In the mutation phase, one of the genes in the chromosome will be mutated with a certain mutation rate. The new generation is generated after these five phases, and the new generation will be used as the new population for the next evolution life cycle. After several generations, the solution will be found. Figure 2.1: GA evolution life cycle 2.1.2 Chromosome Structure The initial population consists of a set of chromosomes or individuals. Each chromo- some represents a solution in the search space for a given problem. The solution is encoded in a certain bit string format, and the encoding scheme depends on the prob- lem domain. For example, in a binary encoding scheme, each chromosome (Fig. 2.2) is a fixed-length vector, and each gene is a one-bit string (either 0 or 1). At the beginning of the GA evolution, the chromosomes in the population are randomly initialized. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 12 Figure 2.2: GA chromosome structure 2.1.3 Evaluation and selection • Fitness The fitness function plays an important role in evaluation; it is used to evaluate the quality of each chromosome. The fitness is a maximization function: the bet- ter the quality of the chromosome, the higher the fitness score it has. Depending on the problem domain, different problems use different fitness functions. For example, Manhattan distance and Euclidean distance are widely used in path- finding problems. • Selection Parent selection, also known as elitism, selects the chromosomes with the highest fitness scores. The selected chromosomes in the current generation will be used in the next generation. The common parent selection techniques are tournament, rank, roulette wheel, and stochastic universal sampling. 2.1.4 Evolution Operation • Crossover In GA, the idea of crossover is to select more than one chromosome from the par- ent selection step as parents and swap parts of their genes with certain crossover probabilities to produce one or more offspring. The well-known crossover oper- ators are single-point, two-point, k-point, uniform, order, precedence preserving crossover, partially matched, shuffle, and reduced surrogate. Here is an example (Fig. 2.3) of single-point crossover: Given two selected chromosomes, parent A1 and parent A2, some of their genes are exchanged randomly with crossover prob- ability, and two new chromosomes, offspring A5 and offspring A6, are produced. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 13 Figure 2.3: GA Crossover • Mutation This concept involves adding random genes to the offspring in a pop- ulation. This is done to ensure diversity in the population from one generation to the next, which helps prevent the population from becoming too similar too quickly. Here (Fig. 2.4) is what happens after mutation. Figure 2.4: GA Mutation 2.2 Genetic Programming 2.2.1 Life-Cycle GP has the same evolution life cycle, which also contains an initial population, fitness function, parent selection, mutation, and crossover. The main difference between GP and GA is the chromosome structure in the initial population. The chromosome struc- ture of GA is a vector of bit strings, which is randomly initialized according to the problem domain. However, the chromosome structure of GP is a randomly generated computer program. The chromosome for GA is raw data, while the chromosome for GP is processed data. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 14 2.2.2 Chromosome Structure The representation of the GP chromosome structure is expressed as a syntax tree. The figure (Fig. 2.5) shows the GP chromosome structure. The syntax tree is composed of nodes and links. The nodes are if-else logic functions that indicate the instructions to execute, and the links are terminals that indicate the actions for each instruction. The start node for the chromosome is the tree root, and the terminals are at the deepest level of the tree. Figure 2.5: GP Chromosome In Genetic Programming, the chromosome is structured as a tree. This tree com- prises a single root node, along with multiple non-terminal and terminal nodes. The non-terminal nodes in GP serve as functions, which can include arithmetic functions, Boolean functions, and conditional statements. The terminal nodes are either inputs to the GP program or are used for specific processing tasks that are relevant to the problem at hand. 2.2.3 Evaluation and Selection GP utilizes the same techniques for evaluation and selection as GA. The fitness function will evaluate each chromosome and return a fitness score. Chromosomes with higher fitness scores have a higher chance of being chosen. 2.2.4 Evolution Operations GP employs a variety of genetic operators. These typically include: • Crossover: Sub-trees are exchanged between parent programs. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 15 The crossover (Fig. 2.6) applies to two selected chromosomes to generate two new offspring. During the crossover, a node is randomly picked from both selected chromosomes, and they will swap their picked nodes and their children. • Mutation: Random alterations are made to parts of the tree. Similar to GA, the key idea of the GP mutation (Fig. 2.6) is to avoid premature convergence. The algorithm will randomly generate a new sub-tree and replace the randomly selected node with the new sub-tree. Figure 2.6: GP evolution operations Compared to GA, GP is another kind of GA. The main difference is that the chromosome for GA is a vector of bit strings (raw data), while the chromosome for GP is a tree-type computer program (processed data). GA has some limitations, while the performance of GP is better in solving some types of problems, such as problems where there is no given ideal solution and problems with a stochastic environment where the situation is constantly changing. For the robot navigation problem, the agent needs to find the path and avoid obstacles. The tree-type chromosome with if-else logic nodes is good for decision-making, as the agent can judge the situation and take the correct action. The tree-type chromosome has some advantages when solving decision-making is- sues; however, it still has some limitations. The limitation of GP is the inefficient search for optimal solutions. This inefficiency is primarily due to the ”bloat” of GP. In GP, the term ”bloat” describes the propensity for the solution search space to grow unnecessarily large. The reason for this expansion is that the GP tree structures can become increasingly deep and complicated over time, presenting the algorithm with a vast number of potential solutions to consider. Because of this, finding the best solutions in GP becomes less effective, especially as the tree’s maximum depth grows. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 16 2.3 Genetic Network Programming As the foundation of our research, Genetic Network Programming (GNP) was intro- duced as a powerful method for solving more complex decision-making problems in a dynamic environment. GNP is another flavor of GA. The chromosome structure of GNP is also a computer program; however, this computer program is a non-terminal directed graph. To better understand how this design relates to our research, let’s delve into the distinct representation and operational mechanisms of GNP as detailed in this paper. 2.3.1 Chromosome Structure GNP adopts a network structure for its chromosomes. This structure is characterized by two primary types of nodes with time delay: judgment nodes and processing nodes. These nodes in GNP are analogous to the non-terminal and terminal nodes in GP, respectively. The network structure in GNP offers a distinctive approach to problem- solving compared to the tree structure used in GP. Figure 2.7: GNP Chromosome Processing nodes are responsible for executing specific tasks or functions, while judgment nodes are used to make decisions based on certain conditions. The time delay is the period from the end of processing in one node to the start in another. During the graph transaction, within a given time frame in a problem domain, judgment nodes decide which node to activate next based on specific conditions, while processing nodes perform certain actions or computations. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 17 2.3.2 Function Library The function library is a library that contains judgment functions and processing func- tions. The judgment function responds to decision-making, and the processing function responds to actions. When the network transitions to a certain judgment/processing node, the corresponding judgment/processing function will be called to instruct the agent to judge the current situation or perform a certain action. 2.3.3 Chromosome Transaction In general, within a given time frame, the graph transaction starts from the start node, which is usually a judgment node. The judgment node will be called based on the current environment, and the judgment will return the information to determine the next node. The time delay for the current node will be deducted from the time frame, and the transaction now moves to node 2. The transaction will stop when the given time is reached. 2.3.4 Evaluation and Selection GNP use fitness function to evaluate the performance of the chromosome. After the chromosome transaction, the fitness function will assign a fitness score for this chromo- some, the value of the fitness score usually depends on the chromosome performance, the feedback from the environment, and the time usage. GNP can use any kind of fitness equation for different kinds of problems. The chromosome with relative hog fitness score will be picked by the selection function, the same as the GA and GP, GNP use the same parent selection techniques. 2.3.5 Evolution Operator 2.3.6 Evaluation and Selection GNP uses a fitness function to evaluate the performance of the chromosome. After the chromosome transaction, the fitness function will assign a fitness score to this chromosome. The value of the fitness score usually depends on the chromosome’s performance, feedback from the environment, and time usage. GNP can use any kind of fitness equation for different kinds of problems. The chromosome with a relatively CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 18 high fitness score will be picked by the selection function. Similar to GA and GP, GNP uses the same parent selection techniques. 2.3.7 Evolution Operator GNP utilizes operators like mutation and crossover but in the context of its network structure. This includes changing links between nodes or altering the functions of nodes themselves. For instance, one mutation type in GNP involves randomly selecting a point in the network and either altering the connection at that node or replacing the node entirely with a new, randomly generated one. This demonstrates the unique network-based approach of GNP compared to the tree-based structure of GP. • Crossover As shown in Figure (Fig. 2.8), given two graph chromosomes, Parent 1 and Parent 2, they generate two new offspring, Offspring 1 and Offspring 2. Both parents and their offspring have the same amount and type of nodes. In contrast, node 1 and node 9 for Parent 1 and Parent 2 have different connections. In Parent 1, node 1 is connected to node 2, whereas in Parent 2, node 1 is connected to nodes 4 and 6. Parent 1 and Parent 2 only swap the connections to generate their offspring. In GNP, the crossover operator only swaps the connections between two selected chromosomes randomly with a certain crossover probability. • Mutation The mutation in GNP randomly changes the connections for one of the nodes in the selected chromosome, this will generate one new offspring. In Figure (Fig. 2.9) below, to mutate the randomly selected chromosome, one of the nodes in this chromosome will be selected randomly, and its connection will be changed to connect to another node with mutation probability. The mutated individual will be saved for the next generation. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 19 Figure 2.8: GNP Crossover Figure 2.9: GNP Mutation CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 20 2.4 Literature Review In this section, we will conduct a literature review of the relevant GNP variants, includ- ing the improved GNP and GNP combined with other algorithms. The scope of the review for this chapter emphasizes how these GNP variants work and where they have been enhanced to improve GNP efficiency. This includes changes to the chromosome structure, modified evolution operators, and the combination of GNP with other algo- rithms. Through reviewing the literature, we will learn and determine how we want to improve our GNP variant. 2.4.1 Improved GNP Operators and Structure Although GNP is very powerful compared to GA and GP, it still has some limitations. Recently, there have been significant improvements in GNP, mainly aimed at making it more efficient and effective at solving problems. The concept of ”transition by ne- cessity” in GNP was proposed in this paper [14]. In this context, only some judgment functions (necessary judgments) are made to determine actions, where the process of evolution determines the necessity of these judgments. This means that not all judg- ment functions need to be considered in every scenario, only those deemed necessary through evolutionary processes. The traditional GNP does not consider ”transition by necessity,” which leads to two major drawbacks: invalid evolution and negative evolution. Invalid evolution occurs when the genetic operators modify parts of the graph that are not actively used or necessary, while negative evolution indicates a situation where, as the evolutionary process progresses, the fitness scores of the populations decrease instead of improving. These issues can lead to deteriorated evolutionary efficiency. To resolve these problems, the paper proposes novel evolution strategies with sim- plified genetic operators. For the crossover, the simplified crossover is restricted to the region of transited branches, where it intersects between crossover individual A and crossover individual B. For the mutation, the simplified mutation is restricted to the region of the mutation individual. These new operators are designed to reduce the negative impacts of invalid and negative evolution, reducing unnecessary transactions in GNP and improving the performance of GNP. Apart from ”transition by necessity,” another important feature in GNP is the ”Reusability of nodes,” which plays a vital role in the GNP graph. This paper [16] addresses this issue and introduces a new method to improve GNP performance. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 21 The ”Reusability of nodes” refers to the frequency of node usage during evolution; the more important the node, the higher its usage. In traditional GNP, all nodes are treated equally, regardless of their contribution to successful outcomes, leading to invalid evolution and negative evolution. To increase the usage of important nodes, the paper introduces adaptive Genetic Network Programming. By identifying and focusing on frequently reused, effective nodes, adaptive GNP can enhance evolution efficiency. Adaptive GNP has customized operators that dynamically adjust crossover and mutation probabilities based on the reusability of nodes within the GNP graph. The reusability is quantified by counting the frequency of node usage, influencing the evolutionary pressure for each variable (branch) of the individual GNP. The adaptation process involves assessing each variable’s contribution to the solution’s fitness, allowing the GNP to self-adjust its evolution strategy. To enhance GNP and make it more efficient, some studies have modified the mech- anism of evolution operators. They have implemented ”transition by necessity” evolu- tion and increased the reusability of important nodes. Other research has focused on altering the chromosome structure itself. In traditional GNP, the judgment nodes, responsible for decision-making, are not fully utilized, and the performance of the evolution highly depends on the usage of these judgment nodes. To address this issue, Genetic Network Programming with Control Nodes (GNPCN) is proposed [17]. The control nodes, similar to start nodes in the original GNP, act as checkpoints, helping nodes transact more efficiently and ensuring all parts of the chromosome are effectively used. GNPCN is achieved by transferring the activated node to a control node after a certain number of processing nodes are activated. This method ensures that node transactions do not just randomly jump between nodes. Instead, it follows a structured path or branch, where a certain amount of processing is done before moving to the next stage (control node). GNPCN structures the chromosome in breadth and depth; the breadth refers to the number of control nodes, and the depth refers to the number of activated processing nodes. GNPCN evolves both the breadth and depth of the chromosome before returning to a control node, meaning more structured paths or branches will be involved during chromosome evolution. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 22 2.4.2 GNP Combined with Other Algorithms 2.4.2.1 GNP with EDAs Estimation of Distribution Algorithms (EDAs) [18] are a type of evolutionary com- putation. Unlike traditional evolutionary algorithms that rely on random methods to mimic biological evolution for creating new populations, this algorithm utilizes a prob- abilistic model. This model is created using statistical or machine learning techniques to approximate the current population’s probability distribution, and then it is used to generate new populations. The life cycle of EDAs can be summarized in the following steps: Firstly, it creates an initial population. Secondly, it selects some elites. Thirdly, it develops a probabilistic model based on the selected individuals. Fourthly, this model is then utilized to create a new population. Finally, the process returns to the second step and repeats these steps until predefined stopping criteria are met. Evolution operators such as crossover and mutation randomly change and swap genes to evolve the chromosome. However, each GNP gene in the chromosomes is highly dependent on each other, and these operators sometimes might break the useful branch and decrease the evolution efficiency. To overcome this issue, Genetic Network Programming with Estimation of Distribution Algorithms (GNP-EDAs) [19] combines GNP and EDAs, replacing the crossover and mutation with the probabilistic model, which can generate a useful new population by calculating the connection probabilities between different genes. The probabilistic model is constructed in this way: First, consider a method for estimating the likelihood of connections among distinct nodes within a network. Dur- ing the t-th iteration, this approach calculates the probability Pt(i, k, j) that a link emanates from the k-th branch of node i to node j. The formula for this probability is given by the ratio of the sum of connection strengths from node i to j over the sum of all connection strengths from node i to any node. Mathematically, it is represented as: Pt(i, k, j) = ∑ n∈N δnt (i, k, j)∑ j∈A(i,k) ∑ n∈N δnt (i, k, j) Second, both the connection information and transition information between differ- ent genes are considered. In a chromosome, not every node might be involved in the problem-solving process. A node transition is determined by choosing the necessary nodes. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 23 Thus, in this method, the transition information is factored into the probabilistic model. The probability that a connection from the k-th branch of node i will lead to node j is dependent on how often such a transition occurs. The more frequent the transition, the more valuable the connection is deemed. The formula for this method is given by: Pt(i, k, j) = ∑ n∈N δnt (i, k, j) + ηqnt (i, k, j)∑ j∈A(i,k) ∑ n∈N δnt (i, k, j) + ηqnt (i, k, j) where δnt (i, k, j) represents the connection strength from node i to j at the n-th node in the t-th generation, and qnt (i, k, j) accounts for the transition frequency from node i to j. With the implementation of the probabilistic model, the GNP evolution can gener- ate more useful populations. However, GNP-EDAs have some obvious drawbacks: In EDAs, the distributions of node connections are computed to guide the search process. This leads to the propensity of EDAs to converge on local optima, which means that they can get stuck in what seems like the best solution but is not (local optimum). Another issue is that when calculating these probabilities, the method does not con- sider how good (or fit) each solution is; it treats the best and the tenth-best solutions the same. Additionally, the method counts every connection from good solutions, even if they do not help make decisions, which can confuse the process of finding the best solution. The biggest problem is that it does not handle loops well. When connections in a loop are used too much (the probability distribution is too high), they seem more important than they should be, which can make the method settle on a local optimum too quickly. To address the above issues and maintain chromosome diversity, papers [19] [20] used an estimation of distribution algorithm, as in the previous study, to create the next set of chromosomes. They also used reinforcement learning to build a model based on the better solutions found in the group. A detailed analysis of these studies is available in this paper [21]. The paper presents an extended probabilistic model building a genetic network programming (PMBGNP) algorithm to speed up the GNP evolution. The key idea of PMBGNP is to pick the sub-branches of the underperforming (bad) individuals for building a probabilistic model. This model can improve GNP efficiency by using Sarsa learning to calculate Q values to identify the good structure. When using the probabilistic model with RL, if a task is completed successfully, only the final step that achieves the goal gets a reward. This reward is distributed slowly over the system, which can slow down how quickly it learns and evolves. Moreover, these algorithms CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 24 become less effective when the problems they are trying to solve get more complicated. 2.4.2.2 GNP with Reinforcement Learning Integrating GNP with Reinforcement Learning (RL) represents an important advance- ment in the development of GNP. A significant enhancement was the integration of GNP with Q-learning, marking the first instance of GNP being combined with RL, as introduced by [22]. This was used to refine the transfer rules between states within GNP, and this approach was specifically applied to the prisoner’s dilemma problem. Further studies [23] combined GNP with Q-learning to better suit the environment. In other papers [24] [25], GNP was integrated with the SARSA algorithm to enhance the GNP evolution efficiency, particularly in controlling the Khepera robots process. To make GNP more efficient during evolution, Genetic Network Programming with Reinforcement Learning (GNP-RL) [26] was introduced. GNP-RL combines GNP and RL, where RL can help GNP generate chromosomes with relatively high fitness scores. In the original GNP algorithm, the structure of the graph remains constant during the evolution process and becomes fixed once an optimal individual is chosen during the testing phase. Conversely, integrating GNP with reinforcement learning allows for the alteration of node connections during the testing phase, enhancing the system’s performance in changing environments. As an extension of GNP-RL, this paper [27] attempted to use multiple start nodes to derive multiple programs from a GNP structure in parallel. Chromosome Structure The GNP-RL chromosome structure (Fig. 2.10) contains nodes and sub-nodes. Different from the GNP chromosome structure, each sub-node has a Q-value, which corresponds to each state-action pair. The key idea is to link a specific state to a corresponding action. In this scenario, the ’state’ refers to the node ID, while the ’action’ pertains to one of its sub-nodes. A node will execute only one of its sub-nodes at a given time, and the choice of which sub-node to execute is determined by the reinforcement learning algorithm. In a certain situation, an agent’s possible actions are represented as distinct types of processing nodes within the chromosome. In contrast, the GNP-RL algorithm treats these actions as sub-nodes under a single processing node. The reinforcement learning algorithm will pick one of them to execute. GNP-RL Transaction The figure (Fig. 2.11) illustrates the GNP-RL transaction, which differs from the GNP life cycle in the following way: Once the initial population CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 25 Figure 2.10: GNP-RL chromosome structure is created, each chromosome is evaluated using a fitness function, and Reinforcement Learning trains the agents to determine the most effective sub-node. In the subsequent generation, the new population will be trained and evaluated by the fitness function again before being subjected to elite selection and evolutionary operations. Figure 2.11: GNP-RL transection 2.4.2.3 GNP with Ant Colony Optimization Integration of GNP with swarm intelligence algorithms is another crucial advancement in the development of GNP. Ant Colony Optimization (ACO) was combined with GNP in [28] [29] for the Evaluator group control system. The unique evolution operation and CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 26 pheromone intensity calculation methodology can improve GNP’s exploitation ability. In GNP with ACO, the integrated approach dictates that for every ten GNP iterations, one iteration is specifically allocated to execute ACO. 2.4.2.4 GNP with ACO Life Cycle ACO mimics how ants find food. Ants use a scent called pheromone to mark paths from the colony to food; stronger pheromone trails can attract more ants. As pheromone evaporates over time, less popular routes disappear, helping ants choose the best paths. As the table shows (Fig. 2.12), the similarity between GNP and ACO lies in some common logic. GNP with ACO is proposed [28] to enhance GNP’s searchability. Figure 2.12: GNP vs ACO Figure 2.13: GNP-ACO life-cycle The life cycle (Fig. 2.13) starts with population initialization, where the quantities of different nodes and connections within the chromosome are set randomly. During the pheromone calculation phase, the fitness function is used to evaluate the performance of the chromosomes, and based on this evaluation, along with the transition frequencies, the pheromone levels are calculated with the following equation: CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 27 hnm(i, k, a) = Fn − fn m Fn · γnm(i, k, a), (2.1) After updating the pheromone, the life cycle proceeds to the genetic operation phase. In this phase, the generation is separated into general generation and spe- cial generation. Crossover occurs conventionally in each generation, with pheromone treated as a branch attribute of GNP. Mutations are typical in general generations. However, in special generations, pheromone information is used to create new individu- als instead of mutation. Branches with more pheromone have a higher chance of being passed on to the new GNP individual. Furthermore, the Artificial Bee Colony (ABC) algorithm was proposed in work [30]. The GNP chromosomes were evolved by an ABC-based evolutionary approach. The ABC algorithm consists of three sequential phases: the Employed bee phase, the Onlooker bee phase, and the Scout bee phase. During the Employed and Onlooker bee phases, individuals within the population attempt to disseminate their knowledge and spawn new individuals. A new individual will only replace its predecessor if it proves to be superior. In the Scout bee phase, any individual who fails to demonstrate improvement over a set number of iterations is replaced by a newly generated individual. In other research [28] [29], it has been found that algorithms can struggle with more complex problems. To help Genetic Network Programming (GNP) work better, ACO was used to boost its ability to find good solutions. Yet, they did not eliminate crossover and mutations, which sometimes meant losing parts that worked well. Inspired by GNP with ACO, the ACNP algorithm was proposed in [31] to solve these issues. During evolution, the ACNP creates an experience table according to the structure of the chromosome, and a new population is generated according to the current experience table. After parent selection and evolution operations, a new experience table is created. The graph structure of GNP is optimized in this way. 2.4.3 Multi-goal Path-finding problem with evolution algorithm Path-finding(path planning) and multi-goal problems are critical areas of AI study [32]. Unlike the A* algorithm for solving traditional path-finding issues (path-finding issues with a constrained environment, the evolutionary algorithm has better performance on path-finding for dynamic environments and real-time systems. CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 28 2.4.3.1 Travelling Salesman Problem The Travelling Salesman Problem (TSP) [33] is a classic issue in path-finding. Many researchers use the TSP as a case study for exploring evolutionary algorithms, aiming to find efficient solutions for TSP. [34] uses GA to solve the Unmanned Aerial Vehicle (UAV) problem (finding an optimal solution to a single UAV path is the same as solving a TSP problem). However, the chromosome of GA is a vector of bit strings, and a computer program with an if-else logic type chromosome is more suitable for solving this kind of problem. To address this issue, [35] uses GP to solve the UAV problem. Overall, path-finding is a multi-objective optimization problem and can be defined by the formula below: min f(x) = (f1(x), f2(x), f3(x)) T (2.2) f1(x) = n∑ i=1 l2i , f2(x) = n∑ i=1 h2i , f3(x) = n∑ i=1 fTi . (2.3) Where f1(x) defines the cost function of path distance, f2(x) defines the cost func- tion of flight altitude, and f3(x) defines the evaluation value for all threats. Chromosome Representation The solution is described by a set of feasible func- tions. The chromosome structure is a syntax tree, where each non-terminal node rep- resents a feasible function. There are a total of Nf non-terminal nodes, i.e., F = {f1, f2, . . . , fNf }, and Nterm terminal nodes, i.e., T = {a1, a2, . . . , aNterm}. The purpose of using an evolutionary algorithm to solve the path-finding problem is to find the computer program that can represent the problem’s nature, which means the computer program with the most optimal fitness score. Both the input and output of GP are computer programs. Thus, compared to the chromosome of GA, GP is more suitable for solving this problem. The Generalized Traveling Salesman Problem with Neighborhoods (GTSPN) is an extended problem for TSP. In GTSPN, instead of having discrete cities as in TSP, each cluster consists of a ”neighborhood,” which is a set of regions or points. This paper CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 29 [36] solved this multi-goal path planning problem with a Hybrid Random-Key Genetic Algorithm (HRKGA). • Input: a set V = {1, . . . , n} of the indices of the neighborhood sets Si, the neighborhood sets Si, n sets Wi = {1, . . . ,mi} of the indices of the neighborhoods Qi,k, and the neighborhoods Qi,k. • Output: a minimal-cost cyclic tour T = {qτ(1), . . . , qτ(n)} with objective value L. GA needs to discover the optimal neighborhood within the neighborhood set. In the chromosome encoding, each gene contains an index k, which is an index for the neighborhood within the neighborhood set, and a vector part qi, which is the node configuration within each neighborhood. Both k and qi are randomly generated. Each gene is assigned a random number uniformly and encoded into binary format [0, 1]. In general, each chromosome is a linear equation. For instance, in the case of V = {1, . . . , 4} and Wi = {1, . . . , 5}, the following chromosome: 1q1.22 4q2.72 2q3.45 3q4.02 is decoded into the following route: q4 ∈ Q4,3 → q1 ∈ Q1,1 → q3 ∈ Q3,2 → q2 ∈ Q2,4 GA and GP are widely used with different equations to solve TSP and related problems. As can be observed, for path-finding issues or multi-goal path-finding issues, the optimal solution is an equation or other type of computer program. GP has better performance compared to GA, as a computer program input is more suitable for this. Therefore, the computer program type chromosome should be adapted. Thus, GP and GNP should be used for the path-finding problem and multi-goal path planning problem. 2.4.3.2 Robotic path planing Robotic path planning is the problem where the aim is to find the most efficient path for a robot to travel without any collisions. This involves connecting several target CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 30 points within the operational area of the robot. This particular challenge is known as a multi-goal path planning problem (MTP), and it is a well-researched area within the evolution algorithm field. GA This paper [37] implements path planning using GA on mobile robots. The goal is to let GA help the robot find the shortest collision-free path from the current position of the agent to the destination. In GA, each chromosome represents a possible path, which is stored in waypoints. The path is decoded by measuring equal intervals along a straight trajectory from the starting point to the endpoint. As shown in the figure (Fig. 2.14) below, waypoints are stored as pairs of x and y coordinates, which are integers. However, this design is for a constrained environment, not for a dynamic environment. Figure 2.14: Chromosome structure RTP-GA The paper [38] introduced real-time path-finding with Genetic Algorithm (RTP-GA) to solve the path-finding issue in a changing environment. RTP-GA is GA- based and combines with A*, a heuristic search algorithm, to increase the optimization capabilities and adaptability of GA. RTP-GA Life Cycle The RTP-GA life cycle is shown below (Fig. 2.15). RTP- GA is integrated with the environment. Unlike GA, the fitness evaluation and parent selection occur after the evolution operation. After parent selection, the agent will take a series of actions and move through the environment. During the movement through the A* phase, Greedy A* is employed for navigating the environment until the target location is farther away than the Manhattan distance of the original location. Thus, it is necessary for the navigating entity to regularly update and track its location on the map and denote the areas it has already explored. GP As mentioned in the previous section, for path-finding issues, GP has better performance than GA due to the difference in chromosome structure. This paper (ref) implements GP for multi-robot planning. Each chromosome is a computer program, similar to the chromosome in GA. After running the chromosome, it will return a CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 31 Figure 2.15: RTP-GA life cycle path. The figure shows the design of the chromosome, where the non-terminal nodes represent the judgment for different situations, and the terminal nodes represent the movement of the agent. To implement an evolutionary algorithm for solving path- finding or multi-goal planning problems, the chromosome represents a solution for a given problem domain; it should be a computer program, where the running result is a path or a mathematical equation. 2.5 Conclusion In this section, we have reviewed the state-of-the-art GNP variants. There are several directions to improve GNP, such as improving the chromosome structure and modifying the evolution operators. The efficiency of GNP can also be enhanced by integrating with other algorithms, such as GNP with EDAs, RL, and ACO. These algorithms usually help GNP pick the right nodes or actions for the evolution. To improve the efficiency of our proposed GNP variant, we can modify the chro- mosome structure, such as reducing the number of nodes, or we can use customized crossover and mutation. Furthermore, we can allow other algorithms to help GNP CHAPTER 2. BACKGROUND AND LITERATURE REVIEW 32 navigate the evolution process. Chapter 3 General Strategies In this section, we delve into the unique methodology that sets our Genetic Network Programming (GNP) research apart from existing works in the field. Our approach is characterized by a distinctive ”Top-Bottom” strategy, where the journey begins with the conception of a complex ”black box” solution, which can subsequently guide the evolution towards the most effective GNP individual. On the other hand, we aim to derive a GNP computational graph solution from the ”black box.” The objective of this section is to explain the ”Top-Bottom” approach, how this research utilizes the ”black box” for our novel GNP variant with the ”Taxi problem” case study, and our unique version of the ”Divide and Conquer” strategy, specially made for multi-goal path-finding challenges. 3.1 Related Work The object of this research is to convert or interpret the ”black box” system into a computational graph. The ”black box” system refers to systems that hide their internal logic from the users [39]. The term ”interpretation” refers to providing a clear explanation of concepts. In data mining and machine learning, interpretability is defined as the ability to explain or provide meaning in understandable terms to a human [40]. According to these papers [41] [42], an interpretable model has three characteristics: Interpretability, Accuracy, and Fidelity. Humans can easily understand and interpret 33 CHAPTER 3. GENERAL STRATEGIES 34 these models, which include decision trees, rules, and linear models. A decision tree can be linearized into decision rules using the if-then form [43]: If condition1 ∧ condition2 ∧ condition3, then outcome. This paper [7] categorizes various approaches to explain the ”black box.” In general, the ”black box” involves transforming the complex AI model into a block of ”IF-THEN” logic, which is human-readable. The paper broadly categorizes the approaches into three main types: • Model-specific explanations: This approach focuses on interpreting specific types of models, such as deep neural networks or decision trees. • Model-agnostic explanations: This approach provides insights that can be applied to different types of ”black box” models. • Techniques focusing on specific aspects of interpretability: This approach helps us understand how the input affects the output of the model. The work in [44] provides a comprehensive review of how GP can be leveraged to enhance the interpretability of machine learning models. The focus is on two main categories: The first category is intrinsic interpretability, which seeks to directly evolve more effective (and interpretable) models through GP. The second category is post-hoc interpretability, which uses GP to explain other black-box machine learning models or the models that have evolved from simpler models, like linear models. The XGP review section discusses intrinsic interpretability XGP approaches that take into account model complexity (including size), number of features, meaningful combinations between features, and questionnaire-based interpretability measures. XGP focuses on the following properties to improve interpretability: Smaller GP Model Size: Smaller GP models have better simulatability. To achieve this, GP can be modified to reduce the size of the chromosome, penalize large models during fitness evaluation and parent selection, and utilize customized operators. Lower (Nonsize) GP Model Complexity: This can be categorized into Struc- tural Complexity Reduction and Functional Complexity Control. The structural com- plexity of models directly impacts their interpretability. Simplicity in model structure can improve understanding of feature importance, interactions, and prediction infer- ence. The following methods are used: [45] [46] [47]. To reduce Functional Complexity, the following methods are used: [48] [49]. CHAPTER 3. GENERAL STRATEGIES 35 Fewer Distinct Features in GP Models: The GP model simplifies interpre- tation by using fewer distinct features and concepts. Feature selection and feature construction can be used to address this problem. Interpretable GP (Sub-)Model Structures: Existing strategies fall into three categories: interpretable function nodes, interpretable combinations between nodes, and problem decomposition. The post-hoc interpretability methods focus on explaining the ”black-box” models. These methods are divided into two main categories: global and local interpretability. Global interpretability methods use GP to generate interpretable models that explain the overall behavior of the black-box model across all instances. Local interpretability, on the other hand, seeks to explain the behavior of the black-box model for specific instances, resulting in explanations that focus on the immediate context surrounding a given data point or decision. In our research, the ”black box” will be converted to a GNP computer graph rather than an ”IF-THEN” logic system or using GP. The GNP graph is more elegant in terms of reusability, modularity, and scalability: • The chromosome structure of the GNP is a network. The GNP does not experi- ence ”bloat” as the problem’s complexity increases. • The GNP chromosome has better interpretable function nodes and interpretable combinations between nodes. The GNP comes with a function library, which consists of judgment functions and processing functions, achieving better inter- pretability in this way. To enhance the GNP interpretation further, this research will utilize the ”Top- Bottom” approach with the GNP to generate the GNP computer graph, and we will use the ”divide and conquer” approach to decompose the complex problem into multiple sub-problems. 3.2 Top-Bottom Approach 3.2.1 ”Black Box” Conception In this context, the complex solution (or target model) serves as a ”black box”; there- fore, we do not examine its computational components, which could include millions of CHAPTER 3. GENERAL STRATEGIES 36 neurons and billions of connection weights. We can only access the inputs and outputs of the ”black box” system, nothing more. The target model knows the correct action for any given scenario but does not provide any reasoning for its actions. This solu- tion is often embodied by advanced algorithms like neural networks, known for their exceptional ability to handle complex and nonlinear problems. Although they are not completely transparent or easily interpretable, they were chosen for their ability to guide us to a highly reusable and interpretable computational graph. 3.2.2 Bridging ”Black Box” and Computational Graphs The ”Top-Bottom” approach is the bridge in this research that connects the ”black box” with computational graphs. The cornerstone of this methodology lies in trans- forming the conceptual ”black box,” often represented by complex algorithms, into a format that aligns with the reusable, interpretable nature of computational graphs. This transformation involves a series of steps: • Abstraction and Decomposition Initially, the complex problem is abstracted and decomposed into fundamental components or sub-problems. This process involves identifying key operations, decision-making processes, and data flow within the solution. For instance, in the multi-goal path-finding problem, the decision-making for different cases and the related actions will be identified, such as when the agent should move and the transitions between them (what action the agent will perform when encountering certain scenarios). This involves careful design to maintain the integrity of the solution’s logic and performance. The primary challenge is to ensure that the computational graph remains functionally equivalent to the original solution. • Mapping to GNP representation The GNP solution will be designed based on the ”black box” system. This critical step converts the complex solution into a visual and structured format. Once the action and judgment parts in the problem domain are defined, the judgment and processing nodes will be created, and the chromosome for GNP will be designed accordingly. During the parent selection process, the fitness function will utilize the black box system to help the GNP evolve. Additionally, the crossover and mutation will swap and alter specific genes to guide the evolution in the right direction. • Iterative Refinement The computational graph is then iteratively refined to CHAPTER 3. GENERAL STRATEGIES 37 ensure that it accurately represents the complex solution while also benefiting from GNP’s evolutionary processes. The evolved solution may contain duplicate or unused branches. To create the perfect solution, our proposed method will be used to further refine the generated solution. 3.2.3 Benefits in GNP Research The top-bottom approach in Genetic Network Programming (GNP) offers a novel per- spective on solving complex problems. While it presents several advantages: • Guided Evolutionary Process Starting with a conception of the ideal solu- tion provides a clear target for the evolutionary process. It helps in directing the evolution of GNP individuals more purposefully towards this target, poten- tially reducing the time and resources spent on aimless exploration. This focused direction can lead to more efficient convergence towards effective solutions. • Benchmarking Excellence Having a defined ”black box” solution serves as a benchmark for evaluating the progress and performance of evolving solutions. It allows researchers to measure how close each generation of GNP individuals gets to this ideal solution. The GNP design will also benefit from this, such as customized chromosomes and evolution operators, which can help the evolution avoid randomly generated bad genes. • Adaptability to Complex Problems Complex problems often require complex solutions. The black box model solution enhances the adaptability of GNP, allowing it to handle more challenging scenar- ios within dynamic environments. This adaptability is crucial in fields where problems are not just complex but also continuously changing. 3.3 Sources of Ground Truth for the Problem Here are the two possible approaches for obtaining the optimal GNP solution, or the ground truth for the problem. The key idea is to use the ground truth to guide the algorithm for transforming a string of judgment/processing node executions into a GNP solution. CHAPTER 3. GENERAL STRATEGIES 38 3.3.1 Approach 1: Handcrafted Solution Dataset The first approach starts with a handcrafted solution. This handcrafted solution re- ceives the current state of the environment to determine the best series of actions. The steps are described as follows. • Input State Processing: The state of the current environment is input into the handcrafted solution. • Node Execution: Feed inputs to the handcrafted solution to get the best action. Based on this input, a sequence of nodes will be executed by the handcrafted solution. These executed nodes are the correct actions for the current states. After running the handcrafted solution, a list of executed nodes will be generated. • Generating a Dataset: As the handcrafted solution runs against 400 different environmental scenarios in the training phase, it creates a 400-node execution list. These lists can comprise a rich dataset. • Pattern Recognition and Duplication Handling: Within this dataset, it can be observed that some sequences of node executions are repeated with certain patterns. By identifying unique partial solutions and applying specific techniques, these patterns can be analyzed and distilled. • Deriving the Perfect GNP Solution: By identifying patterns and unique sequences, we can refine the dataset using our proposed algorithm and formulate a ’perfect’ GNP solution. 3.3.2 Approach 2: GNP Solution Dataset • Run our GNP Across Many Scenarios: The GNP system is put to the test across 400 different cases. This is similar to the first approach where a handcrafted solution was tested. • Record Node Activation Sequences: For each current state of the environ- ment, the GNP executes a series of nodes. For example, given input A, it might run through nodes J1, J3, J4, and then P2. These sequences are the responses of the GNP solution, and they represent partial solutions to that particular envi- ronment. • Compile a Dataset: The sequences of executed nodes in response to each environment are appended to the dataset. This dataset is a combination of the GNP’s decisions and actions. CHAPTER 3. GENERAL STRATEGIES 39 • Analyze for Patterns: Similar to the first approach, the pattern of executed nodes will be identified. We will use our proposed algorithm or approach to convert the string of judgment/processing node executions into an optimal GNP solution. 3.4 Divide and Conquer 3.4.1 Introduction to the Customized Strategy The Divide and Conquer strategy, a timeless algorithmic approach, has been uniquely adapted in our research to address the complexities of multi-goal pathfinding problems within the GNP framework. This adaptation plays a crucial role in complementing the top-bottom approach, offering a structured method to break down complex, multi-goal problems into more manageable sub-problems. 3.4.2 Complementing the Top-Bottom Approach Alignment with Ideal Solutions: The Divide and Conquer strategy aligns seam- lessly with the top-bottom approach, where the ’perfect’ solution is conceptualized first. 3.4.3 Implementation in GNP Decomposition of Multi-Goal Challenges: In the context of GNP, the Divide and Conquer strategy breaks down the ideal solution into smaller, more understandable chunks, each representing a sub-goal in problem-solving. In the taxi problem, this strategy divides the ideal path into sections, each focusing on a specific sub-goal, such as reaching a passenger or getting to a specific drop-off location. The Divide and Conquer strategy is essential for simplifying and managing complex tasks such as the taxi problem. Without the Divide and Conquer approach, every decision impacts the entire task, as each action must be evaluated not only on its immediate merits but also on how it affects the overall process. For example, in the taxi problem, the action of dropping off a passenger is linked to the preceding action of picking them up; if the taxi does not pick up the passenger first, it cannot drop them off. CHAPTER 3. GENERAL STRATEGIES 40 The Divide and Conquer strategy, on the other hand, breaks down a large problem into smaller, separate parts, significantly reducing the problem’s complexity. This separation enables simpler, more focused strategies for each component. Furthermore, it provides a clear framework for troubleshooting and refining strategies, as issues can be isolated and addressed within their specific sub-task. It also aids in quickly resolving problems in one part without disrupting the others. Evolution of Specialized Sub-Solutions: For each sub-goal, specialized GNP individuals are evolved. These individuals are optimized to solve specific aspects of the path-finding problem, ensuring efficiency and effectiveness in each GNP evolution. In the previous section, we discussed several methods for increasing the efficiency of GNP. The first method involves modifying the chromosome to reduce the number of nodes, thereby reducing computational complexity. The second method involves refining the evolution operators, which facilitates convergence. Integrating GNP with the Divide and Conquer strategy complements these gains. Using Divide and Conquer allows the chromosome to eliminate unnecessary nodes. This means that GNP evolution only involves essential nodes for each sub-goal, and customized evolution operators will only introduce the appropriate genes, making the evolution process more streamlined and effective for complex problems. Integration of Sub-Solutions: Once optimal or near-optimal solutions are evolved for each sub-goal, the partial solutions will be combined to form a comprehensive so- lution for the entire multi-goal path-finding problem. However, this combined solution is not perfect and may include redundant parts. To address this problem, the gener- ated solution will be refined based on the ”black box” solution. By constantly refining against an ideal model, we improve the solution’s applicability to complex path-finding challenges. This process ensures that the final solution is effective for each sub-goal and optimal overall. 3.5 Hand-crafting a GNP Solution As a stand-in for the perfect solution, we have hand-crafted a solution in the form of a computational graph. This conveniently provides us with the ground truth for all possible scenarios and allows us to verify the conciseness of our derived graph solution. However, during the transformational process, the target model is only used as a fitness function to guide the creation of a computational graph. CHAPTER 3. GENERAL STRATEGIES 41 3.5.1 Design Principle The design is to let the agent rely on continuous feedback from the environment to inform its decisions and actions. The design reflects a cycle of perception and action, which is a fundamental principle in robotics and artificial intelligence. This model underscores the ”black box” solution design for the intelligent taxi, where it constantly updates its state based on its past and current positions and uses a combination of sensory data and pre-computed knowledge (like heuristic distances) to navigate towards a goal. 3.5.2 Decision-Making Process Outline: The design principle is shown in Figure 3.1: A looped arrow with the phrase ”Sense, decide, act,” represents the continuous cycle of the agent’s process to make decisions. Below this, a three-step sequence is detailed: Figure 3.1: The continuous decision-making cycle of an agent in the Taxi Problem. • Sense: The agent senses the environment to collect important information. This is likely the step where the agent acquires data from its surroundings, which could be the immediate grid squares it can move to. • Decide: The agent utilizes knowledge about the environment. It may combine sensed information with estimations (like heuristic distance from the target) to make informed decisions. This implies a level of reasoning or computation, such as path planning or avoiding obstacles. CHAPTER 3. GENERAL STRATEGIES 42 • Act: The agent takes action, such as moving robot wheels or, in this context, moving to another grid position. In the Taxi problem, the objective is to have the taxi pick up a passenger and then drop them off at a specified destination. This multi-goal path-finding challenge involves treating the location of the passenger as the initial destination and the drop-off point as the final destination. To navigate the taxi toward each destination, the position of the taxi is compared with that of the current destination to determine what action to take. In this context, a three-step sequence is detailed: • Sense: The taxi will collect the current state of the environment, such as the position of the taxi and the goal. • Decide: The taxi will utilize the knowledge of the environment, especially com- paring the position of the taxi and the destination. For example, if the destination is to the southeast of the taxi, the taxi should move south and east. Additionally, the taxi should also judge whether it can move south or east. If there is a wall in front of the taxi to the south or east, the taxi should choose another direction to avoid the wall. • Act: Based on the decision, the taxi will move in a certain direction, pick up, or drop off the passenger. As observed, the key logic of the design is the IF-ELSE-THEN logic. We can use judgment functions to perform the IF-ELSE feature and processing nodes to perform THEN logic. For instance, if the passenger is north of the taxi, the judgment function will assess the positions of the taxi and passenger and make the decision, while the processing function will move the taxi. Based on this, the following judgment and processing functions are designed as follows: • J1 - Should the taxi pick up or drop off the passenger? • J2 (Drop off branch) - Compares the terminal/taxi location and determines the appropriate direction for the taxi or if it should drop off the passenger when they are at the same location. • J3 (Pick up branch) - Compares the passenger/taxi location and decides the appropriate direction for the taxi or if it should pick up the passenger when they are at the same location. CHAPTER 3. GENERAL STRATEGIES 43 • J4 (Should Move East Branch) - Decides whether the taxi should move east. • J5 (Avoid the obstacle to the East) - Decides whether the taxi should move up or down if there’s a wall to the east of the taxi. • J6 (Should Move West Branch) - Decides whether the taxi should move west. • J7 (Avoid the obstacle to the West) - Decides whether the taxi should move up or down if there’s a wall to the west of the taxi. • P1 - Directs the taxi to move north. • P2 - Directs the taxi to move south. • P3 - Directs the taxi to move east. • P4 - Directs the taxi to move west. • P5 - Instructs the taxi to pick up the passenger. • P6 - Instructs the taxi to drop off the passenger. Each function is a void type function. When a certain function is called, that function will perform a judgment or processing feature and return a string, which is the next function to be called. For example, when function P1 is called, the taxi will move north, and P1 will return to J1. J1 will be called to judge whether the taxi should pick up or drop off the passenger. If the taxi still needs to pick up the passenger, J1 will return ”J3,” etc. The details of these functions will be explained in a later chapter. 3.5.3 Graph representation We will get a computer program if we connect all the judgment and processing functions. To visualize this computer program, each node will represent a function, and the whole computer program will be a directed graph. As seen in the image below (see Figure 3.2), the judgment nodes represent nodes with judgment functions, the processing nodes represent nodes with processing functions, and the connections represent how these nodes relate to each other. The graph begins at node J1, which judges whether the taxi should pick up or drop off the passenger. If the taxi should pick up the passenger, the graph goes to the J3 branch; otherwise, it goes to the J2 branch. After this, the program will compare the position of the taxi and the passenger. If they are at the same position, processing CHAPTER 3. GENERAL STRATEGIES 44 node P5 will be executed to pick up the passenger. If the passenger is east of the taxi, judgment nodes J4 and J5 will be executed to judge whether the taxi should move east and if there is a wall in front of the taxi. If the path is clear, processing node P3 will be executed to move the taxi east. All the processing nodes point to J1, which is the start node of the graph. Figure 3.2: The visual representation of a computer program as a directed graph. 3.5.4 Implementation Since we have all the judgment/processing functions and the complete graph, we need to implement the computer program to execute the complete graph to solve the taxi issue. To simulate the GNP graph transaction, the implementation is divided into two parts: the function library and the computer program. Function Library The function library is simply a library that contains all the judgment and process- ing functions. It functions more like a search engine: when the graph needs a certain function, the function library will find its directory according to the function name, and this found function will be called. The function library (see Figure 5.2) is a key-value pair data structure, using an integer number as the key and the function name as the value. When the key is found in the function library, the corresponding function will be called. Computer program CHAPTER 3. GENERAL STRATEGIES 45 Figure 3.3: Function library. The computer program is just a loop which can execute the judgement and process- ing function iteratively. As we can see from the Pseudo code below (Algorithm 1), the implementation is a while loop. It will keep calling judgment and processing functions until the taxi has dropped the passenger successfully. The current is a pointer, which points to the current executed node. With the given total time frame, each processing and judgement function will take a certain time to execute, the loop will stop when it reaches the given time frame. Algorithm 1 Hand-crafted Solution 1: current← J1 2: while not reached the given time frame do 3: if taxi dropped the passenger then 4: break 5: end if 6: current function← functionsLibrary[current] 7: returned value← current function() 8: time← time+ used time 9: current← returned value 10: end while Algorithm walk through Here is a snapshot (Fig. 3.4) of utilizing Algorithm 1 to pick up the passenger successfully. On the left-hand side of the snapshot is the taxi map; the yellow object represents the taxi, and the blue letter represents the pick-up location. On the right- hand side of the snapshot is the output of Algorithm 1. J1 (Should a taxi pick up or drop off the passenger) will be executed in the first step, and the related judgment function will be called (referring to step 6 in Algorithm 1). A value is returned, which is the next node to be executed. In this case, J3 (Pick up branch) is returned, which CHAPTER 3. GENERAL STRATEGIES 46 Figure 3.4: Function library. means the taxi should pick up the passenger. The current node is set to J3, and the judgment function J3 will be called. The function will take the current state of the map as the input and return the next node to be executed. Therefore, J4 (Should Move East Branch) will be executed. As we can see from the map, the pick-up location is east of the taxi, and there is no obstacle. P2 (move taxi to the East) will be executed. After certain steps, the taxi will pick up and drop off the passenger. (The details of the judgment/processing functions will be explained in the proposed GNP Chapter.) In this chapter, we established a handcrafted solution as the ’black box’ solution for our ’Top-Bottom’ approach. As the ’Top’ part, it adapts the fundamental principles in robotics and artificial intelligence, iteratively receiving information from the environ- ment and performing a series of actions. The handcrafted solution is structured around different judgment and processing functions, each tailored to specific scenarios within the Taxi Problem. As the ground truth of the ’Top-Bottom’ approach, this research utilizes this handcrafted solution as the ’black box’ solution to develop our GNP variant and the proposed algorithm in later sections. Chapter 4 Using a Black Box as a Fitness Function in a GNP 4.1 Introduction This section explores our proposed new variant of GNP in detail. It utilizes the ’Divide and Conquer’ strategy to solve the Taxi problem, as well as a black box system to guide the GNP. This GNP is built based on the design of our hand-crafted solution. The focus is on several key aspects: • GNP Life-Cycle: We will explain the details of the evolutionary cycle and how the ’black box’ solution interacts with our GNP variant. • Chromosome Structure: Based on the ’black box’ solution, the problem is separated into ”pick-up” and ”drop-off” sub-problems. These two sub-problems will be explained in both genotype and phenotype. • Customized Evolution Operators: These are specially crafted based on the ’black box’ solution to guide the evolutionary process effectively toward the ideal solution, ensuring meaningful progress in each evolution step. • Innovative Fitness Function: In this fitness function, we combine the fitness function with other algorithms and utilize the ’black box’ solution to guide and evaluate each individual. 47 CHAPTER 4. USING A BLACK BOX AS A FITNESS FUNCTION IN A GNP 48 We will delve into each of these parts and show how they work together to make our new GNP variant solve the multi-goal path-finding issues ef