Jay Chung

  • BEng (Korea University, 2019)

  • MEng (Korea University, 2021)

Notice of the Final Oral Examination for the Degree of Master of Applied Science

Topic

“World Model based Multi-agent Proximal Policy Optimization Framework for Multi-agent Pathfinding”

Department of Mechanical Engineering

Date & location

  • Wednesday, September 4, 2024

  • 11:00 A.M.

  • Virtual Defence

Reviewers

Supervisory Committee

  • Dr. Homayoun Najjaran, Department of Mechanical Engineering, University of Victoria (Supervisor)

  • Dr. Hong-Chuan Yang, Department of Electrical and Computer Engineering, UVic (Non-Member)

External Examiner

  • Dr. Brandon Haworth, Department of Computer Science, University of Victoria  

Chair of Oral Examination

  • Dr. Jie Zhang, School of Business, UVic


Abstract

Multi-agent pathfinding plays a crucial role in various robot applications. Recently, deep reinforcement learning methods have been adopted to solve large-scale planning problems in a decentralized manner. Nonetheless, such approaches face challenges such as non-stationarity and partial observability. This thesis addresses these challenges by introducing a centralized communication block into a multi-agent proximal policy optimization framework. The evaluation is conducted in a simulation-based environment featuring continuous state and action spaces; the simulator consists of a vectorized 2D physics engine in which agents are bound by the laws of physics.
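
The notice does not include implementation details, but the following is a minimal sketch of how a centralized communication block could feed decentralized proximal policy optimization actors with continuous actions; all names and dimensions (CommBlock, LocalActor, latent_dim, the toy sizes) are illustrative assumptions, not the thesis implementation.

    # Hedged sketch: a shared global latent supplied to per-agent Gaussian actors.
    # Class names, dimensions, and wiring are assumptions for illustration only.
    import torch
    import torch.nn as nn

    class CommBlock(nn.Module):
        """Encodes the global state into a compact latent shared by all agents."""
        def __init__(self, global_dim: int, latent_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(global_dim, 128), nn.ReLU(),
                nn.Linear(128, latent_dim),
            )

        def forward(self, global_state: torch.Tensor) -> torch.Tensor:
            return self.net(global_state)

    class LocalActor(nn.Module):
        """Gaussian policy over continuous actions from local obs + shared latent."""
        def __init__(self, obs_dim: int, latent_dim: int, act_dim: int):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(obs_dim + latent_dim, 128), nn.Tanh(),
                nn.Linear(128, act_dim),
            )
            self.log_std = nn.Parameter(torch.zeros(act_dim))

        def forward(self, obs: torch.Tensor, latent: torch.Tensor):
            mean = self.body(torch.cat([obs, latent], dim=-1))
            return torch.distributions.Normal(mean, self.log_std.exp())

    # Example forward pass for 4 agents in a continuous 2D environment.
    comm = CommBlock(global_dim=64, latent_dim=16)
    actor = LocalActor(obs_dim=12, latent_dim=16, act_dim=2)
    latent = comm(torch.randn(1, 64)).expand(4, -1)  # one shared global latent
    dist = actor(torch.randn(4, 12), latent)         # per-agent local observations
    actions = dist.sample()                          # e.g. 2D velocity commands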

Within the framework, a world model is utilized to extract and abstract representation features from the global map, leveraging the global context to enhance the training process. This approach decouples the feature extractor from the agent training process, enabling a more accurate representation of the global state that remains unbiased by the actions of the agents. Furthermore, the modularized approach offers the flexibility to replace the representation model with another model, or to modify tasks within the global map, without retraining the agents.
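
As a rough illustration of the decoupling described above, the sketch below pretrains an autoencoder on global-map snapshots and then freezes its encoder for reuse during agent training; the MapAutoencoder architecture, the reconstruction objective, and the training loop are assumptions made for this example rather than the thesis design.

    # Hedged sketch: stage 1 trains the representation model on global-map data
    # independent of any policy; stage 2 freezes the encoder so its features stay
    # unbiased by agent actions and can be swapped without retraining the agents.
    import torch
    import torch.nn as nn

    class MapAutoencoder(nn.Module):
        def __init__(self, map_dim: int, latent_dim: int):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(map_dim, 256), nn.ReLU(),
                nn.Linear(256, latent_dim),
            )
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 256), nn.ReLU(),
                nn.Linear(256, map_dim),
            )

        def forward(self, global_map: torch.Tensor) -> torch.Tensor:
            return self.decoder(self.encoder(global_map))

    # Stage 1: pretrain on collected global-map snapshots (placeholder data here).
    model = MapAutoencoder(map_dim=64 * 64, latent_dim=32)
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)
    maps = torch.rand(256, 64 * 64)
    for _ in range(10):
        loss = nn.functional.mse_loss(model(maps), maps)
        optim.zero_grad()
        loss.backward()
        optim.step()

    # Stage 2: freeze the encoder and reuse it as the representation/communication
    # model while the agents are trained.
    for p in model.encoder.parameters():
        p.requires_grad = False
    global_features = model.encoder(maps[:1])  # features fed to the agents

Because the encoder is trained only on reconstruction, it can in principle be replaced by a different representation model without touching the agent policies, which is the flexibility the abstract describes.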

The empirical study demonstrates the effectiveness of the proposed approach by comparing three proximal policy optimization-based multi-agent pathfinding frameworks. The results indicate that utilizing an autoencoder-based state representation model as the centralized communication model provides sufficient global context. Additionally, introducing the centralized communication block improves the performance and generalization capability of the agent policies.