Ezra MacDonald

BSc (Vancouver Island University, 2020)

Notice of the Final Oral Examination for the Degree of Master of Science

Topic

Scalable Vision Transformers for Remote Sensing Semantic Segmentation

Department of Computer Science

Date & location

Friday, August 23, 2024
9:00 A.M.
Virtual Defence

Reviewers

Supervisory Committee

Dr. Yvonne Coady, Department of Computer Science, University of Victoria (Supervisor)
Dr. Sean Chester, Department of Computer Science, UVic (Member)

External Examiner

Dr. Ryan Rad, Khoury College of Computer Sciences, Northeastern University

Chair of Oral Examination

Dr. Tetjana Ross, School of Earth and Ocean Sciences, UVic

Abstract

Assessing and monitoring environmental landscapes plays a critical role in preserving the environment and ensuring the well-being of communities around the world. The launch of Earth observation satellites in low Earth orbit has dramatically increased the availability and resolution of remote sensing data, enabling more precise and frequent monitoring of environmental changes and human impacts across diverse ecosystems. Traditional manual methods of analyzing this data to measure environmental properties are being supplemented by deep learning techniques, which can uncover complex patterns within the data. Recently, the Transformer architecture has been extended to computer vision, further enhancing the versatility and scalability of deep learning models.

This thesis investigates the application of the Transformer architecture to semantic segmentation using medium-resolution satellite data. It explores the unique properties of remote sensing data and proposes techniques to improve deep learning model architectures and training methodologies for optimized results. Two contributions are presented: MineSegSAT and VistaFormer.

MineSegSAT is designed to identify and monitor environmentally impacted areas of mineral extraction sites using Sentinel-2 imagery. It incorporates state-of-the-art deep learning models and loss functions to automate the detection of disturbed areas, aiding in environmental compliance monitoring.

VistaFormer is introduced as a lightweight and efficient model for the semantic segmentation of satellite image time series (SITS) data. It features an encoder-decoder architecture with gated convolutions and self-attention Transformers in the encoder, paired with a lightweight convolutional decoder. This model is designed to handle noise from atmospheric distortions and cloud cover while maintaining high performance and efficiency.

The experimental results demonstrate that VistaFormer outperforms state-of-the-art models on time series crop-type semantic segmentation benchmarks while using fewer floating-point operations and fewer trainable parameters. The findings suggest that Transformer-based architectures can significantly enhance the accuracy and efficiency of satellite imagery analysis, providing valuable tools for environmental and agricultural monitoring.
