Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21733
Title: ADVANCED DEEP LEARNING METHODOLOGIES FOR MULTI-LABEL SATELLITE IMAGE CLASSIFICATION
Authors: BARMAN, TAMAL
Keywords: ADVANCED DEEP LEARNING; METHODOLOGIES; MULTI-LABEL SATELLITE; IMAGE CLASSIFICATION
Issue Date: Jun-2025
Series/Report no.: TD-7965;
Abstract: Multi-label satellite image classification presents significant challenges in remote sensing applications, because aerial scenes frequently contain several concurrent elements such as "partly cloudy," "agriculture," and "roads." The task is further complicated by ambiguous training labels, which often cause deep learning models to overfit. To address these challenges, we propose a comprehensive framework that integrates both convolutional neural networks (CNNs) and transformer-based architectures, aiming for high classification accuracy while enabling efficient deployment on resource-constrained devices. Our methodology follows a dual-approach strategy, evaluating both paradigms thoroughly on multi-label remote sensing datasets.

The first component of our framework uses the lightweight MobileNetV2 architecture, pre-trained on millions of ImageNet images, and applies transfer learning to the multi-label classification task. A preprocessing pipeline with haze removal enhances image quality before classification. During training, the multiple class labels associated with each satellite image are encoded as a multi-hot target vector (one binary indicator per class), and the threshold applied to the posterior class probabilities at the network output is adjusted dynamically to improve prediction accuracy. This approach balances computational efficiency with classification performance, making it suitable for deployment in resource-limited environments.

Concurrently, we investigate Vision Transformers (ViTs) as an alternative paradigm, leveraging their self-attention mechanism to capture long-range dependencies across image regions. Unlike CNNs, which extract features through stacked convolutional layers, ViTs divide an image into fixed-size patches and process them as a sequence of tokens, analogous to how language models process words.
This fundamental architectural difference enables ViTs to capture broader contextual information and finer-grained features across the entire image, a critical advantage for multi-label satellite imagery containing objects of diverse categories, sizes, and spatial arrangements. We comprehensively evaluate six vision transformer variants, ranging from tiny to base scale: ViT-Small, ViT-Tiny, ViT-Base, Swin-Tiny, DeiT-Tiny, and DeiT-Base, optimizing each model for the multi-label classification task.

Our findings show that carefully optimized lightweight models can match or exceed the performance of more complex architectures while requiring substantially fewer computational resources. This has important implications for real-time satellite image analysis, environmental monitoring, agricultural assessment, and disaster response, where models must run on edge devices with limited processing capabilities. The complementary strengths of the CNN and transformer-based approaches suggest that hybrid architectures combining both paradigms are a promising direction for future research in multi-label satellite image classification.
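The abstract describes encoding each image's labels as a binary vector and adjusting the output probability threshold instead of relying on a fixed cut-off. The Python sketch below shows one common way to do this, under stated assumptions: the network is assumed to emit an independent sigmoid probability per class, and a separate threshold per class is chosen by grid search to maximize that class's F1 score on a validation split. The helper names (`multi_hot`, `tune_thresholds`), the candidate grid, the F1 criterion, and the toy data are hypothetical illustrations, not details taken from the thesis, which may use a different search procedure or metric.

```python
import numpy as np

def multi_hot(label_lists, classes):
    """Encode each image's list of labels as a binary vector with one slot per class."""
    index = {name: i for i, name in enumerate(classes)}
    targets = np.zeros((len(label_lists), len(classes)), dtype=np.int8)
    for row, labels in enumerate(label_lists):
        for name in labels:
            targets[row, index[name]] = 1
    return targets

def tune_thresholds(y_true, probs, candidates=np.linspace(0.05, 0.95, 19)):
    """Hypothetical helper: for each class independently, pick the candidate
    threshold that maximizes that class's F1 score on validation data.
    The per-class F1 criterion is an assumption, not the thesis's stated method."""
    n_classes = y_true.shape[1]
    thresholds = np.zeros(n_classes)  # every entry is overwritten in the loop below
    for k in range(n_classes):
        positives = y_true[:, k] == 1
        best_f1 = -1.0
        for t in candidates:
            predicted = probs[:, k] >= t
            tp = np.sum(predicted & positives)
            fp = np.sum(predicted & ~positives)
            fn = np.sum(~predicted & positives)
            denom = 2 * tp + fp + fn
            f1 = 2 * tp / denom if denom > 0 else 0.0
            if f1 > best_f1:  # strict ">" keeps the lowest threshold among ties
                best_f1, thresholds[k] = f1, t
    return thresholds

# Toy validation data (hypothetical): 4 images, 3 labels, one sigmoid output per class.
classes = ["agriculture", "partly_cloudy", "road"]
y_val = multi_hot(
    [["agriculture", "road"], ["partly_cloudy"], ["agriculture"], ["road", "partly_cloudy"]],
    classes,
)
probs = np.array([
    [0.92, 0.22, 0.63],
    [0.13, 0.71, 0.28],
    [0.81, 0.11, 0.17],
    [0.34, 0.43, 0.74],
])
thresholds = tune_thresholds(y_val, probs)
predictions = (probs >= thresholds).astype(np.int8)
print(thresholds)
print(predictions)
```

Per-class thresholds let rare or visually ambiguous labels use a different cut-off from frequent ones; a single global threshold tuned the same way is a simpler alternative.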
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21733
Appears in Collections: M.E./M.Tech. Information Technology

Files in This Item:
File | Description | Size | Format
Tamal_BarmaN MASTER OF TECHNOLOGY IN IT.pdf | | 4.66 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.