Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21777
Title: ENVIRONMENTAL AUDIO CLASSIFICATION USING ATTENTION BASED DEEP LEARNING MODEL
Authors: KUMAR, YASH
Keywords: ENVIRONMENTAL AUDIO CLASSIFICATION
DEEP LEARNING MODEL
ATTENTION
CNN
Issue Date: May-2025
Series/Report no.: TD-7981;
Abstract: Environmental audio classification involves identifying and categorizing real-world sound events from their acoustic characteristics. Unlike classification tasks that rely on static features, audio classification requires models capable of capturing both the temporal and spectral dynamics of non-stationary signals. This study presents a deep learning approach that feeds time-frequency representations of environmental audio, specifically Mel-spectrograms, into an architecture enhanced with attention mechanisms. The raw audio undergoes pre-processing steps, including resampling and conversion into Mel-spectrograms, to extract meaningful time-frequency features suitable for model training. The attention mechanisms allow the model to focus selectively on relevant temporal features, improving its ability to differentiate between overlapping or acoustically similar events. Comparative evaluation against conventional one-dimensional convolutional neural networks (1D CNNs) highlights the advantages of attention-based architectures in modelling long-range dependencies and capturing richer contextual information from audio sequences. The audio pre-processing pipeline, model design, and evaluation procedures are implemented using Python-based libraries and modern deep learning frameworks. The proposed system demonstrates robustness in classifying a variety of sound types, showing potential for deployment in real-time monitoring applications such as smart surveillance, public safety systems, and ambient sound analysis in urban environments. Experimental results show that attention-based models offer improved classification performance and adaptability compared to conventional architectures, making them well suited to complex acoustic environments. This work contributes to the growing field of intelligent acoustic sensing by offering a flexible and efficient model architecture that adapts to complex audio patterns.
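The pre-processing step described in the abstract (resampling followed by Mel-spectrogram conversion) can be sketched as follows. The thesis does not specify its exact parameters or libraries, so this is a minimal NumPy-only illustration with assumed values (22.05 kHz sample rate, 1024-point FFT, 512-sample hop, 64 mel bands); in practice a dedicated library function such as librosa's `feature.melspectrogram` would typically be used.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale, mapped to FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(y, sr=22050, n_fft=1024, hop=512, n_mels=64):
    # Frame the signal, apply a Hann window, take the magnitude STFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project the power spectrum onto the mel filterbank.
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    # Log compression, standard for spectrogram inputs to deep models.
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T  # (n_mels, n_frames)

# One second of synthetic 440 Hz tone standing in for a recorded clip.
sr = 22050
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
```

The resulting `(n_mels, n_frames)` array is the time-frequency image that the attention-enhanced network would consume in place of raw waveform samples.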
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21777
Appears in Collections:M.E./M.Tech. Electrical Engineering

Files in This Item:
File: Yash Kumar M.Tech.pdf
Size: 1.26 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.