SPEAKER IDENTIFICATION FROM VOICE SIGNALS USING HYBRID NEURAL NETWORK

BHATT, HARSHIT

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/18865

Full metadata record

DC Field	Value	Language
dc.contributor.author	BHATT, HARSHIT	-
dc.date.accessioned	2022-02-21T08:36:49Z	-
dc.date.available	2022-02-21T08:36:49Z	-
dc.date.issued	2021-10	-
dc.identifier.uri	http://dspace.dtu.ac.in:8080/jspui/handle/repository/18865	-
dc.description.abstract	Identifying the speaker in an audio-visual environment is a crucial task that is gaining attention in the research domain. Researchers are increasingly using deep neural networks to match people with their respective voices. The advantages of deep learning are manifold and include the ability to process huge volumes of data, robust training of algorithms, feasibility of optimization, and reduced computation time. Previous studies have explored recurrent and convolutional neural networks incorporating GRUs, Bi-GRUs, LSTMs, Bi-LSTMs, and many more [1]. This work proposes a hybrid mechanism that consists of a CNN and an LSTM network fused using an early fusion method. We collected a dataset of 1,330 voice recordings, each 3 seconds long, recorded with a Python script in .wav format. The dataset consists of 14 categories, and we used 80% for training and 20% for testing. We optimized and fine-tuned the neural networks and modified them to yield optimum results. For the early fusion approach, we used the concatenation operation, which fuses the neural networks prior to the training phase. The proposed method achieves 97.72% accuracy on our dataset and outperforms all existing baseline mechanisms, such as MLP, LSTM, CNN, and RNN. This research contributes to ongoing work in the speaker identification domain and paves the way for future directions using deep learning.	en_US
dc.language.iso	en	en_US
dc.publisher	DELHI TECHNOLOGICAL UNIVERSITY	en_US
dc.relation.ispartofseries	TD - 5413;	-
dc.subject	SPEAKER IDENTIFICATION	en_US
dc.subject	VOICE SIGNALS	en_US
dc.subject	HYBRID NEURAL NETWORK	en_US
dc.subject	CNN AND LSTM NETWORKS	en_US
dc.title	SPEAKER IDENTIFICATION FROM VOICE SIGNALS USING HYBRID NEURAL NETWORK	en_US
dc.type	Thesis	en_US
Appears in Collections:	M.E./M.Tech. Information Technology
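
The 80%/20% split and the concatenation-based early fusion described in the abstract can be illustrated with a minimal sketch. The dataset size, category count, and split ratio come from the abstract; the function names, the random seed, the shuffle-based split, and the toy feature vectors are assumptions made for illustration and are not taken from the thesis.

```python
import random

NUM_RECORDINGS = 1330   # dataset size stated in the abstract
NUM_CATEGORIES = 14     # number of speaker categories stated in the abstract
TRAIN_FRACTION = 0.8    # 80% training, 20% testing, as stated in the abstract


def split_indices(n_items, train_fraction, seed=0):
    """Shuffle recording indices and split them into train and test lists.

    Assumption: a plain shuffled split; the abstract does not say whether
    the thesis stratified the split by speaker category.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


def early_fusion(cnn_features, lstm_features):
    """Concatenate two feature vectors for one recording into a fused vector.

    Assumption: each branch yields a flat list of floats; the real branch
    outputs and their sizes are not given in the abstract.
    """
    return list(cnn_features) + list(lstm_features)


train_idx, test_idx = split_indices(NUM_RECORDINGS, TRAIN_FRACTION)
print(len(train_idx), len(test_idx))  # 1064 266

# Toy vectors standing in for the outputs of the CNN and LSTM branches.
fused = early_fusion([0.1, 0.2, 0.3], [0.4, 0.5])
print(len(fused))  # 5
```

With 1,330 recordings, this split leaves 1,064 recordings for training and 266 for testing. In a full model, the fused vector would feed a classifier with one output per category.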

Files in This Item:

File	Description	Size	Format
thesis Final.pdf		1.08 MB	Adobe PDF
