Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21219
Title: DESIGN AND ANALYSIS OF VIDEO SUMMARIZATION APPROACHES USING ARTIFICIAL INTELLIGENCE TECHNIQUES
Authors: GOEL, RUCHI
Keywords: VIDEO SUMMARIZATION APPROACHES
ARTIFICIAL INTELLIGENCE TECHNIQUES
AI APPROACHES
TC-CLSTM
MFCC
Issue Date: Nov-2024
Series/Report no.: TD-7559;
Abstract: The enormous growth of digital video data in today’s fast-paced digital landscape has highlighted the importance of rapid and efficient data analysis and video retrieval approaches. Choosing a worthwhile video is difficult given the vast amount of content, and time constraints make it impossible to watch every video through to the end. Video summarization is the process of condensing a video while maintaining its essential ideas and meaning using important keyframes, content and scenes. The goal is to save time and effort by giving a succinct synopsis of the most crucial parts of the video rather than requiring the whole video to be watched. Video summarization finds applications in fields such as surveillance, education, entertainment, and content browsing, enhancing accessibility and efficiency in video consumption.

The expansion of video content driven by advances in multimedia technologies necessitates the development of novel browsing and understanding strategies. Design and Analysis of Video Summarization Approaches using Artificial Intelligence Techniques is a thorough investigation of the creation and assessment of techniques for automatically producing succinct and informative video summaries. “Video summarization” is the process of distilling long video footage into shorter versions while keeping the most important details and pertinent information by removing redundant frames. “Artificial intelligence (AI) techniques” refer to a wide range of approaches and algorithms that enable computers to carry out tasks that would typically require human intelligence. In the context of video summarization, AI approaches are applied to automatically construct meaningful summaries, discover key aspects, and evaluate and comprehend the content of videos.

This research work contributes video summaries that make video content easier to access by providing a more user-friendly entry point to complex material. Multiple strategies are investigated to meet the various requirements of video summarization, such as keyframe extraction and multimodal techniques, benefiting a wider audience with limited time or resources.

Firstly, efficiently discovering and navigating through a large number of videos is a major problem in smart cities, where security cameras add to the volume of video data and make efficient indexing and retrieval systems necessary. Video summarization emerges as a critical solution that allows large-scale video collections to be stored, retrieved, and browsed while preserving their important aspects. With an emphasis on real-time applications, this research work offers a thorough overview of video summarization approaches. Real-time video is then analyzed using subtitles: the text summarization techniques LSA and TextRank are applied to the retrieved subtitles, and the analyzed text is used to create a summary. This could be a brief text snippet emphasizing the important points, a list of key subjects presented, or keywords that describe the video’s content. The created summary is displayed alongside the video stream, allowing the user to follow along and understand the gist of the information as it unfolds.
Secondly, in a video, each frame represents a single point in time, and many frames contain redundant information or only slight differences. Keyframe extraction seeks to select a subset of frames that best represents the visual content throughout the video. A method called TC-CLSTM Auto Encoder with mode-based learning is proposed for automatically selecting keyframes. The autoencoder learns to recognize the most relevant elements in a video frame, and these attributes are then used to choose keyframes: frames are rated based on the extracted features and mode values, and the top-ranking frames are selected as keyframes.

Thirdly, VEM, a hybrid model, is proposed for video summarization. This multimodal model presents a video summary using different multimedia modalities, namely text, audio and frames. The text aspect is handled using subtitles. For the audio component, files are obtained in .wav format and split into audio chunks, from which MFCC (Mel-frequency cepstral coefficients), Mel spectrum, area under the audio curve, and audio peaks above an average cut-off are extracted. For the visual aspect, Mean Absolute Difference (MAD) is used to find important frames (see the illustrative sketch below). Combining all aspects yields the final summary. Another multimodal summarization method, TAVM, is also presented: the BEiT vision transformer is used to identify items within the selected frames, speech-to-text converters transcribe the audio content, and in the final stage the Summary Builder uses the GPT-3-based OpenAI API to build a summary of the information.

Lastly, artificial intelligence (AI)-driven methods incorporating human presence detection and face identification lead to automatic summarization utilizing text and audio cues. The suggested framework intends to improve the efficacy and efficiency of video summarization by synthesizing various approaches, enabling quick understanding and retrieval of pertinent content amid the torrent of video data in the digital world.

Video summaries provide timely insights to individuals in a variety of areas, including market research, surveillance, and media monitoring, helping them identify trends, anomalies, or crucial occurrences efficiently. Additionally, by reducing the need to store, process, and transmit massive amounts of video data, these strategies aid in resource optimization: video summarization reduces costs and increases the effectiveness of systems that handle such data by condensing videos into brief representations. By utilizing different AI techniques, this research work seeks to produce different methods of video summarization. The study advances our knowledge of AI-driven methods for video summarization and offers suggestions for potential areas of research and useful applications to improve the administration and exploitation of video data.
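As an illustration of the audio and frame descriptors mentioned for the VEM model (MFCC, Mel spectrum, area under the audio curve, peaks above an average cut-off, and MAD between frames), the sketch below shows one plausible way to compute them with librosa and OpenCV. File names, the frame-sampling step, and the cut-off are assumptions rather than the parameters actually used in the thesis.

    # Minimal sketch of VEM-style feature extractors; names and thresholds are assumptions.
    import cv2
    import librosa
    import numpy as np

    def audio_features(wav_path):
        """MFCC, Mel spectrogram, area under |signal|, and peaks above the average cut-off."""
        y, sr = librosa.load(wav_path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        mel = librosa.feature.melspectrogram(y=y, sr=sr)
        area = float(np.sum(np.abs(y)) / sr)                    # approx. area under the audio curve
        peaks = np.flatnonzero(np.abs(y) > np.mean(np.abs(y)))  # samples above the average cut-off
        return mfcc, mel, area, peaks

    def mad_scores(video_path, step=10):
        """Score every step-th frame by its Mean Absolute Difference from the previous sample."""
        cap = cv2.VideoCapture(video_path)
        scores, prev, idx = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
                if prev is not None:
                    scores.append((idx, float(np.mean(np.abs(gray - prev)))))
                prev = gray
            idx += 1
        cap.release()
        return scores  # high-MAD frames are natural candidates for the visual summary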
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21219
Appears in Collections: Ph.D. Computer Engineering

Files in This Item:
File: RUCHI GOEL Ph.D..pdf
Size: 4.88 MB
Format: Adobe PDF

