Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/21829
Title: | SUMMARIZING VIDEOS WITH ATTENTION BASED NETWORK |
Authors: | SHINDE, PRATHMESH |
Keywords: | VIDEO SUMMARIZATION; ATTENTION TECHNIQUES; MODEL ARCHITECTURE; ATTENTION BASED NETWORK; SUMME; TVSUM |
Issue Date: | May-2025 |
Series/Report no.: | TD-8048; |
Abstract: | In this project, I explore a more efficient way to summarize videos by selecting their most important moments, known as keyshots. Instead of relying on complex models such as bi-directional LSTMs with attention, which are difficult to implement and computationally expensive, I design a simpler model based on a soft self-attention mechanism that is easier to work with and faster to train. The model processes the entire video sequence in a single forward and a single backward pass during training, making it lightweight and well suited to real-world applications where speed and efficiency matter. The self-attention mechanism allows the model to weigh the importance of each frame in the context of the whole video without any recurrence. I evaluate this method on two popular video summarization benchmarks, TVSum and SumMe, where it outperforms many existing state-of-the-art techniques, showing that a simpler, more streamlined approach can still deliver strong results while remaining practical and effective. |
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/21829 |
Appears in Collections: | M.E./M.Tech. Computer Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
PRATHMESH SHINDE M.Tech.pdf | | 1.56 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
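The abstract describes a soft self-attention mechanism that scores every frame against the whole video in a single pass, with no recurrence. Below is a minimal illustrative sketch of such a frame-importance scorer, assuming PyTorch and 1024-dimensional per-frame CNN features (a common setup for TVSum and SumMe); the class name, layer sizes, and training snippet are assumptions for illustration, not the thesis implementation.

```python
# Minimal sketch of a soft self-attention frame scorer for keyshot
# summarization. Assumes PyTorch and 1024-d per-frame features; all names
# and dimensions are illustrative, not the author's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftSelfAttentionScorer(nn.Module):
    def __init__(self, feat_dim=1024, att_dim=512):
        super().__init__()
        # Linear projections for queries, keys, and values.
        self.query = nn.Linear(feat_dim, att_dim, bias=False)
        self.key = nn.Linear(feat_dim, att_dim, bias=False)
        self.value = nn.Linear(feat_dim, att_dim, bias=False)
        # Regressor mapping attended features to a per-frame importance score.
        self.out = nn.Sequential(
            nn.Linear(att_dim, att_dim),
            nn.ReLU(),
            nn.LayerNorm(att_dim),
            nn.Linear(att_dim, 1),
            nn.Sigmoid(),
        )
        self.scale = att_dim ** 0.5

    def forward(self, feats):
        # feats: (T, feat_dim) frame features for one video.
        q, k, v = self.query(feats), self.key(feats), self.value(feats)
        # Soft attention over all frames: no recurrence, one pass over the sequence.
        attn = F.softmax(q @ k.t() / self.scale, dim=-1)   # (T, T)
        context = attn @ v                                  # (T, att_dim)
        scores = self.out(context).squeeze(-1)              # (T,) importance in [0, 1]
        return scores, attn


# Usage sketch: score 300 frames and regress against ground-truth
# importance annotations (random tensors stand in for real data).
model = SoftSelfAttentionScorer()
frames = torch.randn(300, 1024)
scores, attn = model(frames)
loss = F.mse_loss(scores, torch.rand(300))
loss.backward()
```

Because the attention matrix relates every frame to every other frame directly, a single forward pass captures the long-range context that recurrent summarizers would otherwise model sequentially.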