Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/21854
Title: | URL-TO-KNOWLEDGE: AUTOMATED KNOWLEDGE EXTRACTION AND SUMMARIZATION FROM WEB AND MULTIMEDIA SOURCES USING LLMs |
Authors: | CHOWDHURY, AAYUSH |
Keywords: | URL-TO-KNOWLEDGE; KNOWLEDGE EXTRACTION; SUMMARIZATION; MULTIMEDIA SOURCES; LLMs |
Issue Date: | May-2025 |
Series/Report no.: | TD-8077; |
Abstract: | The thesis "URL-to-Knowledge: Automated Knowledge Extraction and Summarization from Web and Multimedia Sources using LLMs" addresses the challenge of extracting brief, insightful summaries from the vast and diverse content on the internet. In an age of information overload, users struggle to quickly digest long web articles and multimedia content. URL-to-Knowledge, the system presented in this thesis, uses large language models (LLMs) to automatically produce accurate and consistent summaries of both static web pages and YouTube videos. The system is designed to be user-friendly: its Streamlit-based interface lets users enter URLs, choose a summarization model, set the summary length, and ask follow-up questions. It improves accessibility by offering configurable outputs, including downloadable text and audio summaries. The app also handles multilingual input, translating content into English for broader use. The summarization pipeline combines extractive and abstractive techniques, pairing a distilled version of LLaMA for lightweight tasks with state-of-the-art LLMs, including LLaMA 3 (8B and 34B) and Gemma 2 (9B). Comparative studies show that higher-parameter models such as LLaMA 3-34B and GPT-4 Turbo generate better summaries with higher factual accuracy, but at the cost of greater latency and processing expense. Mid-sized models such as LLaMA 3-8B and Gemma 2-9B offer a reasonable trade-off, delivering fast summarization of moderate quality. By integrating advanced LLM capabilities into a unified, interactive platform, URL-to-Knowledge substantially reduces the effort needed to extract knowledge from diverse online material, making it a useful tool for professionals, teachers, and academics. |
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/21854 |
Appears in Collections: | MTech Data Science |
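The abstract states that the pipeline routes static web pages and YouTube videos and combines extractive and abstractive summarization, but it does not say how the two stages are joined. The following is a minimal Python sketch, assuming a frequency-based extractive pre-selection whose output is passed to an LLM as a prompt; it is not the thesis's code. The names `classify_source`, `extractive_select`, `build_prompt`, `summarize`, and the `call_llm` callable are hypothetical, and `call_llm` stands in for whichever model (for example, a LLaMA 3 or Gemma 2 endpoint) the user selects.

```python
import re
from collections import Counter
from collections.abc import Callable
from urllib.parse import urlparse

# Hypothetical stopword list; a real system would use a fuller, language-aware list.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "for", "on", "with", "as", "that", "this", "by", "be"}


def classify_source(url: str) -> str:
    """Route a URL to 'youtube' (video transcript path) or 'web' (HTML path)."""
    host = (urlparse(url).hostname or "").lower()
    if host in ("youtube.com", "youtu.be") or host.endswith(".youtube.com"):
        return "youtube"
    return "web"


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def _content_words(text: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_select(text: str, k: int) -> list[str]:
    """Extractive step: keep the k sentences with the highest average
    content-word frequency, returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(_content_words(text))

    def score(sentence: str) -> float:
        tokens = _content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]


def build_prompt(selected: list[str], max_words: int) -> str:
    """Abstractive step input: ask the LLM to rewrite the extracted sentences."""
    bullets = "\n".join(f"- {s}" for s in selected)
    return (f"Summarize the following key sentences in at most "
            f"{max_words} words:\n{bullets}")


def summarize(text: str, call_llm: Callable[[str], str],
              k: int = 5, max_words: int = 120) -> str:
    """Hybrid pipeline: extractive pre-selection, then an abstractive LLM call.
    `call_llm` is any function mapping a prompt string to generated text."""
    return call_llm(build_prompt(extractive_select(text, k), max_words))


if __name__ == "__main__":
    doc = "Cats sleep. Cats eat fish. Dogs bark loudly."
    print(classify_source("https://www.youtube.com/watch?v=abc"))  # youtube
    print(extractive_select(doc, 2))  # ['Cats sleep.', 'Cats eat fish.']
    # Stand-in "LLM" that just echoes the prompt, for offline testing.
    print(summarize(doc, call_llm=lambda prompt: prompt, k=2, max_words=50))
```

In this sketch, the user-chosen summary length maps to `max_words` and the model choice maps to `call_llm`. Swapping a smaller model for a larger one behind that callable is where the quality and latency trade-off described in the abstract would appear. The YouTube branch would also need a transcript-fetching step, which is not shown.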
Files in This Item:
File | Description | Size | Format
---|---|---|---
Aayush Chowdhury M.Tech.pdf | | 1.75 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.