Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21854
Title: URL-TO-KNOWLEDGE: AUTOMATED KNOWLEDGE EXTRACTION AND SUMMARIZATION FROM WEB AND MULTIMEDIA SOURCES USING LLMs
Authors: CHOWDHURY, AAYUSH
Keywords: URL-TO-KNOWLEDGE
KNOWLEDGE EXTRACTION
SUMMARIZATION
MULTIMEDIA SOURCES
LLMs
Issue Date: May-2025
Series/Report no.: TD-8077
Abstract: The thesis "URL-to-Knowledge: Automated Knowledge Extraction and Summarization from Web and Multimedia Sources using LLMs" addresses the challenge of extracting brief, insightful summaries from the vast and diverse content on the internet. In an age of information overload, users struggle to quickly consume lengthy web articles and multimedia content. URL-to-Knowledge, the system presented in this thesis, uses large language models (LLMs) to automatically produce accurate and consistent summaries of both static web pages and YouTube videos. The system is designed to be user-friendly: a Streamlit-based interface lets users enter URLs, choose summarization models, set the summary length, and ask follow-up questions. It improves accessibility by supporting configurable outputs, including downloadable text and audio summaries. The app also handles multilingual input, translating material into English for wider use. The summarization pipeline combines extractive and abstractive techniques, using state-of-the-art LLMs, including LLaMA 3 (8B and 34B) and Gemma 2 (9B), alongside a distilled version of LLaMA for lightweight tasks. Comparative studies show that higher-parameter models such as LLaMA 3-34B and GPT-4 Turbo generate better summaries with higher factual correctness, but at the cost of higher latency and processing expense. Mid-sized models such as LLaMA 3-8B and Gemma 2-9B offer a reasonable trade-off, providing fast summarization of average quality. By combining these LLM capabilities in a unified, interactive platform, URL-to-Knowledge greatly reduces the effort needed to extract knowledge from varied online material, making it a useful tool for professionals, teachers, and academics.
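The abstract describes a pipeline that combines extractive and abstractive techniques but does not say how the two stages are connected. The following is a minimal sketch of one common arrangement, not the thesis's actual implementation: a word-frequency scorer picks the most representative sentences, and the shortened text is then handed to an abstractive model. The function names, the stopword list, the frequency-based scoring, and the `abstractive` callable (standing in for an LLM call) are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Callable, List

# Small illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"a", "an", "and", "the", "was", "is", "of", "to", "in", "on", "for"}


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(text: str) -> List[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_select(text: str, max_sentences: int = 3) -> List[str]:
    """Keep the highest-scoring sentences, returned in their original order.

    A sentence's score is the mean document-wide frequency of its content words.
    """
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return [sentences[i] for i in keep]


def summarize(text: str, abstractive: Callable[[str], str],
              max_sentences: int = 3) -> str:
    """Extractive pre-selection followed by an abstractive rewrite.

    `abstractive` is a hypothetical stand-in for a call to an LLM.
    """
    selected = extractive_select(text, max_sentences)
    return abstractive(" ".join(selected))
```

In a real system the `abstractive` argument would wrap a request to whichever model the user selected; shrinking the input first is one plausible way to keep latency and cost down for long pages or transcripts.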
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21854
Appears in Collections: MTech Data Science

Files in This Item:
File: Aayush Chowdhury M.Tech.pdf
Size: 1.75 MB
Format: Adobe PDF