Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21854
Full metadata record
DC Field | Value | Language
dc.contributor.author | CHOWDHURY, AAYUSH | -
dc.date.accessioned | 2025-07-08T08:49:41Z | -
dc.date.available | 2025-07-08T08:49:41Z | -
dc.date.issued | 2025-05 | -
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/21854 | -
dc.description.abstract | The thesis "URL-to-Knowledge: Automated Knowledge Extraction and Summarization from Web and Multimedia Sources using LLMs" addresses the challenge of extracting concise, insightful summaries from the vast and diverse content available on the internet. In an age of information overload, users struggle to quickly digest long web articles and multimedia content. This thesis presents URL-to-Knowledge, a new system that uses large language models (LLMs) to automatically produce accurate and consistent summaries of both static web pages and YouTube videos. The system is designed to be user-friendly: its Streamlit-based interface lets users enter URLs, choose a summarization model, set the summary length, and ask follow-up questions. It improves accessibility by offering configurable outputs, including downloadable text and audio summaries, and it handles multilingual input by translating content into English for broader use. The summarization pipeline combines extractive and abstractive techniques, pairing a distilled version of LLaMA for lightweight tasks with state-of-the-art LLMs, including LLaMA 3 (8B and 34B) and Gemma 2 (9B). Comparative studies show that larger models such as LLaMA 3-34B and GPT-4 Turbo generate better summaries with higher factual accuracy, but at the cost of greater latency and processing expense. Mid-sized models such as LLaMA 3-8B and Gemma 2-9B offer a reasonable trade-off, producing summaries quickly at moderate quality. By integrating advanced LLM capabilities into a unified, interactive platform, URL-to-Knowledge substantially reduces the effort needed to extract knowledge from diverse online material, making it a useful tool for professionals, teachers, and academics. | en_US
dc.language.iso | en | en_US
dc.relation.ispartofseries | TD-8077; | -
dc.subject | URL-TO-KNOWLEDGE | en_US
dc.subject | KNOWLEDGE EXTRACTION | en_US
dc.subject | SUMMARIZATION | en_US
dc.subject | MULTIMEDIA SOURCES | en_US
dc.subject | LLMs | en_US
dc.title | URL-TO-KNOWLEDGE: AUTOMATED KNOWLEDGE EXTRACTION AND SUMMARIZATION FROM WEB AND MULTIMEDIA SOURCES USING LLMs | en_US
dc.type | Thesis | en_US
Appears in Collections: MTech Data Science

Files in This Item:
File | Description | Size | Format
Aayush Chowdhury M.Tech.pdf | - | 1.75 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.