Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/19840
Title: WEB SCRAPING OR WEB CRAWLING: A COMPILER-BASED APPROACH
Authors: ABDULLAH, FAHAD
Keywords: WEB CRAWLER; WEB SCRAPING; COMPILER-BASED APPROACH; PYTHON
Issue Date: May-2023
Series/Report no.: TD-6394
Abstract: The volume of data stored on the internet is enormous and growing. Billions of users around the world search through a vast number of web pages spread across millions of websites. The content of these pages is both structured and unstructured, which makes its analysis difficult and time-consuming. Web crawlers have emerged as a systematic and valuable tool for extracting data from these websites, helping businesses, researchers, and organizations gather and understand large amounts of data for a variety of purposes. Because of the immense size of the web and the limits on bandwidth, storage, and time, not everything on the web can be analyzed, which is why various approaches to web crawling and scraping have become popular. Web crawlers are also used heavily by search engines to index web pages and by businesses to increase their online presence. This thesis explores the field of web crawling and web scraping, focusing mainly on a compiler-based approach to web crawling in which the use of a Python package such as ‘PLY’ is proposed. It examines the advantages of this approach over traditional web crawling approaches and discusses the tools and strategies widely used in the field. It gives a clear overview of the areas where PLY excels over the traditional approaches. Finally, it covers real-world applications in which web crawling and web scraping are widely used and closes with the legal considerations related to this field.
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/19840
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File: FAHAD ABDULLAH M.Tech..pdf
Size: 903.66 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.