Performance Comparison of Web Scraping Tools on Websites with Static and Dynamic Data

Michael Levi(1*), Henry Novianus Palit(2), Silvia Rostianingsih(3)


(1) Informatics Study Program
(2) Informatics Study Program
(3) Informatics Study Program
(*) Corresponding Author

Abstract


When scraping a website, the main concerns are whether the website is static or dynamic and how its data is structured. Because websites differ in these characteristics and many web scraping tools are available, users can find it difficult to choose a tool that suits their needs. This research compares web scraping tools on websites with different characteristics and, by identifying the right tool for each characteristic, provides tool recommendations for future research. The test results show which tools are more effective and efficient under particular conditions.
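To illustrate why the static/dynamic distinction matters, here is a minimal sketch in Python. It is not the authors' test code. The page snippets, the `/api/products` endpoint, and the helper names `extract_items` and `extract_from_api` are all hypothetical. On a static page the data is already present in the HTML returned by the server, so fetching the page and parsing it (here with the standard-library `html.parser`) is enough. On a dynamic page the same list stays empty until JavaScript runs, so a scraper must either execute the script (for example with a headless browser) or request the underlying JSON data directly.

```python
import json
from html.parser import HTMLParser

# Hypothetical static page: the data is already in the server's HTML.
STATIC_HTML = """
<html><body>
  <ul id="products">
    <li class="item">Laptop</li>
    <li class="item">Phone</li>
  </ul>
</body></html>
"""

# Hypothetical dynamic page: the list is empty until JavaScript fills it.
DYNAMIC_HTML = """
<html><body>
  <ul id="products"></ul>
  <script>
    fetch("/api/products").then(r => r.json()).then(render);
  </script>
</body></html>
"""

# Hypothetical JSON that the dynamic page's script would request.
API_RESPONSE = '{"products": [{"name": "Laptop"}, {"name": "Phone"}]}'


class ItemExtractor(HTMLParser):
    """Collect the text of every <li class="item"> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "item") in attrs:
            self._in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        # Script text reaches here too, but it is ignored outside an item.
        if self._in_item and data.strip():
            self.items.append(data.strip())


def extract_items(html: str) -> list[str]:
    """Parse raw HTML only; no JavaScript is executed."""
    parser = ItemExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


def extract_from_api(payload: str) -> list[str]:
    """Read the same data from the (hypothetical) JSON endpoint instead."""
    return [p["name"] for p in json.loads(payload)["products"]]


if __name__ == "__main__":
    print(extract_items(STATIC_HTML))      # data found in the HTML
    print(extract_items(DYNAMIC_HTML))     # empty: data needs JavaScript
    print(extract_from_api(API_RESPONSE))  # data taken from the JSON instead
```

The parser finds both items on the static page but none on the dynamic page, even though both pages display the same list in a browser. A tool that only downloads and parses HTML can therefore miss dynamic content entirely. This is why the choice of tool depends on the website's characteristics.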

Keywords


Web Scraping; CURL; Scrapy; Cheerio; Headless Browser; Dynamic Web Content





