HTML Async Fetch

Stable version 1.0.1 (Compatible with OutSystems 11)
Uploaded on 04 June 2024
5.0 (7 ratings)

Details
Component that retrieves the HTML of SPA websites after all content has rendered. It also works for non-SPA websites.
This component was created to retrieve HTML content from single-page applications (SPAs). In a traditional web application, the server returns the full page content in the HTML document. An SPA page, by contrast, is built in the client browser, and its final state is only shown to end users after all resources and events (e.g., data fetches) have completed. Retrieving the final HTML document of an SPA is therefore a significant challenge for traditional web scraping methods, which need a fully loaded DOM for accurate data extraction.

To address this, we turned to Puppeteer (https://pptr.dev/), a Node.js library that provides a high-level API for controlling Chrome or Chromium over the DevTools Protocol. The component uses an extension that implements Puppeteer, which is licensed under the Apache License (https://github.com/puppeteer/puppeteer/blob/main/LICENSE). By programmatically simulating user actions and monitoring network activity, we can ensure that all asynchronous requests have completed before extracting the HTML content from the target URLs. Puppeteer's built-in support for CSS selectors also allows specific elements to be extracted precisely from the rendered DOM, which makes the scraping process more accurate and flexible.
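The idea of waiting for network activity to settle before reading the DOM can be sketched in TypeScript as follows. This is only an illustration under stated assumptions, not the component's actual implementation: `PageLike`, `FetchOptions`, `fetchRenderedHtml`, `readySelector` and `timeoutMs` are hypothetical names, and `PageLike` is a deliberately narrow interface modeled on the Puppeteer methods `page.goto`, `page.waitForSelector` and `page.content`.

```typescript
// Hypothetical narrow view of a browser page, modeled on the Puppeteer
// methods page.goto, page.waitForSelector and page.content.
interface PageLike {
  goto(
    url: string,
    options?: { waitUntil?: "networkidle0"; timeout?: number }
  ): Promise<unknown>;
  waitForSelector(
    selector: string,
    options?: { timeout?: number }
  ): Promise<unknown>;
  content(): Promise<string>;
}

// Hypothetical options for this sketch (not part of the component's API).
interface FetchOptions {
  readySelector?: string; // CSS selector that must exist before reading the HTML
  timeoutMs?: number;     // upper bound for navigation and for the selector wait
}

async function fetchRenderedHtml(
  page: PageLike,
  url: string,
  options: FetchOptions = {}
): Promise<string> {
  const timeout = options.timeoutMs ?? 30000;

  // "networkidle0" asks the browser to treat navigation as finished only once
  // there have been no open network connections for a short period.
  await page.goto(url, { waitUntil: "networkidle0", timeout });

  // Optionally wait for an element that the SPA renders late.
  if (options.readySelector) {
    await page.waitForSelector(options.readySelector, { timeout });
  }

  // Serialized HTML of the document as it currently stands in the browser.
  return page.content();
}
```

With Puppeteer installed, a page obtained from `puppeteer.launch()` followed by `browser.newPage()` would be the intended argument for `page`, and the browser should be closed with `browser.close()` once the HTML has been returned.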

This approach improves the HTML retrieval workflow in two main ways. First, Puppeteer's asynchronous execution model makes data extraction more reliable and consistent, because content is read only after the page has finished loading, which avoids the uncertainty of traditional scraping methods. Second, the ability to specify CSS selectors gives granular control over the extracted content, making it possible to retrieve only the relevant information from complex SPA layouts.
Release notes (1.0.1)

- New demo version with updated icons.
