Webpage Scraper Node
Overview
The Webpage Scraper Node extracts content from a web page and converts the page's HTML into Markdown-formatted text. Options control whether links and images are kept in the extracted output.
Usage cost: 1 credit
Configuration
Settings
URL Configuration
URL* (required): Address of the web page to scrape
Supports variable interpolation
Must be publicly accessible
Content Processing Options
Ignore Links: Exclude hyperlinks from output text
Ignore Images: Exclude image references from output text
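To illustrate what the two options remove, the following Python sketch applies comparable filtering to Markdown text. It is an approximation under stated assumptions, not the node's implementation: it assumes links use inline [text](url) syntax, images use ![alt](src) syntax, and that Ignore Links keeps the link text while dropping the URL. The helper name strip_markdown is hypothetical.

```python
import re

# Hypothetical approximation of the Ignore Links / Ignore Images options.
# Image syntax: ![alt](src)
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\([^)]*\)")
# Link syntax: [text](url), but not when preceded by "!" (that is an image).
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\([^)]*\)")


def strip_markdown(text: str, ignore_links: bool, ignore_images: bool) -> str:
    """Remove image references and/or link targets from Markdown text."""
    if ignore_images:
        # Drop the whole image reference, including its alt text.
        text = IMAGE_RE.sub("", text)
    if ignore_links:
        # Keep the visible link label, drop the target URL.
        text = LINK_RE.sub(r"\1", text)
    return text
```

For example, strip_markdown("See [docs](https://example.com/docs). ![Logo](logo.png)", True, True) returns "See docs. ". Nested brackets and URLs that contain parentheses are not handled by these simple patterns.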
Output Ports
document (Document): Complete document object containing:
Page content
Metadata (URL, timestamps)
document_content (string): Extracted text content
Processed according to the Ignore Links and Ignore Images settings
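The relationship between the two output ports can be pictured with a small Python model. The class and field names (Document, Metadata, content, url, fetched_at) and the helper document_content are hypothetical; the node only documents that the document object carries the page content plus metadata such as the URL and timestamps, while document_content carries the text alone.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Metadata:
    # Hypothetical field names; the node documents only "URL, timestamps".
    url: str
    fetched_at: datetime


@dataclass
class Document:
    # Extracted page content plus metadata about the scrape.
    content: str
    metadata: Metadata


def document_content(doc: Document) -> str:
    # Mirrors the document_content port: the text only, no metadata.
    return doc.content


doc = Document(
    content="# Title\n\nBody text",
    metadata=Metadata(url="https://example.com", fetched_at=datetime.now(timezone.utc)),
)
```

Use the document port when a downstream step needs the source URL or timestamps, and document_content when it only needs the text.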
Best Practices
URL Management
Verify URL accessibility before execution
Use complete URLs, including the protocol (http:// or https://)
Percent-encode special characters such as spaces and non-ASCII characters
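A minimal pre-flight check for these URL rules might look like the Python sketch below. It assumes you assemble the URL yourself (for example from an interpolated variable) before passing it to the node; build_url and is_complete_url are hypothetical helpers, while quote and urlsplit come from the standard library's urllib.parse module.

```python
from urllib.parse import quote, urlsplit


def build_url(base: str, path_segment: str) -> str:
    # Percent-encode a value before inserting it into the URL path.
    # safe="" also encodes "/" so the value stays a single path segment.
    return base.rstrip("/") + "/" + quote(path_segment, safe="")


def is_complete_url(url: str) -> bool:
    # A complete URL needs an http(s) scheme and a host.
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

For instance, build_url("https://example.com/search/", "café & bar") yields "https://example.com/search/caf%C3%A9%20%26%20bar", and is_complete_url("example.com/page") is False because the protocol is missing.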
Content Extraction
Enable Ignore Links and Ignore Images when you only need clean, readable text
Monitor content size when scraping large pages
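One way to act on content size is to split long extracted text into bounded chunks before passing it downstream. This Python sketch is an assumption about how you might post-process document_content, not a feature of the node; chunk_text and its max_chars limit are hypothetical, and size is measured in characters rather than tokens.

```python
def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Hard-split any single paragraph longer than the limit.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        # Try to append the paragraph to the current chunk.
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on blank lines keeps Markdown paragraphs intact where possible; only paragraphs that exceed the limit on their own are cut mid-text.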
Common Issues
JavaScript-rendered content is not captured
Malformed or invalid URLs
Access restrictions (HTTP 403 Forbidden errors)
SSL/TLS certificate issues
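Several of these issues can be detected before running the node with a quick request from Python's standard library. The diagnose function and its labels are hypothetical. It sends a HEAD request, which some servers reject even when a normal GET would succeed, and it cannot detect JavaScript-rendered content, since that requires comparing the fetched HTML with what a browser renders.

```python
import http.client
import ssl
import urllib.error
import urllib.request


def diagnose(url: str, timeout: float = 10.0) -> str:
    """Return a short label for the first problem found, or "ok"."""
    try:
        # Request() raises ValueError when the URL has no recognizable scheme.
        request = urllib.request.Request(url, method="HEAD")
    except ValueError:
        return "malformed_url"
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError, its parent class.
        return "access_restricted" if err.code == 403 else f"http_{err.code}"
    except urllib.error.URLError as err:
        # Certificate failures arrive wrapped in URLError with an SSLError reason.
        if isinstance(err.reason, ssl.SSLError):
            return "ssl_error"
        return "unreachable"
    except http.client.InvalidURL:
        # Raised for paths containing disallowed characters such as spaces.
        return "malformed_url"
    except OSError:
        # Covers timeouts raised while reading the response.
        return "unreachable"
```

A result of "access_restricted" corresponds to the 403 case above and "ssl_error" to certificate problems; "malformed_url" usually means the protocol is missing or the URL needs encoding.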