# Webpage Scraper Node

### Overview

The Webpage Scraper Node fetches a web page and converts its HTML into Markdown-formatted text. Two options control whether links and images are kept in the output, so the same node can produce either a faithful Markdown copy of the page or cleaner text-only content.

Usage cost: 1 credit

### Configuration

#### Settings

1. **URL Configuration**
   * URL\* (required): Address of the web page to scrape
   * Supports variable interpolation
   * The page must be publicly accessible
2. **Content Processing Options**
   * Ignore Links: Exclude hyperlinks from output text
   * Ignore Images: Exclude image content from output text
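
The effect of the two processing options can be illustrated with a small sketch. This is not the node's implementation: it is a minimal converter built on Python's standard `html.parser`, it handles only `<a>` and `<img>` tags, and it assumes that ignoring a link keeps the link text while dropping the URL. The names `MarkdownSketch` and `html_to_markdown` are hypothetical.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Tiny HTML-to-Markdown converter covering only <a> and <img> (illustrative)."""

    def __init__(self, ignore_links=False, ignore_images=False):
        super().__init__()
        self.ignore_links = ignore_links
        self.ignore_images = ignore_images
        self.parts = []
        self._hrefs = []  # stack of hrefs for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._hrefs.append(attrs.get("href") or "")
            if not self.ignore_links:
                self.parts.append("[")
        elif tag == "img" and not self.ignore_images:
            alt = attrs.get("alt") or ""
            src = attrs.get("src") or ""
            self.parts.append(f"![{alt}]({src})")

    def handle_endtag(self, tag):
        if tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if not self.ignore_links:
                self.parts.append(f"]({href})")

    def handle_data(self, data):
        # Text is always kept; only the link/image markup is optional.
        self.parts.append(data)


def html_to_markdown(html, ignore_links=False, ignore_images=False):
    parser = MarkdownSketch(ignore_links, ignore_images)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


html = '<p>See <a href="https://example.com">the docs</a> <img src="a.png" alt="logo"></p>'
print(html_to_markdown(html))                     # See [the docs](https://example.com) ![logo](a.png)
print(html_to_markdown(html, ignore_links=True))  # See the docs ![logo](a.png)
```

With both options enabled, the same input reduces to `See the docs`.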

#### Output Ports

1. `document` (Document): Complete document object containing:
   * Page content
   * Metadata (URL, timestamps)
2. `document_content` (string):
   * Extracted text content
   * Processed according to link/image settings

### Best Practices

1. **URL Management**
   * Verify URL accessibility before execution
   * Use complete URLs including protocol (http/https)
   * Consider URL encoding for special characters
2. **Content Extraction**
   * Enable Ignore Links and Ignore Images when you only need the page text
   * Check the size of the extracted content when scraping large pages

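
The URL checks above can be applied before a value reaches the node. The following is a minimal sketch using Python's standard `urllib.parse`; the function name `prepare_url` and the choice of which characters to leave unencoded are assumptions, not part of the node.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched; '%' is included so already-encoded input
# is not encoded a second time.
_PATH_SAFE = "/%:@!$&'()*+,;="
_QUERY_SAFE = "/%:@!$&'()*+,;=?"


def prepare_url(raw):
    """Require an http(s) scheme and a host, then percent-encode path and query."""
    parts = urlsplit(raw.strip())
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"URL must start with http:// or https://: {raw!r}")
    if not parts.netloc:
        raise ValueError(f"URL has no host: {raw!r}")
    path = quote(parts.path, safe=_PATH_SAFE)
    query = quote(parts.query, safe=_QUERY_SAFE)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(prepare_url("https://example.com/my page?q=café"))
# https://example.com/my%20page?q=caf%C3%A9
```

A value such as `example.com/page` raises `ValueError` because the protocol is missing, which surfaces the problem before the node runs.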
### Common Issues

* Content rendered by JavaScript after the page loads is not captured
* Malformed or invalid URLs
* Access restrictions (403 errors)
* SSL/TLS certificate issues
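
Several of these issues can be diagnosed outside the flow with a quick pre-check. The sketch below uses Python's standard `urllib.request` and maps the resulting exception to one of the issues listed above. The helper names `describe_failure` and `check_url` are hypothetical, and the node itself may fetch pages differently and report errors in another form.

```python
import ssl
import urllib.error
import urllib.request


def describe_failure(exc):
    """Map an exception from a fetch attempt to a short diagnosis."""
    # HTTPError is a subclass of URLError, so it must be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 403:
            return "access restricted (HTTP 403)"
        return f"HTTP error {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        if isinstance(exc.reason, ssl.SSLError):
            return "SSL/TLS certificate problem"
        return f"could not reach host: {exc.reason}"
    if isinstance(exc, ValueError):
        return "malformed or invalid URL"
    return f"unexpected error: {exc}"


def check_url(url, timeout=10):
    """Return None if the page can be fetched, otherwise a short diagnosis."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return None
    except (OSError, ValueError) as exc:
        return describe_failure(exc)
```

Missing JavaScript-rendered content does not raise an error, so a check like this cannot detect it; compare the extracted text with what a browser shows instead.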

