| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | URL of the website to be scraped |
| formats | string[] | Yes | ["markdown", "html"] | List of content formats to extract. Valid values: "markdown", "html", "json", "links", "screenshot", "screenshot@fullPage", "rawHtml" |
| prompt | string | No | - | A natural language instruction used for schema-less extraction. The LLM interprets the prompt and returns relevant structured data from the page |
| schema | object | No | - | A JSON schema that defines the structure and data types to extract from the webpage for precise, structured extraction |
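
The table above can be turned into a small parameter builder. This is a minimal sketch, not part of any official SDK: `build_scrape_params` and `VALID_FORMATS` are hypothetical names, and the only facts taken from the table are the parameter names, the default formats, and the list of valid format values.

```python
# Valid values for the formats parameter, copied from the table above.
VALID_FORMATS = {
    "markdown", "html", "rawHtml", "json",
    "links", "screenshot", "screenshot@fullPage",
}


def build_scrape_params(url, formats=None, prompt=None, schema=None):
    """Assemble a parameter dict for a scrape request (hypothetical helper)."""
    # Fall back to the default listed in the table when formats is omitted.
    formats = list(formats) if formats is not None else ["markdown", "html"]

    # Reject values that the table does not list as valid.
    unknown = [f for f in formats if f not in VALID_FORMATS]
    if unknown:
        raise ValueError(f"Unsupported formats: {unknown}")

    params = {"url": url, "formats": formats}
    # prompt and schema are optional, so include them only when given.
    if prompt is not None:
        params["prompt"] = prompt
    if schema is not None:
        params["schema"] = schema
    return params


params = build_scrape_params("https://example.com", ["markdown", "links"])
print(params)
```

Validating the format names before sending a request keeps typos such as "rawhtml" from reaching the service.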
The formats parameter accepts the following values:

- "markdown" - Clean markdown representation of the page content
- "html" - Cleaned and formatted HTML content
- "rawHtml" - Complete raw HTML source code
- "json" - Structured JSON data (used with schemas or prompts)
- "links" - All links found on the page
- "screenshot" - Screenshot of the visible page area
- "screenshot@fullPage" - Full-page screenshot including content below the fold

https://example.com
["markdown", "html", "links"]

When you use a prompt or a schema, include "json" in the formats array. The schema defines the exact structure and data types you want to extract from the webpage.

https://blog.example.com/latest-post
["markdown", "json"]
https://company.com/about
["json"]
"Extract company name, mission, team size, contact information, and key services offered"
https://store.com/product/123
["json", "screenshot"]
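
The two extraction requests above (prompt-based for the company page, "json" plus "screenshot" for the product page) could be assembled as follows. This is a sketch under stated assumptions: `with_json_format` is a hypothetical helper, the schema field names (`name`, `price`, `in_stock`, `features`) are invented for illustration, and pairing the product request with a schema is an assumption rather than something the example states.

```python
import json


def with_json_format(formats):
    """Return a copy of formats that is guaranteed to contain "json"."""
    formats = list(formats)
    if "json" not in formats:
        formats.append("json")
    return formats


# Schema-less extraction: a natural language prompt, as in the company example.
prompt_request = {
    "url": "https://company.com/about",
    "formats": with_json_format(["json"]),
    "prompt": (
        "Extract company name, mission, team size, "
        "contact information, and key services offered"
    ),
}

# Schema-based extraction: the field names below are hypothetical.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},        # numerical data
        "in_stock": {"type": "boolean"},    # true/false value
        "features": {"type": "array", "items": {"type": "string"}},  # list
    },
    "required": ["name", "price"],
}

schema_request = {
    "url": "https://store.com/product/123",
    "formats": with_json_format(["json", "screenshot"]),
    "schema": product_schema,
}

print(json.dumps(schema_request, indent=2))
```

A prompt is quicker to write, while a schema fixes the field names and types, which tends to matter when the output feeds another program.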
https://example.com
["screenshot@fullPage", "links", "markdown"]

Use "screenshot" instead of "screenshot@fullPage" for large pages.

https://example.com
example.com

Use "number" for numerical data, "boolean" for true/false values, and "array" for lists.

URLs should start with https:// or http://.
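
A URL can be checked for its scheme before it is passed as the url parameter. The sketch below uses Python's standard `urllib.parse.urlparse`; `has_http_scheme` is a hypothetical helper name, and treating a bare host such as example.com as invalid is an assumption about how the service behaves.

```python
from urllib.parse import urlparse


def has_http_scheme(url):
    """Return True if url has an http or https scheme and a host part."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in ("https://example.com", "http://example.com", "example.com"):
    print(candidate, has_http_scheme(candidate))
```

Running the check on the client side gives a clearer error than a failed request.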