FirecrawlScrape

The FirecrawlScrape tool extracts data from a single webpage and supports multiple output formats including markdown, HTML, JSON, screenshots, and AI-powered structured data extraction.

Overview

FirecrawlScrape is designed for extracting content from individual web pages. It can return clean markdown, structured data extracted with a schema or a prompt, or screenshots of the rendered page.

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | URL of the website to scrape |
| formats | string[] | Yes | ["markdown", "html"] | List of content formats to extract. Valid values: "markdown", "html", "json", "links", "screenshot", "screenshot@fullPage", "rawHtml" |
| prompt | string | No | - | A natural-language instruction used for schema-less extraction. A language model interprets the prompt and returns relevant structured data from the page |
| schema | object | No | - | A JSON schema that defines the structure and data types to extract from the page, for precise, structured extraction |
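The parameters above can be assembled into a single input object. The sketch below is a hypothetical Python helper, `build_scrape_input`, that applies the documented `formats` default and leaves out optional parameters that were not set. The exact object shape the node expects is an assumption, not a documented API.

```python
from typing import Any, Optional

# Default taken from the parameter table above.
DEFAULT_FORMATS = ["markdown", "html"]


def build_scrape_input(
    url: str,
    formats: Optional[list[str]] = None,
    prompt: Optional[str] = None,
    schema: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble FirecrawlScrape input parameters."""
    params: dict[str, Any] = {
        "url": url,
        # Copy so callers never mutate the shared module-level default.
        "formats": list(formats) if formats is not None else list(DEFAULT_FORMATS),
    }
    # Optional parameters are included only when they were provided.
    if prompt is not None:
        params["prompt"] = prompt
    if schema is not None:
        params["schema"] = schema
    return params


print(build_scrape_input("https://example.com"))
# {'url': 'https://example.com', 'formats': ['markdown', 'html']}
```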

Available Formats

The formats parameter accepts the following values:

Content Formats

  • "markdown" - Clean markdown representation of the page content
  • "html" - Cleaned and formatted HTML content
  • "rawHtml" - Complete raw HTML source code
  • "json" - Structured JSON data (used with schemas or prompts)
  • "links" - All links found on the page
  • "screenshot" - Screenshot of the visible page area
  • "screenshot@fullPage" - Full-page screenshot including content below the fold
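Before running a scrape, it can help to check the `formats` list against the values above. The sketch below is a hypothetical pre-flight check (`validate_formats` is not part of the tool); it treats values as case-sensitive, matching the spellings listed here, which is an assumption.

```python
# Values listed under Available Formats; matching is case-sensitive (assumption).
VALID_FORMATS = {
    "markdown",
    "html",
    "rawHtml",
    "json",
    "links",
    "screenshot",
    "screenshot@fullPage",
}


def validate_formats(formats: list[str]) -> list[str]:
    """Hypothetical check: reject empty or unknown formats, drop duplicates."""
    if not formats:
        raise ValueError("formats must contain at least one value")
    unknown = [f for f in formats if f not in VALID_FORMATS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")
    # dict.fromkeys keeps the first occurrence of each value, in order.
    return list(dict.fromkeys(formats))
```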

Basic Usage

Simple Content Extraction

To extract basic content from a webpage:
  1. Enter the URL: Input the target webpage URL in the URL field
  2. Select Formats: Choose the desired output formats (markdown, HTML, etc.)
  3. Run the Task: Execute the scraping operation
Example Configuration:
  • URL: https://example.com
  • Formats: ["markdown", "html", "links"]
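For comparison, the example configuration corresponds roughly to a request against Firecrawl's public scrape endpoint. This sketch assumes the v1 endpoint `https://api.firecrawl.dev/v1/scrape` with bearer-token authentication; how the FirecrawlScrape node itself issues the request is not described on this page. The helper name `build_scrape_request` and the key `fc-YOUR-KEY` are placeholders, and the code only builds the request with the standard library without sending it.

```python
import json
import urllib.request

# Assumption: Firecrawl's public v1 scrape endpoint; the node may call it differently.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(api_key: str, url: str, formats: list[str]) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST request for one page."""
    body = json.dumps({"url": url, "formats": formats}).encode("utf-8")
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Same URL and formats as the example configuration above.
req = build_scrape_request("fc-YOUR-KEY", "https://example.com", ["markdown", "html", "links"])
```

Sending it with `urllib.request.urlopen(req)` requires a valid API key and network access.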

Schema-Based Structured Extraction

Define a precise schema for structured data extraction to get specific information in a predictable format.

How to Configure Schema Extraction

  1. Add JSON Format: Include "json" in your formats selection
  2. Define Schema: Specify the data structure you want to extract
  3. Set Data Types: Use appropriate types (string, number, boolean, array, object)
Example Schema for Company Information:
{
  "company_mission": "string",
  "supports_sso": "boolean", 
  "is_open_source": "boolean",
  "is_in_yc": "boolean",
  "employee_count": "number",
  "funding_rounds": "array"
}
When using schema-based extraction, make sure to include "json" in the formats array. The schema defines the exact structure and data types you want to extract from the webpage.
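The examples in this section use a shorthand that maps each field name to a type name. If your setup expects a standard JSON Schema object instead (an assumption; check what the node accepts), the shorthand can be expanded mechanically. The helper `shorthand_to_json_schema` is hypothetical.

```python
from typing import Any

# Type names used in the shorthand examples; each is also a JSON Schema type.
ALLOWED_TYPES = {"string", "number", "boolean", "array", "object"}


def shorthand_to_json_schema(fields: dict[str, str]) -> dict[str, Any]:
    """Expand {"field": "type"} into a JSON Schema object definition."""
    properties: dict[str, Any] = {}
    for name, type_name in fields.items():
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"{name}: unsupported type {type_name!r}")
        properties[name] = {"type": type_name}
    return {"type": "object", "properties": properties}


schema = shorthand_to_json_schema({
    "company_mission": "string",
    "supports_sso": "boolean",
    "employee_count": "number",
})
```

Fields declared as "array" get no `items` constraint here, so their element type is left open; add `items` yourself if you know it.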

Advanced Schema Examples

Product Information Extraction

{
  "product_name": "string",
  "price": "number",
  "currency": "string",
  "in_stock": "boolean",
  "features": "array",
  "rating": "number",
  "reviews_count": "number",
  "images": "array"
}

Company Information Extraction

{
  "company_name": "string",
  "industry": "string",
  "headquarters": "string",
  "founded_year": "number",
  "employee_count": "string",
  "revenue": "string",
  "key_executives": "array",
  "contact_info": "object"
}
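After a schema-based run, a quick check can flag fields whose extracted values do not match the declared types, which helps diagnose the schema type mismatches described under Common Issues and Solutions. This is a hypothetical helper that assumes the "json" output has already been parsed into a Python dict.

```python
from typing import Any

# Python types that correspond to each shorthand type name.
PY_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def find_type_mismatches(result: dict[str, Any], fields: dict[str, str]) -> list[str]:
    """Return schema fields whose extracted value has the wrong type.

    Missing fields and null values are skipped, not reported.
    """
    mismatches = []
    for name, type_name in fields.items():
        value = result.get(name)
        if value is None:
            continue
        # bool is a subclass of int in Python, so exclude it from "number".
        if type_name == "number" and isinstance(value, bool):
            mismatches.append(name)
        elif not isinstance(value, PY_TYPES[type_name]):
            mismatches.append(name)
    return mismatches


product_schema = {"product_name": "string", "price": "number", "in_stock": "boolean", "features": "array"}
extracted = {"product_name": "Widget", "price": "19.99", "in_stock": True, "features": ["USB-C"]}
print(find_type_mismatches(extracted, product_schema))  # ['price']
```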

Prompt-Based Extraction

For flexible data extraction without predefined schemas, use natural language prompts to describe what information you want to extract.

How to Configure Prompt Extraction

  1. Add JSON Format: Include "json" in your formats selection
  2. Write Clear Prompt: Describe what information you want to extract
  3. Be Specific: The more specific your prompt, the better the results
Example Prompt:
"Extract the main article title, author, publication date, key points, and any mentioned statistics or data points from this webpage"
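Because prompt-based results come back in the "json" format, it is easy to forget that format when configuring a prompt. The hypothetical helper below builds a prompt-based input and appends "json" when it is missing; the input shape is an assumption, not a documented API.

```python
from typing import Any, Optional


def prompt_extraction_input(url: str, prompt: str, formats: Optional[list[str]] = None) -> dict[str, Any]:
    """Hypothetical helper: build a prompt-based extraction input with "json" included."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    selected = list(formats or [])
    if "json" not in selected:
        # Prompt-based extraction returns its result under the "json" format.
        selected.append("json")
    return {"url": url, "formats": selected, "prompt": prompt}


config = prompt_extraction_input(
    "https://example.com/article",
    "Extract the main article title, author, publication date, key points, "
    "and any mentioned statistics or data points from this webpage",
    formats=["markdown"],
)
print(config["formats"])  # ['markdown', 'json']
```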

Effective Prompt Examples

Content Analysis

"Summarize the main topics discussed, identify key stakeholders mentioned, and extract any numerical data or statistics"

Contact Information

"Find all contact information including emails, phone numbers, addresses, and social media links"

Product Features

"List all product features, pricing information, and customer testimonials or reviews"

Response Format

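The scraping tool returns structured data based on your configuration. In the example below, each requested format appears as a key under "data", alongside page metadata such as the title, description, and language: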
{
  "data": {
    "markdown": "# Page content in markdown format...",
    "html": "<div>Cleaned HTML content...</div>",
    "json": { 
      "company_mission": "To revolutionize web scraping",
      "supports_sso": true,
      "is_open_source": false
    },
    "rawHtml": "<!DOCTYPE html><html>Complete raw HTML...</html>",
    "screenshot": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
    "links": [
      "https://example.com/about",
      "https://example.com/contact", 
      "https://example.com/pricing"
    ],
    "title": "Example Company - Homepage",
    "description": "Leading provider of web scraping solutions",
    "language": "en",
    "keywords": ["web scraping", "automation", "data extraction"]
  },
  "status": "success",
  "message": "Successfully scraped the website"
}
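A downstream step typically checks the status and then reads the individual outputs. The sketch below assumes, as in the example response above, that a failed run has a status other than "success" and that the screenshot arrives as a base64 data URL rather than a link to a hosted image. Both helper names are hypothetical.

```python
import base64
from typing import Any


def extract_outputs(response: dict[str, Any]) -> dict[str, Any]:
    """Return the "data" object, raising if the run did not succeed."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("message", "scrape failed"))
    return response.get("data", {})


def decode_screenshot(data_url: str) -> bytes:
    """Decode a data URL such as "data:image/png;base64,..." into raw image bytes."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload)


# "iVBORw0KGgo=" is the base64 encoding of the 8-byte PNG signature.
png_bytes = decode_screenshot("data:image/png;base64,iVBORw0KGgo=")
```

The decoded bytes can be written to a `.png` file with `open(path, "wb").write(png_bytes)`.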

Use Cases

📊 Content Extraction

Extract blog posts, articles, and documentation:
  • URL: https://blog.example.com/latest-post
  • Formats: ["markdown", "json"]
  • Schema:
    {
      "title": "string",
      "author": "string",
      "published_date": "string",
      "content": "string",
      "tags": "array"
    }
    

🏢 Business Information

Extract company details and information:
  • URL: https://company.com/about
  • Formats: ["json"]
  • Prompt: "Extract company name, mission, team size, contact information, and key services offered"

πŸ›οΈ Product Analysis

Extract product information from e-commerce sites:
  • URL: https://store.com/product/123
  • Formats: ["json", "screenshot"]
  • Schema:
    {
      "name": "string",
      "price": "number",
      "availability": "boolean",
      "features": "array",
      "rating": "number"
    }
    

📸 Visual Documentation

Capture page screenshots and structure:
  • URL: https://example.com
  • Formats: ["screenshot@fullPage", "links", "markdown"]

Best Practices

⚡ Performance Tips

  1. Choose Appropriate Formats
    • Only request formats you actually need
    • Use "screenshot" instead of "screenshot@fullPage" for large pages
  2. Use Specific Schemas
    • Define precise data structures for better accuracy
    • Use appropriate data types (number, boolean, array)
  3. Craft Clear Prompts
    • Be specific about what information you want
    • Avoid overly broad or vague instructions

🎯 Extraction Accuracy

  1. URL Format
    • ✅ Always include the protocol: https://example.com
    • ❌ Avoid incomplete URLs: example.com
  2. Schema Data Types
    • ✅ Use "number" for numerical data
    • ✅ Use "boolean" for true/false values
    • ✅ Use "array" for lists
  3. Format Selection
    • ✅ Request every format you need, including "json" for schema or prompt extraction
    • ❌ Don't leave formats empty

Common Issues and Solutions

URL Format Errors

  • Problem: Missing protocol in URL
  • Solution: Always include https:// or http://
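If URLs come from user input or from another node, a small normalization step can add the missing protocol before the scrape runs. The helper below is hypothetical; defaulting to `https` and rejecting non-HTTP schemes are assumptions, so switch to `http` only for sites that do not serve HTTPS.

```python
def ensure_protocol(url: str, default_scheme: str = "https") -> str:
    """Hypothetical helper: add a scheme to URLs such as "example.com"."""
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    if "://" in url:
        # Assumption: only HTTP(S) pages are scraped, so other schemes are rejected.
        raise ValueError(f"unsupported URL scheme in {url!r}")
    # Protocol-relative URLs such as "//example.com" keep their host.
    return f"{default_scheme}://{url.removeprefix('//')}"


print(ensure_protocol("example.com"))  # https://example.com
```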

Missing Required Formats

  • Problem: No formats selected
  • Solution: Always specify at least one output format

Schema Type Mismatches

  • Problem: Using wrong data types in schema
  • Solution: Match data types to expected content (string, number, boolean, array)

Poor Extraction Results

  • Problem: Vague prompts or overly complex schemas
  • Solution: Be specific and clear in prompts, keep schemas focused

Next: Learn Crawling

Explore multi-page website crawling with FirecrawlCrawl