Parameter | Type | Required | Default | Description |
---|---|---|---|---|
url | string | Yes | - | Starting URL for the crawl |
limit | number | No | 10 | Maximum number of pages to crawl |
maxDepth | number | No | - | Maximum depth to crawl from the starting URL |
maxDiscoveryDepth | number | No | - | Maximum depth for discovering new URLs |
includePaths | string[] | No | - | URL patterns to include in the crawl |
excludePaths | string[] | No | - | URL patterns to exclude from the crawl |
ignoreSitemap | boolean | No | false | Whether to ignore the website's sitemap |
ignoreQueryParameters | boolean | No | - | Whether to ignore URL query parameters |
allowBackwardLinks | boolean | No | false | Allow crawling links that go back in the site hierarchy |
crawlEntireDomain | boolean | No | - | Whether to crawl the entire domain |
allowExternalLinks | boolean | No | - | Whether to follow external links |
delay | number | No | - | Delay between requests (milliseconds) |
maxConcurrency | number | No | - | Maximum number of concurrent requests |
scrapeOptions | object | No | - | Additional scraping options for each page |
pollInterval | number | No | 2 | Polling interval for checking crawl status |
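
To make the defaults and types in the table concrete, here is a minimal Python sketch that merges caller overrides onto the documented defaults and checks a few types. `build_crawl_params`, `TABLE_DEFAULTS`, and `EXPECTED_TYPES` are hypothetical names invented for this illustration and are not part of any SDK; only the parameter names and default values are taken from the table.

```python
from typing import Any

# Defaults copied from the table; parameters shown with "-" have no default here.
TABLE_DEFAULTS: dict[str, Any] = {
    "limit": 10,
    "ignoreSitemap": False,
    "allowBackwardLinks": False,
}

# Python types assumed for a subset of the table's parameters (illustrative only).
EXPECTED_TYPES: dict[str, type] = {
    "url": str,
    "limit": int,
    "maxDepth": int,
    "maxDiscoveryDepth": int,
    "includePaths": list,
    "excludePaths": list,
    "ignoreSitemap": bool,
    "allowBackwardLinks": bool,
    "allowExternalLinks": bool,
    "delay": int,
    "maxConcurrency": int,
}


def build_crawl_params(url: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: apply overrides to the table defaults and type-check them."""
    if not url:
        raise ValueError("url is required")
    params = {"url": url, **TABLE_DEFAULTS, **overrides}
    for name, value in params.items():
        expected = EXPECTED_TYPES.get(name)
        if expected is None:
            continue  # parameter not covered by this sketch
        # bool is a subclass of int in Python, so reject True/False for numeric fields
        valid = isinstance(value, expected) and not (
            expected is int and isinstance(value, bool)
        )
        if not valid:
            raise TypeError(f"{name} must be of type {expected.__name__}")
    return params
```

For example, `build_crawl_params("https://example.com", limit=50, maxDepth=3)` returns a dictionary that keeps `ignoreSitemap` and `allowBackwardLinks` at `False` while overriding `limit`.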
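
- URL: `https://example.com`
- Limit: `50` pages
- Depth: `3` levels
- `*` for pattern matching

`includePaths` patterns:

- `/blog/*` - All blog pages
- `/products/*` - Product catalog
- `/docs/*` - Documentation
- `/news/2024/*` - Recent news

`excludePaths` patterns:

- `/admin/*` - Admin pages
- `/private/*` - Private sections
- `*.pdf` - PDF files
- `/search?*` - Search results
- `/cart/*` - Shopping cart pages

Typical values:

- `delay`: `1000-3000ms` (1-3 seconds)
- `maxConcurrency`: `1-2` requests
- `limit`: `10-100` pages
- `maxDepth`: `3` (scrape content 3 levels deep)
- `maxDiscoveryDepth`: `5` (find URLs 5 levels deep)
- Boolean parameters: `true/false`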
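
The `*` patterns above can be approximated in a few lines of Python. This is only a sketch of one plausible interpretation: it assumes `*` matches any run of characters, that every other character (including `?`) is literal, and that exclude patterns take precedence over include patterns. The crawler's actual matching rules are not specified here and may differ. `wildcard_to_regex` and `should_crawl` are hypothetical helper names.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    # Assumption: "*" means "any run of characters"; all other characters are literal.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def should_crawl(path: str, include: list[str], exclude: list[str]) -> bool:
    """Hypothetical filter: exclude patterns win, then include patterns (if any) must match."""
    if any(wildcard_to_regex(p).fullmatch(path) for p in exclude):
        return False
    if not include:
        return True  # no include list means nothing is filtered out by it
    return any(wildcard_to_regex(p).fullmatch(path) for p in include)


include = ["/blog/*", "/docs/*"]
exclude = ["/admin/*", "*.pdf", "/search?*"]

for path in ["/blog/post-1", "/docs/guide.pdf", "/search?q=crawl", "/about"]:
    print(path, should_crawl(path, include, exclude))
```

Under these assumptions, only `/blog/post-1` passes: `/docs/guide.pdf` and `/search?q=crawl` are caught by the exclude list, and `/about` matches no include pattern.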

Patterns to include:

- `/blog/*` - All blog pages
- `/articles/*` - Article pages
- `/news/*` - News content
- `/docs/*` - Documentation
- `/products/*` - Product pages
- `/catalog/*` - Product catalog
- `/categories/*` - Category pages
- `/blog/2024/*` - Recent blog posts
- `/news/2024/*` - Current year news

Patterns to exclude:

- `/admin/*` - Administrative pages
- `/private/*` - Private sections
- `/user/*/private` - User private areas
- `*.pdf` - PDF documents
- `*.jpg`, `*.png` - Image files
- `*.zip` - Archive files
- `/search?*` - Search result pages
- `/cart/*` - Shopping cart
- `/checkout/*` - Checkout process

- URL: `https://docs.example.com`
- Include paths: `["/docs/*", "/guides/*", "/tutorials/*"]`
- Exclude paths: `["/api/reference/*", "*.pdf"]`
- Limit: `200` pages
- Depth: `6` levels

- URL: `https://store.example.com`
- Include paths: `["/products/*", "/categories/*"]`
- Exclude paths: `["/cart/*", "/checkout/*", "/account/*"]`
- Limit: `500` pages
- Depth: `4` levels
- Delay: `2000ms` (respectful crawling)

- URL: `https://blog.example.com`
- Include paths: `["/blog/*", "/articles/*", "/news/2024/*"]`
- Exclude paths: `["/admin/*", "/author/*/private", "*.pdf"]`
- Limit: `100` pages
- Depth: `3` levels
- Delay: `3000ms` (extra respectful for news sites)

- URL: `https://company.example.com`
- Include paths: `["/about/*", "/team/*", "/careers/*", "/press/*"]`
- Exclude paths: `["/customer-portal/*", "/admin/*"]`
- Limit: `50` pages
- Depth: `3` levels

- `ignoreSitemap: false`
- `allowExternalLinks: false` to stay on domain

- `https://example.com`
- `example.com`
- `/blog/2024/*`
- `*.pdf`, `*.jpg`
- `/*`
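
To illustrate what "staying on domain" with `allowExternalLinks: false` could mean in practice, the sketch below compares the host of each discovered link with the host of the starting URL using Python's standard `urllib.parse`. The helper name `is_same_domain` is hypothetical, and the exact-host comparison is an assumption; the crawler itself may treat subdomains or redirects differently.

```python
from urllib.parse import urljoin, urlparse


def is_same_domain(start_url: str, link: str) -> bool:
    """Hypothetical check: does `link` resolve to the same host as `start_url`?"""
    start_host = urlparse(start_url).hostname
    # Resolve relative links such as "/blog/2024/post" against the starting URL.
    link_host = urlparse(urljoin(start_url, link)).hostname
    # A start URL without a scheme (e.g. "example.com") has no parsed hostname.
    return start_host is not None and link_host == start_host


start = "https://example.com"
for link in ["/blog/2024/post", "https://example.com/report.pdf", "https://other.org/page"]:
    print(link, is_same_domain(start, link))
```

Note that `urlparse("example.com").hostname` is `None`, which is one reason a full URL with its scheme is the safer form for the starting point in a sketch like this.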