> ## Documentation Index
> Fetch the complete documentation index at: https://docs.bland.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Scrape Websites

> Creates a new knowledge base by scraping content from web URLs.

Create a knowledge base by scraping content from one or more web URLs. This is perfect for creating knowledge bases from documentation sites, blog posts, or other web-based content.

### Headers

<ParamField header="authorization" type="string" required>
  Your API key for authentication.
</ParamField>

<ParamField header="content-type" type="string" required>
  Must be `application/json` for web scraping knowledge bases.
</ParamField>

### Body Parameters

<ParamField body="type" type="string" required default="web">
  Must be `"web"` for web scraping knowledge bases.
</ParamField>

<ParamField body="name" type="string" required>
  Name for the knowledge base.
</ParamField>

<ParamField body="description" type="string">
  Optional description of the knowledge base content.
</ParamField>

<ParamField body="urls" type="string[]" required>
  Array of URLs to scrape (maximum 100 URLs).
</ParamField>

### Response

<ResponseField name="data" type="object">
  The created knowledge base object.

  <ResponseField name="data.id" type="string">
    Unique identifier for the knowledge base.
  </ResponseField>

  <ResponseField name="data.name" type="string">
    Name of the knowledge base.
  </ResponseField>

  <ResponseField name="data.description" type="string">
    Description of the knowledge base (if provided).
  </ResponseField>

  <ResponseField name="data.status" type="string">
    Current status: `"PROCESSING"`, `"COMPLETED"`, `"FAILED"`, or `"DELETED"`.
  </ResponseField>

  <ResponseField name="data.type" type="string">
    Will be `"WEB_SCRAPE"` for web scraping knowledge bases.
  </ResponseField>

  <ResponseField name="data.source_urls" type="string">
    Source URLs that were scraped (comma-separated).
  </ResponseField>

  <ResponseField name="data.base_url" type="string">
    Base URL derived from the provided URLs.
  </ResponseField>

  <ResponseField name="data.created_at" type="string">
    ISO timestamp of creation.
  </ResponseField>

  <ResponseField name="data.updated_at" type="string">
    ISO timestamp of last update.
  </ResponseField>

  <ResponseField name="data.error_message" type="string">
    Error message if status is `"FAILED"`.
  </ResponseField>
</ResponseField>

<ResponseField name="errors" type="null">
  Will be `null` on successful creation.
</ResponseField>

<RequestExample>
  ```bash cURL theme={null}
  curl -X POST https://api.bland.ai/v1/knowledge/learn \
    -H "authorization: YOUR_API_KEY" \
    -H "content-type: application/json" \
    -d '{
      "type": "web",
      "name": "Documentation",
      "description": "Complete product documentation",
      "urls": [
        "https://docs.example.com/overview",
        "https://docs.example.com/api-reference",
        "https://docs.example.com/tutorials"
      ]
    }'
  ```

  ```json Single URL theme={null}
  {
    "type": "web",
    "name": "Blog Posts",
    "description": "Latest company blog posts",
    "urls": [
      "https://example.com/blog/latest-features"
    ]
  }
  ```

  ```json Multiple URLs theme={null}
  {
    "type": "web",
    "name": "Support Documentation",
    "description": "Customer support and FAQ pages",
    "urls": [
      "https://support.example.com/faq",
      "https://support.example.com/troubleshooting",
      "https://support.example.com/getting-started",
      "https://support.example.com/advanced-features"
    ]
  }
  ```
</RequestExample>

<ResponseExample>
  ```json Success Response theme={null}
  {
    "data": {
      "id": "kb_01H8X9QK5R2N7P3M6Z8W4Y1V7V",
      "name": "Documentation",
      "description": "Complete product documentation",
      "status": "PROCESSING",
      "type": "WEB_SCRAPE",
      "source_urls": "https://docs.example.com/overview,https://docs.example.com/api-reference,https://docs.example.com/tutorials",
      "base_url": "https://docs.example.com",
      "created_at": "2025-01-15T12:00:00Z",
      "updated_at": "2025-01-15T12:00:00Z"
    },
    "errors": null
  }
  ```

  ```json Error Response theme={null}
  {
    "data": null,
    "errors": [
      {
        "error": "INVALID_INPUT",
        "message": "URLs array cannot exceed 100 items"
      }
    ]
  }
  ```
</ResponseExample>

***

Docs for agents: [llms.txt](/llms.txt)
