
Extract HTML tables

Parse every <table> in an HTML blob into structured rows, with colspan and rowspan expanded.

POST /v1/extract/tables

You supply the HTML and the endpoint returns every <table> as headers plus rows. Header detection runs in priority order: a <thead> first, then a row made up entirely of <th> cells; if neither is found, headers is null. A colspan duplicates the cell across the columns it spans, and a rowspan carries it down the rows it spans. The caption is returned when present (financial and regulatory tables often put the most useful identifier in the caption). <script> and <style> elements are stripped before parsing. The endpoint does not fetch URLs; the caller supplies the HTML.
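The span expansion described above can be illustrated with a small sketch. This is not the service's implementation; the function name `expand_spans` and the input shape (each cell given as a `(text, colspan, rowspan)` tuple) are assumptions made for illustration only.

```python
def expand_spans(rows):
    """Expand (text, colspan, rowspan) cells into a rectangular grid of strings.

    Hypothetical helper: mirrors the documented behaviour (colspan duplicates
    a cell across columns, rowspan carries it down), not the real parser.
    """
    grid = []
    carried = {}  # column index -> (text, number of further rows to fill)
    for cells in rows:
        out = []
        col = 0
        pending = list(cells)
        while pending or col in carried:
            if col in carried:
                # A rowspan from an earlier row occupies this column.
                text, remaining = carried[col]
                out.append(text)
                if remaining > 1:
                    carried[col] = (text, remaining - 1)
                else:
                    del carried[col]
                col += 1
                continue
            text, colspan, rowspan = pending.pop(0)
            for _ in range(colspan):
                # Duplicate the cell across every column it spans.
                out.append(text)
                if rowspan > 1:
                    carried[col] = (text, rowspan - 1)
                col += 1
        grid.append(out)
    return grid


rows = [
    [("Region", 1, 2), ("Sales", 2, 1)],
    [("Q1", 1, 1), ("Q2", 1, 1)],
]
grid = expand_spans(rows)
```

Here `grid` ends up as two rows of three cells each: "Region" is carried into the second row, and "Sales" is duplicated across both columns it spans.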

Parameters

Name	Type	Required	Default	Description
html	string	yes	—	HTML containing one or more `<table>` elements. Body limit 1 MB.
min_rows	integer	no	1	Skip tables with fewer rows than this — useful to drop 1-row layout fragments.
min_columns	integer	no	1	Same idea for columns.
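As a rough picture of what `min_rows` and `min_columns` do, the same filter can be written client-side against the `row_count` and `column_count` fields of each returned table. This is a sketch under an assumption: that the thresholds are compared against those reported counts. The helper name `filter_tables` is hypothetical.

```python
def filter_tables(tables, min_rows=1, min_columns=1):
    """Keep only tables that meet both thresholds.

    Assumption: thresholds compare against the row_count and column_count
    fields reported for each table.
    """
    return [
        t for t in tables
        if t["row_count"] >= min_rows and t["column_count"] >= min_columns
    ]


tables = [
    {"headers": None, "rows": [["spacer"]], "row_count": 1, "column_count": 1},
    {"headers": ["City", "Population"],
     "rows": [["Bangalore", "13M"], ["Mumbai", "20M"]],
     "row_count": 2, "column_count": 2},
]
kept = filter_tables(tables, min_rows=2, min_columns=2)
```

With both thresholds set to 2, the one-cell layout fragment is dropped and only the City/Population table remains.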

Request

curl -X POST https://api.qcrawl.com/v1/extract/tables \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{"html": "<table><thead><tr><th>City</th><th>Population</th></tr></thead><tbody><tr><td>Bangalore</td><td>13M</td></tr><tr><td>Mumbai</td><td>20M</td></tr></tbody></table>"}'
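For callers working in Python, the same request can be sketched with the standard library. The helper name `build_request` is hypothetical, the `Content-Type` header is an assumption about what the endpoint expects for a JSON body, and reading the 1 MB body limit as 1,000,000 bytes is also an assumption. The sketch only builds the request; sending it is left to the caller.

```python
import json
import urllib.request

API_URL = "https://api.qcrawl.com/v1/extract/tables"
MAX_BODY_BYTES = 1_000_000  # assumption: "1 MB" interpreted as 10^6 bytes


def build_request(html, token, min_rows=1, min_columns=1):
    """Build (but do not send) a POST request for the tables endpoint."""
    body = json.dumps(
        {"html": html, "min_rows": min_rows, "min_columns": min_columns}
    ).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        # Fail early instead of letting the server reject an oversized body.
        raise ValueError(f"request body is {len(body)} bytes, over the limit")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed, not stated in the docs
        },
    )


req = build_request("<table><tr><td>x</td></tr></table>", "osk_example")
# To send it: urllib.request.urlopen(req), then json.load() on the response.
```

Checking the size locally is a convenience, not a guarantee: the server's own limit is the one that counts.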

Response

{
  "status": "success",
  "tables": [
    {
      "caption": null,
      "headers": ["City", "Population"],
      "rows": [["Bangalore", "13M"], ["Mumbai", "20M"]],
      "row_count": 2,
      "column_count": 2
    }
  ],
  "table_count": 1
}
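A response in this shape is often easier to work with as one record per row. The sketch below pairs each row with the table's headers; the helper name `table_to_records` and the `col_0`, `col_1`, ... fallback names used when `headers` is null are assumptions for illustration.

```python
import json


def table_to_records(table):
    """Turn one table from the response into a list of dicts keyed by header."""
    headers = table["headers"]
    if headers is None:
        # No header row was detected: fall back to positional names (assumed).
        headers = [f"col_{i}" for i in range(table["column_count"])]
    return [dict(zip(headers, row)) for row in table["rows"]]


response_text = """
{
  "status": "success",
  "tables": [
    {
      "caption": null,
      "headers": ["City", "Population"],
      "rows": [["Bangalore", "13M"], ["Mumbai", "20M"]],
      "row_count": 2,
      "column_count": 2
    }
  ],
  "table_count": 1
}
"""
response = json.loads(response_text)
records = table_to_records(response["tables"][0])
```

One caveat: because colspan expansion can repeat a header across columns, duplicate header names would collide as dict keys, so tables with spanning headers may be better kept as plain row lists.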

Related

POST /v1/extract/structured

Structured data extraction

POST /v1/extract/contacts

Extract contacts from text or HTML

POST /v1/convert/html-to-markdown

HTML to Markdown
