Extract HTML tables

Parse every <table> in an HTML document into structured rows, with colspan and rowspan expanded.

POST /v1/extract/tables

You supply the HTML; we return every <table> as headers plus rows. Headers are detected in priority order: a <thead> first, then a row made up entirely of <th> cells. If neither exists, headers is null. A colspan duplicates the cell across the columns it spans, and a rowspan carries the cell down the rows it spans. The caption is returned when present (financial and regulatory tables often put the most useful identifier in the caption). <script> and <style> elements are stripped before parsing. The endpoint does not fetch URLs; the caller supplies the HTML.
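
To make the expansion and header rules above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the service's actual implementation: the cell tuple layout and the names expand_spans and detect_headers are hypothetical, the all-<th> check is assumed to apply to the first row only, and only the first <thead> row is used as headers.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cell shape: (tag, text, colspan, rowspan), tag is "th" or "td".
Cell = Tuple[str, str, int, int]


def expand_spans(rows: List[List[Cell]]) -> List[List[str]]:
    """Return a rectangular-ish grid of text with spans expanded."""
    grid: List[List[str]] = []
    carried: Dict[int, Tuple[str, int]] = {}  # column -> (text, rows left to fill)
    for row in rows:
        # colspan: duplicate the cell once per column it covers.
        slots = [
            (text, rowspan)
            for _tag, text, colspan, rowspan in row
            for _ in range(max(colspan, 1))
        ]
        out: List[str] = []
        next_carried: Dict[int, Tuple[str, int]] = {}
        col = 0
        while slots or carried:
            if col in carried:
                # rowspan: a cell from an earlier row occupies this column.
                text, remaining = carried.pop(col)
            elif slots:
                text, remaining = slots.pop(0)
            else:
                # Gap before a carried cell further to the right.
                text, remaining = "", 1
            out.append(text)
            if remaining > 1:
                next_carried[col] = (text, remaining - 1)
            col += 1
        carried = next_carried
        grid.append(out)
    return grid


def detect_headers(
    thead_rows: List[List[Cell]], body_rows: List[List[Cell]]
) -> Tuple[Optional[List[str]], List[List[str]]]:
    """Apply the priority: <thead>, then an all-<th> first row, then None."""
    if thead_rows:
        return expand_spans(thead_rows)[0], expand_spans(body_rows)
    if body_rows and body_rows[0] and all(c[0] == "th" for c in body_rows[0]):
        grid = expand_spans(body_rows)
        return grid[0], grid[1:]
    return None, expand_spans(body_rows)


demo = [
    [("th", "Region", 1, 2), ("th", "Sales", 2, 1)],
    [("th", "Q1", 1, 1), ("th", "Q2", 1, 1)],
    [("td", "North", 1, 1), ("td", "10", 1, 1), ("td", "12", 1, 1)],
]
print(expand_spans(demo))
```

In the demo, "Region" has rowspan 2 and "Sales" has colspan 2, so the first two expanded rows come out as ["Region", "Sales", "Sales"] and ["Region", "Q1", "Q2"].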

Parameters

Name         Type     Required  Default  Description
html         string   yes       -        HTML containing one or more <table> elements. Body limit 1 MB.
min_rows     integer  no        1        Skip tables with fewer rows than this. Useful for dropping 1-row layout fragments.
min_columns  integer  no        1        Skip tables with fewer columns than this.
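
As a rough illustration of what the two thresholds do, the sketch below applies the same kind of filter client-side to a list of returned tables. It assumes the thresholds are compared against the row_count and column_count fields of each table; the source does not say whether the header row counts toward min_rows, so treat this as an approximation. The name filter_tables is hypothetical.

```python
from typing import Any, Dict, List


def filter_tables(
    tables: List[Dict[str, Any]], min_rows: int = 1, min_columns: int = 1
) -> List[Dict[str, Any]]:
    """Keep tables that meet both thresholds (assumed semantics)."""
    return [
        t
        for t in tables
        if t["row_count"] >= min_rows and t["column_count"] >= min_columns
    ]


sample_tables = [
    {"caption": None, "row_count": 1, "column_count": 1},  # layout fragment
    {"caption": None, "row_count": 2, "column_count": 2},  # real data table
]
kept = filter_tables(sample_tables, min_rows=2, min_columns=2)
print(len(kept))
```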

Request

curl -X POST https://api.qcrawl.com/v1/extract/tables \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{"html": "<table><thead><tr><th>City</th><th>Population</th></tr></thead><tbody><tr><td>Bangalore</td><td>13M</td></tr><tr><td>Mumbai</td><td>20M</td></tr></tbody></table>"}'
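
The same request can be sketched in Python with only the standard library. This is a hedged example, not an official client: build_request is a hypothetical helper, the Content-Type header is an assumption (the source shows only the Authorization header), and the size guard reads "1 MB" as 1 MiB of request body. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

API_URL = "https://api.qcrawl.com/v1/extract/tables"
MAX_BODY_BYTES = 1024 * 1024  # assumption: "Body limit 1 MB" means 1 MiB


def build_request(
    html: str, api_key: str, min_rows: int = 1, min_columns: int = 1
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the tables endpoint."""
    payload = {"html": html, "min_rows": min_rows, "min_columns": min_columns}
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds the documented 1 MB limit")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, not in the source
        },
    )


# To send it:
#   with urllib.request.urlopen(build_request(html, api_key)) as resp:
#       result = json.load(resp)
```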

Response

{
  "status": "success",
  "tables": [
    {
      "caption": null,
      "headers": ["City", "Population"],
      "rows": [["Bangalore", "13M"], ["Mumbai", "20M"]],
      "row_count": 2,
      "column_count": 2
    }
  ],
  "table_count": 1
}
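
Because headers and rows come back as parallel lists, a common next step is to zip them into one record per row. The sketch below is one way to do that. The helper names, the col_N fallback for tables whose headers is null, and the _2 suffix for repeated header text (which colspan expansion can produce) are all illustrative choices, not part of the API.

```python
from typing import Any, Dict, List


def unique_keys(headers: List[str]) -> List[str]:
    """Suffix repeated header text so dict keys do not collide."""
    seen: Dict[str, int] = {}
    keys: List[str] = []
    for h in headers:
        seen[h] = seen.get(h, 0) + 1
        keys.append(h if seen[h] == 1 else f"{h}_{seen[h]}")
    return keys


def table_to_records(table: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn one entry of the "tables" array into a list of row dicts."""
    headers = table["headers"]
    if headers is None:
        # No header row detected: fall back to positional names.
        headers = [f"col_{i}" for i in range(table["column_count"])]
    keys = unique_keys(headers)
    return [dict(zip(keys, row)) for row in table["rows"]]


response = {
    "status": "success",
    "tables": [
        {
            "caption": None,
            "headers": ["City", "Population"],
            "rows": [["Bangalore", "13M"], ["Mumbai", "20M"]],
            "row_count": 2,
            "column_count": 2,
        }
    ],
    "table_count": 1,
}
records = table_to_records(response["tables"][0])
print(records)
```

For the response shown above, this yields one dict per city, keyed by "City" and "Population".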
