Extract HTML tables
Parse every <table> in an HTML blob into structured rows, with colspan and rowspan expanded.
POST /v1/extract/tables
You supply HTML; we return every <table> as headers + rows. Header detection priority: <thead> first, then a row of all <th> cells, then null. Colspan duplicates a cell across columns; rowspan carries it down. Caption returned when present (financial and regulatory tables often put the most useful identifier in the caption). <script> and <style> stripped before parsing. No fetch: the caller supplies the HTML.
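The colspan and rowspan rule can be illustrated with a short Python sketch. This is not the service's implementation: the `(text, colspan, rowspan)` cell tuple and the `expand_spans` name are assumptions made for the example, and it assumes well-formed tables whose spans do not overlap.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cell representation for this sketch: (text, colspan, rowspan).
Cell = Tuple[str, int, int]


def expand_spans(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Flatten spans: colspan repeats a cell across columns, rowspan carries it down."""
    grid: List[List[Optional[str]]] = []
    # column index -> (text, number of following rows still to fill)
    carried: Dict[int, Tuple[str, int]] = {}

    for row in rows:
        out: List[Optional[str]] = []
        pending = list(row)
        col = 0
        # Keep going while cells remain or a rowspan from above still covers a later column.
        while pending or any(c >= col for c in carried):
            if col in carried:
                # This column is occupied by a cell carried down from an earlier row.
                text, remaining = carried[col]
                out.append(text)
                if remaining > 1:
                    carried[col] = (text, remaining - 1)
                else:
                    del carried[col]
                col += 1
            elif pending:
                text, colspan, rowspan = pending.pop(0)
                for _ in range(colspan):
                    out.append(text)
                    if rowspan > 1:
                        carried[col] = (text, rowspan - 1)
                    col += 1
            else:
                # Gap between the row's last cell and a carried column.
                out.append(None)
                col += 1
        grid.append(out)
    return grid


example = [
    [("Region", 1, 2), ("Sales", 2, 1)],
    [("Q1", 1, 1), ("Q2", 1, 1)],
    [("EU", 1, 1), ("10", 1, 1), ("12", 1, 1)],
]
print(expand_spans(example))
# [['Region', 'Sales', 'Sales'], ['Region', 'Q1', 'Q2'], ['EU', '10', '12']]
```

After expansion every row has the same width, which is what makes a per-table column count meaningful.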
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| html | string | yes | n/a | HTML containing one or more <table> elements. Body limit 1 MB. |
| min_rows | integer | no | 1 | Skip tables with fewer rows than this; useful to drop 1-row layout fragments. |
| min_columns | integer | no | 1 | Skip tables with fewer columns than this. |
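
As a rough illustration of how these parameters fit together, the Python sketch below serializes a request body and checks its size before sending. The helper name `build_tables_payload` is hypothetical, and reading "1 MB" as 1,000,000 bytes of encoded body is a conservative assumption; the page does not say how the limit is measured.

```python
import json

# Assumption: "1 MB" is read conservatively as 1,000,000 bytes of request body.
MAX_BODY_BYTES = 1_000_000


def build_tables_payload(html: str, min_rows: int = 1, min_columns: int = 1) -> bytes:
    """Serialize a body for POST /v1/extract/tables and check its size locally."""
    body = json.dumps(
        {"html": html, "min_rows": min_rows, "min_columns": min_columns}
    ).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(
            f"request body is {len(body)} bytes, over the assumed 1 MB limit"
        )
    return body


# Ask the service to skip tables with fewer than 2 rows or 2 columns.
payload = build_tables_payload("<table>...</table>", min_rows=2, min_columns=2)
```

Checking the size on the client side avoids a round trip for bodies that are likely to be rejected anyway.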
Request
curl -X POST https://api.qcrawl.com/v1/extract/tables \
  -H "Authorization: Bearer osk_..." \
  -d '{"html": "<table><thead><tr><th>City</th><th>Population</th></tr></thead><tbody><tr><td>Bangalore</td><td>13M</td></tr><tr><td>Mumbai</td><td>20M</td></tr></tbody></table>"}'
Response
{
  "status": "success",
  "tables": [
    {
      "caption": null,
      "headers": ["City", "Population"],
      "rows": [["Bangalore", "13M"], ["Mumbai", "20M"]],
      "row_count": 2,
      "column_count": 2
    }
  ],
  "table_count": 1
}
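One way to consume this response is to pair each row with the headers. The Python sketch below assumes the response shape shown above; the `table_to_records` helper and the `col_0`, `col_1`, ... fallback names used when `headers` is null are inventions for the example, not part of the API.

```python
from typing import Any, Dict, List


def table_to_records(table: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one entry of "tables" into a list of {header: cell} dicts."""
    headers = table.get("headers")
    if headers is None:
        # No header row was detected: fall back to positional names (hypothetical).
        headers = [f"col_{i}" for i in range(table["column_count"])]
    return [dict(zip(headers, row)) for row in table["rows"]]


response = {
    "status": "success",
    "tables": [
        {
            "caption": None,
            "headers": ["City", "Population"],
            "rows": [["Bangalore", "13M"], ["Mumbai", "20M"]],
            "row_count": 2,
            "column_count": 2,
        }
    ],
    "table_count": 1,
}

records = table_to_records(response["tables"][0])
print(records)
# [{'City': 'Bangalore', 'Population': '13M'}, {'City': 'Mumbai', 'Population': '20M'}]
```

Because colspan duplicates a header cell across columns, header names can repeat; a dict keeps only the last value for a repeated key, so tables with spanned headers may be better handled as plain lists.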