HTML to Markdown
Convert raw HTML to clean Markdown. This is the same converter that scrape uses, but you supply the HTML.
POST /v1/convert/html-to-markdown
Pure HTML → CommonMark Markdown transformation. Useful for LLM/RAG pipelines that already have HTML from an upstream tool and just want Qcrawl's text cleanup, or for processing locally-stored HTML without a fetch. <script> and <style> are dropped automatically; pass strip_tags to drop more. Links and images are configurable.
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| html | string | yes | — | The HTML to convert. The request body is limited to 1 MB (this limit applies to every endpoint). |
| heading_style | string | no | ATX | ATX (#), ATX_CLOSED (# ... #), or UNDERLINED (===). |
| strip_tags | array | no | — | Additional tag names to drop entirely. script and style are always dropped. |
| include_links | boolean | no | true | Keep <a> as Markdown links. Set to false to drop hrefs. |
| include_images | boolean | no | false | Keep <img> as Markdown images. Off by default — most LLM ingestion pipelines drop them. |
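As a rough illustration of how the parameters above fit together, the following Python sketch assembles a JSON request body and rejects values that the table rules out. The helper name `build_payload` is hypothetical and not part of any Qcrawl SDK; it only mirrors the names and defaults listed in the table.

```python
import json

# Allowed values copied from the heading_style row of the parameter table.
HEADING_STYLES = {"ATX", "ATX_CLOSED", "UNDERLINED"}


def build_payload(html, heading_style="ATX", strip_tags=None,
                  include_links=True, include_images=False):
    """Hypothetical helper: return the JSON request body as UTF-8 bytes."""
    if not html:
        # The API documents a 400 for missing/empty html; fail early instead.
        raise ValueError("html must be a non-empty string")
    if heading_style not in HEADING_STYLES:
        raise ValueError(f"invalid heading_style: {heading_style!r}")
    payload = {
        "html": html,
        "heading_style": heading_style,
        "include_links": include_links,
        "include_images": include_images,
    }
    if strip_tags:
        # script and style are always dropped server-side, so only list extras.
        payload["strip_tags"] = list(strip_tags)
    return json.dumps(payload).encode("utf-8")


body = build_payload("<h1>Hello</h1><nav>menu</nav>", strip_tags=["nav"])
print(body.decode("utf-8"))
```

Sending the defaults explicitly is a design choice made here for readability; omitting the optional fields should be equivalent according to the table.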
Request
curl -X POST https://api.qcrawl.com/v1/convert/html-to-markdown \
-H "Authorization: Bearer osk_..." \
-d '{"html": "<article><h1>Hello</h1><p>This is <a href=\"/x\">linked</a> text.</p><script>alert(1)</script></article>"}'
Response
{
"status": "success",
"markdown": "# Hello\n\nThis is [linked](/x) text.",
"byte_count": 36,
"word_count": 5
}
Errors
| Code | Meaning |
|---|---|
| 400 | html missing/empty, or invalid heading_style. |
| 413 | html exceeds 1 MB body limit. |
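The request and the two documented error codes can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not an official client: the function names `make_request` and `convert` are hypothetical, the `Content-Type` header is an assumption (the curl example does not send one), and "1 MB" is taken to mean 1,000,000 bytes because the docs do not define it.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.qcrawl.com/v1/convert/html-to-markdown"
MAX_BODY_BYTES = 1_000_000  # assumption: the docs say "1 MB" without defining it


def make_request(html, api_key):
    """Hypothetical helper: build the POST request without sending it."""
    if not html:
        raise ValueError("html is missing or empty (the API would return 400)")
    body = json.dumps({"html": html}).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds 1 MB (the API would return 413)")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption, not in the docs
        },
    )


def convert(html, api_key):
    """Send the request and return the "markdown" field of the response."""
    req = make_request(html, api_key)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)["markdown"]
    except urllib.error.HTTPError as err:
        # Map the two documented status codes to readable errors.
        if err.code == 400:
            raise ValueError("400: html missing/empty, or invalid heading_style") from err
        if err.code == 413:
            raise ValueError("413: html exceeds 1 MB body limit") from err
        raise
```

Calling `convert("<h1>Hello</h1>", api_key)` with a valid key would perform the network request; the local checks in `make_request` only save a round trip for inputs that the table above says will be rejected.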