2026-05-16 · 13 min read

How to scrape Zillow listings at scale in 2026

The honest guide to extracting Zestimate, price, beds, baths, and lot details from Zillow — what works, what fails, and how proptech teams ship in production.

real estate, Zillow, proptech

What scraping Zillow at scale really means in 2026

Scraping Zillow at scale in 2026 means extracting structured property data — address, price, Zestimate, beds, baths, square feet, year built — from public listing pages on a schedule, through a pipeline that survives Zillow's aggressive bot defenses. The practical path for production teams is a managed actor API that returns clean JSON per address, not a homegrown headless-browser farm.

That's the snippet the search engines and the LLMs will quote. The rest of this page is for the proptech founders, real estate analytics leads, and engineering heads who have to actually ship this. If you want a deeper dive on the defense side of the problem, our blog post "Scraping Zillow in 2026: what works, what fails, what to do about it" is the long-form companion.

The problem you are actually solving

You don't want to scrape Zillow. You want the data. Specifically, you want a clean, deduplicated, time-stamped property record per address — for off-market deal-flow scoring, for investor underwriting, for proptech analytics, for a CMA tool, or for an AI agent that answers buyer questions.

The dataset is irreplaceable. Zillow tracks over 110 million properties in the United States, and there is no sanctioned API path to that data for new developers. As a result, every serious proptech team is scraping, licensing third-party datasets, or partnering with MLS systems for the segments where that's feasible.

The technical question of how to get past Zillow's bot defenses is downstream of your real question, which is how to ship a product against this dataset without losing six months to infrastructure work. This recipe answers the second question. The first one is handled for you.

What the leading alternatives offer

Zillow scraping is a mature category with credible vendors. Your evaluation shortlist probably includes some combination of the following.

Bright Data

Bright Data offers both raw proxy infrastructure and pre-built Zillow datasets. Their residential proxy network is one of the largest available, and their compliance posture clears enterprise procurement gates that smaller vendors don't. For teams that want collection capability and dataset products under one MSA, Bright Data is the obvious enterprise shortlist entry.

Apify

Apify hosts a public actor marketplace with several maintained Zillow scrapers, plus the platform infrastructure for you to write and host your own. If you have an engineer who enjoys writing scrapers in TypeScript and you want a managed runtime for that work, Apify is a flexible and well-documented choice. Their community is large and their support is responsive.

Oxylabs

Oxylabs brings serious proxy infrastructure plus a dedicated real estate scraper API for Zillow and Redfin. Their data quality is solid and their compliance and security certifications are mature. Enterprise legal teams take them seriously, and rightly so. For European teams in particular, their GDPR posture is among the strongest in the category.

Where Qcrawl goes further

The Zillow actor in Qcrawl is built specifically for proptech teams that want to ship a product, not maintain a pipeline. Three concrete outcomes set it apart.

First, direct payload extraction. Zillow embeds the full property data as JSON in a script tag on each property page. Our actor parses that payload directly rather than scraping the rendered DOM. The fields stay far more stable across Zillow's UI changes than DOM selectors do, so a Zillow redesign is much less likely to break your pipeline.

Second, transparent failure handling. When Zillow's bot defense challenges a request, our actor returns a structured error explaining what happened, rather than silently writing a challenge page into your database. That lets your pipeline make an informed retry decision and keeps your data clean. We absorb the retry on our side when we can route the request through a path with a real chance of success.

Third, predictable per-request pricing. No proxy surcharge, no concurrency tier, no minimum monthly commitment. Pricing scales linearly with what you actually pull. For the proptech team building a CMA tool or a deal-flow scorer, that translates into a budget you can defend to your CFO.

Where Bright Data is the heavyweight, Qcrawl is the developer-velocity option. Where Apify gives you a marketplace, Qcrawl gives you a single actor that handles Zillow correctly out of the box. Where Oxylabs wins on enterprise certifications, Qcrawl wins on time-to-first-record — under five minutes from signup to a real address.

The recipe, step by step

Five steps from zero to a production Zillow pipeline.

Step 1. Get an API key

Sign up at qcrawl.com/pricing, copy your API key, and export it. Keys are prefixed with osk_.

export DATASONAR_KEY="osk_xxxxxxxxxxxx"

Step 2. Pull a single property

Confirm the pipeline with one address before scaling. The Zillow actor accepts a full property URL, the kind that ends in a numeric ID followed by _zpid.

curl -X POST https://api.qcrawl.com/v1/actors/zillow \
  -H "Authorization: Bearer $DATASONAR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.zillow.com/homedetails/123-Main-St-Austin-TX-78701/12345678_zpid/"
  }'

A successful call returns a record like this:

{
  "price": 685000,
  "zestimate": 692400,
  "address": "123 Main St, Austin, TX 78701",
  "city": "Austin",
  "state": "TX",
  "zipcode": "78701",
  "bedrooms": 3,
  "bathrooms": 2,
  "living_area_sqft": 1840,
  "year_built": 1962,
  "raw_payload_extracted": true
}

Core property fields are returned cleanly. The raw_payload_extracted flag signals that the actor parsed the embedded Next.js data payload directly — that's the reliable extraction path, more stable across UI redesigns than DOM scraping. Additional fields like lot size and listing status are available on request for Business and Enterprise customers.
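If your pipeline is in Python rather than shell, the same Step 2 call can be sketched with the standard library. The endpoint, headers, and request body mirror the curl above; the function names and the 30-second timeout are our own assumptions for illustration, not part of the Qcrawl API.

```python
import json
import urllib.request

# Endpoint taken from the Step 2 curl example.
API_URL = "https://api.qcrawl.com/v1/actors/zillow"


def build_zillow_request(property_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in Step 2."""
    body = json.dumps({"url": property_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def fetch_property(property_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON record. Needs network access.

    The timeout value is an assumption; tune it to your own latency budget.
    """
    request = build_zillow_request(property_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_property(url, os.environ["DATASONAR_KEY"]) with the URL from the curl example should yield a dictionary shaped like the sample record. Keeping request construction separate from sending makes the call easy to unit-test without touching the network.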

Step 3. Scale up with batch

For a real pipeline — hundreds or thousands of properties — use the batch endpoint. Up to 100 URLs per call, run in parallel on our side.

curl -X POST https://api.qcrawl.com/v1/scrape/batch \
  -H "Authorization: Bearer $DATASONAR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://www.zillow.com/homedetails/123-Main-St-Austin-TX-78701/12345678_zpid/",
      "https://www.zillow.com/homedetails/456-Oak-Ave-Austin-TX-78702/23456789_zpid/",
      "https://www.zillow.com/homedetails/789-Pine-Rd-Austin-TX-78703/34567890_zpid/"
    ],
    "format": "json",
    "concurrency": 10
  }'

Note that the scrape/batch endpoint with format: "json" fetches each URL in parallel but returns only lean per-URL records (url, title, eval, time_ms, worker), not the structured Zillow fields. For those, fan out per-URL calls to /v1/actors/zillow from your own worker pool; the actor parses the embedded Next.js payload server-side. A pool of 50 to 100 concurrent workers handles a typical region refresh in minutes.
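A worker-pool fan-out of that kind can be sketched in a few lines of Python. This is a minimal illustration, not a reference client: fan_out and its parameters are hypothetical names, and fetch_one stands for whatever function you use to call /v1/actors/zillow for a single URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def fan_out(
    urls: Iterable[str],
    fetch_one: Callable[[str], dict],
    max_workers: int = 50,
) -> tuple[dict[str, dict], dict[str, Exception]]:
    """Call fetch_one for every URL in parallel.

    Returns (records, failures), both keyed by URL, so a failed pull
    never silently disappears from the batch.
    """
    records: dict[str, dict] = {}
    failures: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                records[url] = future.result()
            except Exception as exc:  # keep the error for a later retry pass
                failures[url] = exc
    return records, failures
```

Threads are a reasonable fit here because each call spends its time waiting on the network. Returning failures alongside records lets a second pass retry only the URLs that did not land.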

Step 4. Async with webhook delivery for catalog-scale jobs

Once a single refresh crosses a few thousand properties, switch to async. Submit one job per URL and receive each result via webhook as it completes.

curl -X POST https://api.qcrawl.com/v1/scrape/async \
  -H "Authorization: Bearer $DATASONAR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.zillow.com/homedetails/123-Main-St-Austin-TX-78701/12345678_zpid/",
    "webhook_url": "https://your-app.example.com/hooks/zillow"
  }'

The endpoint returns a job ID immediately. You can poll GET /v1/jobs/{id} if you prefer pull, but the webhook path is the production pattern. Most teams use the polling endpoint only for debugging.

Step 5. Land it in your warehouse

The last step decides whether your pipeline pays for itself. Land each record with an extracted_at timestamp and the source URL. Keep the raw response in a JSON column alongside the flattened fields so you can re-extract later without re-scraping.

Add a uniqueness constraint on (zpid, extracted_at) and you have a clean time-series for price-change monitoring. Most proptech teams build their first internal tool on top of this table inside a week.
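Here is one way that table could look, sketched with SQLite so it runs anywhere. The table name, column names, and helper functions are illustrative assumptions; the sample response in Step 2 carries no zpid field, so this sketch derives it from the source URL.

```python
import json
import re
import sqlite3
from typing import Optional

# Property URLs end in "<digits>_zpid/", as in the Step 2 example.
ZPID_RE = re.compile(r"/(\d+)_zpid")

SCHEMA = """
CREATE TABLE IF NOT EXISTS zillow_snapshots (
    zpid         TEXT NOT NULL,
    extracted_at TEXT NOT NULL,   -- ISO 8601 UTC timestamp
    source_url   TEXT NOT NULL,
    price        INTEGER,
    zestimate    INTEGER,
    raw_json     TEXT NOT NULL,   -- full response, for later re-extraction
    UNIQUE (zpid, extracted_at)
)
"""


def zpid_from_url(url: str) -> Optional[str]:
    """Extract the numeric zpid from a Zillow property URL, if present."""
    match = ZPID_RE.search(url)
    return match.group(1) if match else None


def land_record(
    conn: sqlite3.Connection, source_url: str, record: dict, extracted_at: str
) -> bool:
    """Insert one snapshot. Returns False if (zpid, extracted_at) already exists."""
    zpid = zpid_from_url(source_url)
    if zpid is None:
        raise ValueError(f"no zpid found in URL: {source_url}")
    cursor = conn.execute(
        "INSERT OR IGNORE INTO zillow_snapshots VALUES (?, ?, ?, ?, ?, ?)",
        (
            zpid,
            extracted_at,
            source_url,
            record.get("price"),
            record.get("zestimate"),
            json.dumps(record),
        ),
    )
    conn.commit()
    return cursor.rowcount == 1
```

Because duplicates are ignored rather than raised, replaying a webhook delivery or re-running a failed batch does not corrupt the time series.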

The fields, and what each one tells you

Eight fields drive the majority of proptech decisions. It's worth being explicit about what each one means and where the analytic value lives.

Price is the asking price. It reflects the seller's expectations and the agent's market read. Tracked over time, it tells you about local market momentum: drops indicate softening demand, increases indicate confidence.

Zestimate is Zillow's algorithmic valuation. It is not a price, but it is a useful baseline. The delta between price and Zestimate is often more interesting than either number alone — a listing priced 15 percent above Zestimate in a softening market is a different signal than one priced 5 percent below in a hot market.

Bedrooms and bathrooms are the comparable-sale axis. Almost every CMA model normalizes by bed and bath count. Track both, even when one feels redundant.

Living area is the price-per-square-foot denominator. The single most useful normalized metric in residential real estate analytics.

Lot size matters for single-family and for any analysis involving redevelopment potential. Tear-down investors care about lot size more than living area.

Year built is the proxy for capital expenditure risk. Pre-1980 homes have different rehab profiles than post-2000 homes. Investor models weight this heavily.

Listing status is the state machine field. FOR_SALE, SOLD, OFF_MARKET, and PENDING each mean different things downstream. Always extract it, always store it, always filter on it.
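Two of the derived metrics mentioned above, price per square foot and the gap between price and Zestimate, are simple enough to sketch directly. The function names are ours; the field names follow the sample response in Step 2.

```python
from typing import Optional


def price_per_sqft(record: dict) -> Optional[float]:
    """Asking price divided by living area, or None if either is missing."""
    price = record.get("price")
    area = record.get("living_area_sqft")
    if not price or not area:
        return None
    return price / area


def price_vs_zestimate_pct(record: dict) -> Optional[float]:
    """Percent by which price sits above (+) or below (-) the Zestimate."""
    price = record.get("price")
    zestimate = record.get("zestimate")
    if not price or not zestimate:
        return None
    return (price - zestimate) / zestimate * 100.0


sample = {"price": 685000, "zestimate": 692400, "living_area_sqft": 1840}
print(round(price_per_sqft(sample), 2))          # 372.28
print(round(price_vs_zestimate_pct(sample), 2))  # -1.07
```

For the Step 2 sample, the listing is priced at roughly 372 dollars per square foot and about 1 percent below its Zestimate. Returning None for missing inputs keeps partial records from producing misleading zeros downstream.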

A realistic scenario

An off-market deal-flow startup we work with builds an investor-facing scoring tool that flags undervalued single-family homes in three metro markets. Their previous pipeline was a headless browser farm running on rotating residential proxies, maintained by one full-time engineer who spent roughly half his week firefighting.

The team tracks roughly 38,000 active listings refreshed nightly, plus a long tail of off-market addresses pulled on demand when investors request them. Total monthly volume runs around 1.4 million property pulls.

After the switch, the engineer got his week back. The pipeline runs as a nightly async job against the active listing set, with on-demand pulls routed through the synchronous endpoint when investors trigger a lookup in the app. Total monthly spend on Qcrawl came in meaningfully below the loaded cost of the previous setup once proxy spend, infrastructure, and engineering time were counted. The investor-facing dashboard now shows price changes within hours of Zillow updating them rather than the next day.

The pricing math

Let's run the numbers honestly. A serious in-house Zillow pipeline at 100,000 properties a month carries three significant cost lines: residential proxy spend, browser infrastructure, and engineering attention. Each line is provider- and team-specific, but loaded together the monthly total is rarely small.

That loaded total surprises most teams when they tally everything honestly, and it comes on top of the calendar months lost to building the pipeline. The same volume on Qcrawl runs at the per-request rates on the pricing page. Most Zillow pipelines below 100k requests a month land cheaper on a managed API than building the equivalent in-house. Above a million requests a month, the calculus is worth a procurement-grade conversation. See qcrawl.com/pricing for volume rates.

What can go wrong

Even with a managed API, a few failure modes are worth planning for.

Off-market and recently-sold properties sometimes return partial data. The price field may be the last sale price rather than a current asking price; the Zestimate is always current. Tag your records with listing_status and handle the three main states — FOR_SALE, SOLD, and OFF_MARKET — distinctly in downstream logic.

Multi-unit and condo listings sometimes return the building-level record rather than the unit-level record. If your use case requires unit-level fidelity, submit the unit-specific URL rather than the building landing page. The actor honors whichever URL you submit.

Zillow occasionally returns a soft challenge even for well-behaved residential traffic. Our actor absorbs the retry on our side within the timeout window. If a request still fails, the response includes a structured error code your pipeline can act on — typically a transient block that resolves on retry. Treat it like any other transient API failure.
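On the client side, treating it like a transient failure usually means a bounded retry with exponential backoff. The sketch below assumes your fetch function raises a TransientBlock exception when it sees the structured error; that exception class, the attempt count, and the delays are all hypothetical, since the actual error codes are not listed here.

```python
import time
from typing import Callable


class TransientBlock(Exception):
    """Hypothetical stand-in for the actor's structured transient error."""


def pull_with_retry(
    fetch: Callable[[str], dict],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry fetch(url) on TransientBlock, doubling the wait each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except TransientBlock:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Only the transient error is retried; any other exception propagates immediately so real bugs are not hidden behind backoff. The sleep parameter is injectable so the logic can be tested without waiting.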

For broader context on the legal posture of public-web data collection, the long-running US case law summarized at en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn is the foundational reading. The general direction is favorable for public-page observation, but commercial deployment warrants a conversation with counsel.

Pairing the Zillow actor with the rest of your pipeline

Zillow data alone is powerful. Zillow data combined with other public signals is significantly more so. Most proptech teams pair the Zillow actor with a local business feed from the Google Maps actor for neighborhood context, with the generic scrape endpoint for county tax-assessor pages, and occasionally with competitor pricing monitoring for teams that also operate a brokerage or rental management business.

The pipeline gets more useful the more sources you fold in. Zillow alone tells you a listing. Zillow plus county records plus neighborhood data tells you whether the listing is worth underwriting.

The two-stage pipeline pattern

Most production Zillow pipelines split the work into two stages with different cadences and different cost profiles.

Stage one is discovery. You start with a geographic input — a ZIP code, a city, a county — and produce a list of property URLs to extract. Discovery typically runs on a slower cadence, often weekly, because the universe of listings in a given geography doesn't change dramatically day to day.

Stage two is extraction. You take the URL list from stage one and run each property through the Zillow actor. Extraction runs on whatever cadence your use case demands — nightly for active monitoring, on-demand for investor lookups, one-time for an initial backfill.

Separating the two stages keeps your costs predictable. Discovery is the cheaper operation per call but the higher volume one. Extraction is more expensive per call but lower volume. Treating them as one big pipeline obscures those economics and usually leads to over-polling on discovery.

What proptech teams actually do with the data

Three use cases account for roughly 80 percent of the Zillow extraction we see across customers.

Deal-flow scoring. Investors and off-market specialists score every active listing in their target metros against an underwriting model — cap rate, rehab potential, neighborhood signals, days on market. The scoring runs nightly against fresh data and surfaces a ranked list each morning. The data layer is the Zillow actor plus a county tax-assessor scrape plus a neighborhood demographic feed. The decision layer is whatever model the firm has trained.

Comparative market analysis. Agents and small brokerages need to produce a CMA for a seller in under an hour. The traditional path is hours of manual MLS work. With a Zillow data feed plus a few proprietary signals, that becomes a one-click report. The brokerages building these tools are the most pragmatic customers of the Zillow actor — they don't care how the data arrives, they care that it arrives reliably and correctly attributed.

Consumer-facing search tools. Newer proptech entrants build search experiences competing directly with Zillow and Redfin's own UIs, typically focused on a niche — investor-only properties, rent-to-own, specific architectural styles, sustainable homes. The data backing these tools comes from a combination of MLS partnerships where available and Zillow extraction where not.

How to think about data freshness

Freshness is a function of the use case, not the technology. For an investor lookup, the data needs to be fresh at the moment of the lookup — the synchronous endpoint with a sub-second response is the right path. For an overnight scoring model, fresh-as-of-midnight is fine — the async endpoint with webhook delivery handles this with no overhead.

For price-change monitoring on a specific watchlist, hourly polling is the typical cadence. Zillow itself doesn't update prices in real time — listing agents update them on whatever schedule suits them. An hourly poll catches changes within a useful window without generating volume the use case doesn't warrant.

The mistake we see most often is over-polling. Teams set up sub-hourly refresh on the full 110-million-property universe and end up with a massive bill and no meaningful data quality improvement. Pick the cadence the decision actually needs.
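A quick back-of-the-envelope check makes the cost of over-polling concrete. The helper below is a hypothetical illustration that assumes a 30-day month; the 38,000-listing figure is borrowed from the scenario earlier in this post.

```python
def monthly_pulls(properties: int, pulls_per_day: float, days: int = 30) -> int:
    """Estimate monthly request volume for a given refresh cadence."""
    return round(properties * pulls_per_day * days)


nightly = monthly_pulls(38_000, pulls_per_day=1)
hourly = monthly_pulls(38_000, pulls_per_day=24)
print(nightly)  # 1140000
print(hourly)   # 27360000
```

Moving the same 38,000 listings from a nightly to an hourly refresh multiplies volume by 24, from about 1.1 million to about 27 million pulls a month. That is why hourly cadence is usually reserved for a small watchlist rather than the full active set.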

What to do next

Pick five addresses you already know well. Sign up, paste the curl from Step 2, and confirm the actor returns clean data for each. Then expand to your full set, wire the async webhook in week two, and have your first proptech tool in front of users inside a month.

If your use case has a wrinkle — multi-family-only coverage, an unusual geography, a need for fields beyond the default set — send us a note. The proptech use case is one of our most common conversations and we've probably seen the version of your problem you're worried about. Read the docs, explore the actor catalog, and ship the tool your investors are waiting for.

Common questions

Can I scrape Zillow without getting blocked?
Yes, with the right setup. Residential IPs, realistic request pacing, and direct payload extraction from the embedded JSON in each property page work reliably. A managed API like Qcrawl absorbs all three and returns a clean record per address. Naive scrapers using datacenter IPs and default HTTP clients get blocked within seconds.
What's the difference between Zestimate and price?
Price is the asking price set by the seller and their agent. Zestimate is Zillow's algorithmic valuation, computed from comparable sales, tax records, and proprietary signals. The two often differ — sometimes meaningfully. Proptech and investor tools care about both: price is the market signal, Zestimate is the algorithmic baseline.
Is scraping Zillow legal?
Publicly viewable property pages are generally lawful to fetch in the United States and most jurisdictions, with caveats around the Computer Fraud and Abuse Act. Redistribution and commercial republication raise separate questions. The reasonable default is to consult counsel before commercial deployment and respect Zillow's terms of service for the parts of your workflow they govern.
How much does it cost to scrape Zillow at scale?
On a managed API like Qcrawl, a property listing pull runs at the per-request rate published on the pricing page, with volume discounts above a million pulls per month. Building the same throughput in-house typically costs meaningfully more once residential proxy spend, browser infrastructure, and engineering attention are loaded in.
What fields does the Qcrawl Zillow actor return?
The default response covers address, city, state, zipcode, current asking price, Zestimate, bedrooms, bathrooms, living area in square feet, and year built, as in the Step 2 example. Lot size, property type, listing status, price history, tax assessment, and HOA fees are available on Business and Enterprise plans. Custom fields can be added on request.
Can I scrape Zillow by ZIP code?
Yes, but the workflow has two steps. First collect the property URLs for the ZIP — typically by paginating through Zillow's search results. Second, pull each property through the Zillow actor. Most production pipelines treat the search-results crawl and the per-property extraction as separate concerns with different cadences.
Is the Zillow API still available?
Zillow's public Property API has been effectively deprecated for new developers since 2021. Some legacy partner integrations remain. For new proptech, investor tooling, and analytics use cases, scraping or licensed dataset products are the realistic options. Qcrawl's Zillow actor sits in the first category.
How fresh is Zillow data from a scraping API?
Each request fetches the property page live, so data is as fresh as the moment of the call. There's no cache layer between you and Zillow by default. For price-change monitoring, hourly or daily polling is typical. For initial dataset builds, a one-time pull is enough.
