Real Estate Scraper API Quick Start Guide

Augustas Pelakauskas

2022-11-23

Real Estate Scraper API automates most of the data extraction pipeline, from sending HTTP requests to delivering results. It is built for scraping accurate real estate data without getting blocked.

Users can employ the API to extract publicly available information, such as price, location, property type, amenities, and others, from the most popular real estate websites. As a result, identifying new investment opportunities, adjusting prices for higher profit, leveraging competitor data, and staying on top of market developments are significantly less complex.

Try Real Estate Scraper API right away with our 1-week free trial and start scraping today. Simply go to the Real Estate Scraper API page, register, and get 5K results for free.

The following guide will explain how Real Estate Scraper API functions and how to jump straight to your first web scraping task.

What you get with Real Estate Scraper API

  • Effortless real estate data collection – leverage custom-built features to scrape whole real estate websites or only collect property listings in specific categories. Focus on the result (data) rather than the process itself (data extraction), avoiding CAPTCHAs and other more advanced anti-bot systems.

  • Simple integration – choose a target website, use a code example from our documentation for the chosen website to interact with the API, and receive a result.

  • Bulk scraping – scrape up to 1000 URLs in a single API call.

  • Automated jobs – set a list of URLs and automate data gathering with Scheduler to get regular updates.

  • Proxy management – make use of our management-free 102M+ proxy pool.

  • Multiple delivery options – retrieve results via the API or to your cloud storage bucket (AWS S3 or GCS).

  • 24/7 support – get timely assistance day and night for your web scraping and associated tasks.

  • Cost-efficiency – pay only for successful results. We won’t charge you for faulty scraping tasks.

Data sources

The API can deliver HTML content from most real estate websites, including dynamic pages that use JavaScript to load content. Zillow, Realestate.com.au, Redfin, Zoopla, and many others are well-suited for Real Estate Scraper API.

Real Estate Scraper API – how does it work?

Real Estate Scraper API doesn’t require you to build any scraping infrastructure. You’ll still need some resources of your own for making requests, fetching results from the API or reading them from cloud storage, and parsing and transforming the data further.

The API and Oxylabs team do the heavy lifting of automating and facilitating a large portion of the underlying processes behind the scenes. You’re left with user input customization options:

  1. Choosing target URLs, geo-location, and JS rendering parameters.

  2. Adding custom headers and cookies or letting us manage them from our side.

  3. Submitting GET or POST requests.

  4. Obtaining data via REST API either directly or to a cloud.

Authentication

Real Estate Scraper API uses basic HTTP authentication, so every request requires a username and password. The code sample below uses the Realtime delivery method to fetch a listing page from a real estate website, Zillow. The request to the API is a POST with a JSON body; the API then retrieves the target page with a GET request.

curl --user "USERNAME:PASSWORD" 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "universal", "url": "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/"}'
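For readers working in Python, the same authenticated call can be sketched with the standard library alone. The helper name build_realtime_request is our own illustration, not part of any Oxylabs SDK, and the request is only constructed here; sending it requires real credentials.

```python
import base64
import json
import urllib.request

REALTIME_ENDPOINT = "https://realtime.oxylabs.io/v1/queries"


def build_realtime_request(username: str, password: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a Realtime request that uses HTTP basic auth."""
    payload = {"source": "universal", "url": target_url}
    # Basic auth is "username:password", Base64-encoded, in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        REALTIME_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_realtime_request(
    "USERNAME",
    "PASSWORD",
    "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/",
)
# To send it: urllib.request.urlopen(request), then parse the JSON response body.
```

Building the request separately from sending it makes it easy to inspect the headers and body before spending any of your trial results.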

You can try this request and start scraping right away with our free trial. Simply go to the Real Estate Scraper API page and register for a 1-week free trial that offers 5K results.

Integration methods

To integrate Real Estate Scraper API, use one of the three methods: Push-Pull, Realtime, or Proxy Endpoint. Let’s see how each method functions.

Push-Pull

Push-Pull is not the simplest integration method, but it is the most reliable. When you provide us with your job parameters, we give you a job ID that you can use to get content from the /results endpoint later on. You can check the job completion status yourself.

Alternatively, you can set up a listener that accepts POST requests. In this case, we’ll send you a callback message once the job is ready to be retrieved.
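A listener of this kind can be sketched with Python's built-in http.server module. The sketch assumes that the callback body is JSON shaped like the job information (a "status" field plus "_links" entries); that payload shape, the function names, and port 8080 are our assumptions for illustration, so check the documentation before relying on them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def results_url_from_callback(body: bytes) -> Optional[str]:
    """Return the results link from a callback payload, or None if the job is not done.

    Assumption: the payload mirrors the job info JSON ("status" plus "_links").
    """
    job = json.loads(body)
    if job.get("status") != "done":
        return None
    for link in job.get("_links", []):
        if link.get("rel") == "results":
            return link.get("href")
    return None


class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts POST callbacks and prints where the results can be fetched."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        url = results_url_from_callback(self.rfile.read(length))
        print("results ready at:", url)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the listener until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling serve() starts the listener; the URL it is reachable at is what you would pass as callback_url.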

The Push-Pull method offers the following functionality:

  • Single query – our endpoint will handle single requests for one keyword or URL. The job ID, together with other information, will be sent to you in an API confirmation message. This ID will aid you in checking your job status manually.

  • Batch query – Real Estate Scraper API can execute multiple keywords (up to 1,000 per batch). You’ll have to post query parameters as data in the JSON body. The system will process every keyword as a separate request and return unique job IDs for every request.

  • Check job status – if you include callback_url in your query, we’ll send a POST request to that URL as soon as the scraping task is finished. If your query doesn’t have callback_url, you’ll need to check the job status manually by using the URL in href under rel:self in the response message.

  • Retrieve job content – as soon as the job content is ready for fetching, you can get it using the URL in href under rel:results.

  • Get notifier IP address list – to whitelist the IPs that send you callback notifications, send a GET request to the notifier IP list endpoint described in the documentation.

  • Upload to storage – scraped content is stored in our database by default. Yet, we have a custom storage feature that allows you to store results in your cloud storage. This way, you won’t need to make additional requests to fetch results – everything goes directly to your storage.

  • Callback – we’ll notify your callback_url when the data collection task is completed and provide you with a URL to obtain scraped data.
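To illustrate the batch limit from the list above, here is a minimal Python sketch that splits a long list of URLs into batches of at most 1,000. It assumes that a batch payload carries its URLs as a list under "url"; that shape and the helper name batch_payloads are our assumptions, and the example.com URLs are placeholders.

```python
from typing import Dict, Iterator, List

MAX_BATCH_SIZE = 1000  # per-batch limit stated in the list above


def batch_payloads(urls: List[str], source: str = "universal") -> Iterator[Dict[str, object]]:
    """Yield one JSON-serializable payload per batch of at most MAX_BATCH_SIZE URLs.

    Assumption: a batch payload carries the URLs as a list under "url".
    """
    for start in range(0, len(urls), MAX_BATCH_SIZE):
        yield {"source": source, "url": urls[start:start + MAX_BATCH_SIZE]}


# Placeholder URLs standing in for real property listings.
listing_urls = [f"https://example.com/listing/{n}" for n in range(2500)]
payloads = list(batch_payloads(listing_urls))
print([len(p["url"]) for p in payloads])  # [1000, 1000, 500]
```

Each payload would be posted separately, and each URL inside it gets its own job ID in the response.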

In this quick start guide, we’ll provide an example of interacting with Real Estate Scraper API using the Push-Pull integration method and the cURL command-line tool to make requests. We’ll get content from a target website, Zillow, which returns property listing information in HTML format.

Example of a single query request:

curl --user "USERNAME:PASSWORD" 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "universal", "url": "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/"}'

Sample of the initial response output:

{
"callback_url": null,
"client_id": 4229,
"context": [
{
"key": "successful_status_codes",
"value": []
},
{
"key": "follow_redirects",
"value": null
},
{
"key": "cookies",
"value": []
},
{
"key": "headers",
"value": []
},
{
"key": "session_id",
"value": null
},
{
"key": "http_method",
"value": "get"
},
{
"key": "content",
"value": null
},
{
"key": "store_id",
"value": null
}
],
"created_at": "2022-11-23 08:08:07",
"domain": "com",
"geo_location": null,
"id": "7001094015848286209",
"limit": 10,
"locale": null,
"pages": 1,
"parse": false,
"parser_type": null,
"render": null,
"url": "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/",
"query": "",
"source": "universal",
"start_page": 1,
"status": "pending",
"storage_type": null,
"storage_url": null,
"subdomain": "www",
"content_encoding": "utf-8",
"updated_at": "2022-11-23 08:08:07",
"user_agent_type": "desktop",
"session_info": null,
"statuses": [],
"_links": [
{
"rel": "self",
"href": "http://data.oxylabs.io/v1/queries/7001094015848286209",
"method": "GET"
},
{
"rel": "results",
"href": "http://data.oxylabs.io/v1/queries/7001094015848286209/results",
"method": "GET"
}
]
}

The initial response confirms that we have registered your request for a particular data extraction operation. It also lists all the job parameters, along with links for checking whether the job is complete and for downloading the contents.
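Picking the two links out of that response takes only a few lines of Python. The sketch below uses a trimmed copy of the sample response; the helper name link_for is ours.

```python
import json
from typing import Optional

# Trimmed copy of the sample initial response shown above.
initial_response = json.loads("""
{
  "id": "7001094015848286209",
  "status": "pending",
  "_links": [
    {"rel": "self",
     "href": "http://data.oxylabs.io/v1/queries/7001094015848286209",
     "method": "GET"},
    {"rel": "results",
     "href": "http://data.oxylabs.io/v1/queries/7001094015848286209/results",
     "method": "GET"}
  ]
}
""")


def link_for(job: dict, rel: str) -> Optional[str]:
    """Return the href of the first "_links" entry whose "rel" matches."""
    for link in job.get("_links", []):
        if link.get("rel") == rel:
            return link["href"]
    return None


status_url = link_for(initial_response, "self")      # where to check the job status
results_url = link_for(initial_response, "results")  # where to download the content
```

Looking links up by "rel" rather than by list position keeps the code working even if the order of the entries changes.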

To check whether the job’s "status" has changed to "done", use the "href" of the "_links" entry whose "rel" is "self", which in this case is http://data.oxylabs.io/v1/queries/7001094015848286209.

Example of how to check a job status:

curl --user "USERNAME:PASSWORD" 'http://data.oxylabs.io/v1/queries/7001094015848286209'

The response will contain the same data points as the initial response. Once it shows "status": "done", you can retrieve the contents using the link from ["_links"][1]["href"], which is http://data.oxylabs.io/v1/queries/7001094015848286209/results.

Example of how to retrieve data:

curl --user "USERNAME:PASSWORD" 'http://data.oxylabs.io/v1/queries/7001094015848286209/results'

Sample of the response data output:

{
    "results": [
      {
        "content": "24.5.203.132\n", # Actual content from Zillow
        "created_at": "2022-11-23 08:08:07",
        "updated_at": "2022-11-23 08:08:07",
        "page": 1,
        "url": "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/",
        "job_id": "7001094015848286209",
        "status_code": 200
      }
    ]
}
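The check-then-retrieve steps can be wrapped in a small polling loop. In this Python sketch, fetch_job stands for any function of yours that requests the status link and returns the parsed JSON; the interval, the attempt limit, and the "faulted" status name are our assumptions rather than documented values. Simulated responses replace real API calls so the example runs offline.

```python
import time
from typing import Callable


def wait_until_done(
    fetch_job: Callable[[], dict],
    poll_interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job() until the job reports "status": "done".

    fetch_job is any callable returning the parsed job JSON, for example a
    function that GETs the "self" link with your credentials.
    """
    for _ in range(max_attempts):
        job = fetch_job()
        status = job.get("status")
        if status == "done":
            return job
        if status == "faulted":  # assumed name for a failed job
            raise RuntimeError(f"job {job.get('id')} faulted")
        sleep(poll_interval)
    raise TimeoutError("job did not finish within the polling budget")


# Simulated status responses stand in for real API calls.
responses = iter([{"id": "1", "status": "pending"}, {"id": "1", "status": "done"}])
finished = wait_until_done(lambda: next(responses), sleep=lambda seconds: None)
print(finished["status"])  # done
```

Once the loop returns, the results link in the job JSON is ready to be fetched.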

Realtime

With this method, you can send your request and receive data back via the same open HTTPS connection straight away.

Sample request:

curl --user "USERNAME:PASSWORD" 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "universal", "url": "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/"}'

Example response body that will be returned via the open connection:

{
  "results": [
    {
      "content": "<html>ZILLOW PAGE CONTENT</html>",
      "created_at": "2022-11-23 08:08:07",
      "updated_at": "2022-11-23 08:08:07",
      "id": null,
      "page": 1,
      "url": "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/",
      "job_id": "7001094015848286209",
      "status_code": 200
    }
  ]
}
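Once the response body arrives, the HTML sits in the "content" field of each result. A minimal Python sketch for pulling it out, keeping only results with a 200 status code, might look like this; the helper name successful_pages is ours, and the sample body is a shortened version of the response above.

```python
import json
from typing import List, Tuple


def successful_pages(response_body: str) -> List[Tuple[str, str]]:
    """Return (url, content) pairs for results whose status_code is 200."""
    data = json.loads(response_body)
    return [
        (item["url"], item["content"])
        for item in data.get("results", [])
        if item.get("status_code") == 200
    ]


# Shortened version of the sample response body.
sample_body = json.dumps({
    "results": [
        {
            "content": "<html>ZILLOW PAGE CONTENT</html>",
            "page": 1,
            "url": "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/",
            "job_id": "7001094015848286209",
            "status_code": 200,
        }
    ]
})
pages = successful_pages(sample_body)
```

The returned HTML can then be handed to whichever parser you prefer.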

Proxy Endpoint

Proxy Endpoint only accepts fully formed URLs. However, you can send extra information, such as geo-location, in the request headers to indicate the country for which the result should be adapted. This is useful because some websites won't serve content if accessed from particular locations.

Use our entry node as a proxy, authenticate with Real Estate Scraper API credentials, and ignore certificates. Your data will reach you via the same open connection.
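The three steps above (proxy, credentials, skipped certificate checks) can be sketched in Python with the standard library. The entry node address and the X-Oxylabs-Geo-Location header are the ones used in this guide's cURL sample; the helper name build_proxy_request is ours, and nothing is sent until you call opener.open.

```python
import ssl
import urllib.request
from urllib.parse import quote

PROXY_HOST = "realtime.oxylabs.io:60000"  # entry node used in this guide


def build_proxy_request(username: str, password: str, target_url: str, geo_location: str):
    """Return an (opener, request) pair that routes target_url through the entry node."""
    # Percent-encode credentials so special characters survive inside the proxy URL.
    proxy_url = f"http://{quote(username, safe='')}:{quote(password, safe='')}@{PROXY_HOST}"
    # Equivalent of curl's -k flag: skip certificate verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPSHandler(context=context),
    )
    request = urllib.request.Request(
        target_url, headers={"X-Oxylabs-Geo-Location": geo_location}
    )
    return opener, request


opener, request = build_proxy_request(
    "USERNAME",
    "PASSWORD",
    "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/",
    "United States",
)
# To fetch the page: html = opener.open(request).read().decode("utf-8")
```

Disabling certificate verification is only appropriate for this proxied connection; keep verification on for everything else your application requests.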

Proxy Endpoint code sample using cURL:

curl -k -x realtime.oxylabs.io:60000 -U USERNAME:PASSWORD -H "X-Oxylabs-Geo-Location: United States" "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/"

GitHub

Oxylabs GitHub is the place to go for tutorials on how to scrape websites, use our tools, and implement or integrate our products using the most popular programming languages (C#, Java, Node.js, PHP, Python, etc.).

Parameters*

| Parameter | Description | Default value |
| --- | --- | --- |
| source | Data source. | |
| url | Direct URL (link) to the target page. | |
| user_agent_type | Device type and browser. The full list can be found in the documentation. | desktop |
| geo_location | Geo-location of the proxy used to retrieve the data. The full list of supported locations can be found in the documentation. | |
| locale | Locale, as expected in the Accept-Language header. | |
| render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). | |
| content_encoding | Add this parameter if you are downloading images. Learn more in the documentation. | base64 |
| context: content | Base64-encoded POST request body. It is only useful if http_method is set to post. | |
| context: cookies | Pass your own cookies. | |
| context: follow_redirects | Indicate whether you would like the scraper to follow redirects (3xx responses with a destination URL) to get the contents of the URL at the end of the redirect chain. | true |
| context: headers | Pass your own headers. | |
| context: http_method | Set it to post if you would like to make a POST request to your target URL via Universal scraper. | GET |
| context: session_id | If you want to use the same proxy with multiple requests, you can do so by using this parameter. Just set your session to any string you like, and we will assign a proxy to this ID and keep it for up to 10 minutes. After that, if you make another request with the same session ID, a new proxy will be assigned to that particular session ID. | |
| context: successful_status_codes | Define a custom HTTP response code (or a few of them), upon which we should consider the scrape successful and return the content to you. May be useful if you want us to return the 503 error page or in some other non-standard cases. | |
| callback_url | URL to your callback endpoint. | |
| storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. | |
| storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. | |

*All parameters will be provided after purchasing the product.
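To show how several of these parameters fit together, here is a Python sketch that assembles a job payload. It assumes that context parameters are submitted in the same key/value list form in which they appear in the sample initial response, and that geo_location accepts the country name used in the Proxy Endpoint sample; both are assumptions to verify against the documentation, and build_payload is a hypothetical helper.

```python
import base64
import json
from typing import Optional


def build_payload(
    url: str,
    geo_location: Optional[str] = None,
    render: Optional[str] = None,
    post_body: Optional[bytes] = None,
    session_id: Optional[str] = None,
) -> dict:
    """Assemble a job payload from a few of the parameters in the table."""
    payload: dict = {"source": "universal", "url": url}
    if geo_location:
        payload["geo_location"] = geo_location
    if render:
        if render not in ("html", "png"):
            raise ValueError("render must be 'html' or 'png'")
        payload["render"] = render
    # Assumption: context is sent as a list of {"key": ..., "value": ...} pairs.
    context = []
    if post_body is not None:
        # context:content is only useful together with http_method set to post.
        context.append({"key": "http_method", "value": "post"})
        context.append({"key": "content", "value": base64.b64encode(post_body).decode("ascii")})
    if session_id:
        context.append({"key": "session_id", "value": session_id})
    if context:
        payload["context"] = context
    return payload


payload = build_payload(
    "https://www.zillow.com/homedetails/10066-Cielo-Dr-Beverly-Hills-CA-90210/243990393_zpid/",
    geo_location="United States",
    render="html",
    session_id="my-session",
)
print(json.dumps(payload, indent=2))
```

Since render only works via the Push-Pull method, a payload like this one would be posted to the Push-Pull endpoint rather than the Realtime one.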

Response codes 

| Response | Error message | Description |
| --- | --- | --- |
| 204 | No content | You are trying to retrieve a job that has not been completed yet. |
| 400 | Multiple error messages | Bad request structure, could be a misspelled parameter or invalid value. Response body will have a more specific error message. |
| 401 | ‘Authorization header not provided’ / ‘Invalid authorization header’ / ‘Client not found’ | Missing authorization header or incorrect login credentials. |
| 403 | Forbidden | Your account does not have access to this resource. |
| 404 | Not found | Job ID you are looking for is no longer available. |
| 429 | Too many requests | Exceeded rate limit. Please contact your account manager to increase limits. |
| 500 | Unknown error | Service unavailable. |
| 524 | Service unavailable | Service unavailable. |
| 612 | Undefined internal error | Something went wrong and we failed the job you submitted. You can try again at no extra cost, as we do not charge you for faulted jobs. If that does not work, please get in touch with us. |
| 613 | Faulted after too many retries | We tried scraping the job you submitted, but gave up after reaching our retry limit. |
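In client code, these response codes usually translate into a handful of reactions. The grouping in the Python sketch below is our own reading of the table, not an official recommendation, and the Action names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    OK = "no action needed"
    WAIT = "job not finished yet, poll again later"
    BACK_OFF = "rate limited, slow down before retrying"
    RESUBMIT = "job failed on the service side, submit it again"
    FIX_REQUEST = "request, credentials, or job ID need attention"
    RETRY_LATER = "service unavailable, retry later"
    UNKNOWN = "code not covered by the table"


def action_for(status_code: int) -> Action:
    """Map a response code from the table to a suggested client reaction."""
    if status_code == 204:
        return Action.WAIT
    if 200 <= status_code < 300:
        return Action.OK
    if status_code == 429:
        return Action.BACK_OFF
    if status_code in (612, 613):
        return Action.RESUBMIT
    if status_code in (400, 401, 403, 404):
        return Action.FIX_REQUEST
    if status_code in (500, 524):
        return Action.RETRY_LATER
    return Action.UNKNOWN


for code in (204, 401, 429, 613):
    print(code, "->", action_for(code).value)
```

Keeping this mapping in one place makes it easy to adjust retry behavior once you see how the codes show up in your own traffic.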

Conclusion

Real Estate Scraper API is a powerful tool for real-time real estate data collection at scale from almost any website. The Push-Pull, Realtime, and Proxy Endpoint integration methods ensure seamless data delivery. Like any other Oxylabs solution, Real Estate Scraper API has additional benefits, including round-the-clock customer support.

If you have questions or concerns about Real Estate Scraper API or associated features, head over to our documentation for in-depth technical details, visit Oxylabs GitHub, or get in touch via support@oxylabs.io or through the live chat.

About the author

Augustas Pelakauskas

Senior Copywriter

Augustas Pelakauskas is a Senior Copywriter at Oxylabs. Coming from an artistic background, he is deeply invested in various creative ventures - the most recent one being writing. After testing his abilities in the field of freelance journalism, he transitioned to tech content creation. When at ease, he enjoys sunny outdoors and active recreation. As it turns out, his bicycle is his third best friend.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
