Is Web Scraping Legal?

Gabija Fatenaite

Last updated on

2025-01-07

9 min read

According to Statista, big data market size revenue is growing every year, and it is only natural that, being a powerful data collection method, web scraping is also gaining more popularity. However, with an increasing number of people adopting it, the legality of web scraping has become a much-debated topic among developers and others who work in the field.

In this article, we will cover essential questions about web scraping legality and what legal issues one can encounter when collecting data from certain websites.

Before we dive in, we want to note that this article is for informational purposes only and that any information contained herein does not constitute legal advice. Accordingly, before engaging in any scraping activities, you should get appropriate professional legal advice regarding your specific situation.

Is web scraping legal or illegal?

So, is web scraping activity legal or not? It is not illegal as such. There are no specific laws prohibiting web scraping, and many companies employ it in legitimate ways to gain data-driven insights.

However, there can be situations where other laws or regulations may come into play and make web scraping illegal. For example:

1. Your web scraper should not log in to websites or web pages and then download data. When logging in to a website, users typically have to agree to the Terms of Service (ToS), which may forbid activities such as automated data collection.

2. There is a misconception that you can do whatever you want with publicly available data. While there may be fewer restrictions for scraping publicly available data – as opposed to private information – you still have to ensure that you are not breaching laws that may apply to such data.

One example would be downloading copyrighted data. Below are some specific examples of personal and copyrighted information.

Personal	Copyrighted
Full name	News articles or blog posts
Email address	Research papers behind paywalls
Social Security Number (SSN) or National Identification Number	Images, videos, or audio files owned by the website
Health records	Logos
Financial information, like credit card numbers	Books or excerpts published online
Other types of personal data	Other types of copyrighted data

However, keep in mind that even if the data is not personal or not copyrighted, there may be other laws applicable to it.

3. Even if data is needed for personal use, ToS may forbid any kind of automatic data collection. In this case, not the data usage but the scraping activity itself may be illegal.

Why is web scraping sometimes viewed negatively?

Web scraping is often misunderstood, leading to several myths about its legality and ethics. One common misconception is that all web scraping is illegal; in reality, it's much more complicated and depends on several factors (the laws that may govern the data, the data use case, and others).

Web scraping may be legal when you are scraping without breaking any applicable laws surrounding the target website or gathered data. However, sometimes malicious actors or hackers intentionally abuse web scraping.

Indeed, when some individuals or companies abuse web scraping and violate ToS, copyright norms, or other applicable laws, it harms the whole industry's reputation, portraying web scraping as malicious and unethical. However, when conducted responsibly, web scraping is a great way for businesses to make data-driven decisions from publicly available information.

Additionally, while web scraping is in progress, a scraper sends numerous requests to the target website to get the required information. As this is done automatically, web scraping tools can make far more requests than a regular user does. If this is done without regard for the website, it can put a heavy load on its servers. This is one of the main reasons websites have anti-scraping measures in place.
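The load concern described above is commonly addressed by spacing requests out. Below is a minimal Python sketch of that idea, assuming a fixed minimum interval between requests. The class name `Throttle` and the two-second interval are illustrative choices, not part of any particular scraping library, and an appropriate interval depends on the target website.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests (illustrative helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        # min_interval: assumed minimum number of seconds between two requests.
        # clock and sleep are injectable so the logic can be tested without waiting.
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was released

    def wait(self):
        """Pause if the previous request was released too recently.

        Returns the number of seconds slept.
        """
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Hypothetical usage: call wait() before every request to the same website.
throttle = Throttle(min_interval=2.0)
```

Calling `throttle.wait()` immediately before each request keeps one scraper from sending requests to a single site faster than the chosen interval.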

Web scraping myths

Such misconceptions about web scraping have also given rise to several myths that further cloud its reputation and potential benefits. Some people think that all web scraping is illegal, while in reality, it depends on several factors, as we’ve explained above. Some believe that web scraping is always a violation of privacy, but when scraping publicly available information without accessing personal or sensitive data, it remains a lawful and ethical practice. Another common myth is that web scraping always harms website performance, when, in fact, responsible scraping practices minimize server impact.

AI, legal, and web scraping

In recent years, there has been a lot of discussion about legal and ethical matters in the context of LLMs and AI in general, especially when it comes to copyrighted or personal data. For example, under regulations like the GDPR, you have to follow strict requirements on handling data that could identify individuals, including (but not limited to) rules on consent and data minimization. Violating these regulations can result in significant penalties and damage to an organization’s reputation.

As a company offering web scraping tools and solutions, we run KYC procedures to ensure that a use case is legitimate, and we maintain a list of restricted targets that includes governmental sites, financial data, and more.

We have an extensive article, where we interviewed a legal professional on the legal landscape of AI and web scraping:

How to Navigate AI, Legal, and Web Scraping: Asking a Professional

How do privacy laws affect web scraping?

Another aspect that needs to be considered when scraping publicly available data is various privacy laws.

The GDPR (General Data Protection Regulation) is a data privacy and security law passed by the EU (European Union) and put into effect on May 25, 2018. The main purpose of this regulation is to give individuals in the EU control over their personal data by placing limitations on organizations that target and collect this data.

The GDPR doesn’t state that web scraping is illegal; however, it restricts what businesses can do with the personal data they wish to extract. For example, in some cases, they have to obtain explicit consent from the data subjects in order to gather personal data and use it for various purposes.

Similarly, the state of California passed the California Consumer Privacy Act (CCPA), which puts businesses collecting personal data under similarly strict requirements (e.g., consumers can request deletion of their personal information, opt out of the sale of their data, and have a right to non-discrimination for exercising their CCPA rights).
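One practical way to limit exposure to the privacy rules above is to avoid storing personal fields that a use case does not need. The Python sketch below illustrates that idea only; the field names in `PERSONAL_FIELDS`, the sample record, and the `minimize` helper are hypothetical, and filtering fields like this does not by itself make a dataset compliant with the GDPR or the CCPA.

```python
# Hypothetical field names that a project has flagged as personal data.
PERSONAL_FIELDS = {"full_name", "email", "ssn", "health_records", "credit_card"}


def minimize(record, needed_fields):
    """Keep only the fields the use case needs, and never a flagged personal field."""
    return {
        key: value
        for key, value in record.items()
        if key in needed_fields and key not in PERSONAL_FIELDS
    }


# Illustrative scraped record; the values are made up for this example.
scraped = {
    "product": "Laptop stand",
    "price": "29.99",
    "full_name": "Jane Doe",
    "email": "jane@example.com",
}

clean = minimize(scraped, needed_fields={"product", "price", "email"})
print(clean)
```

Here `email` is dropped even though it was requested, because the flagged list takes precedence over the list of needed fields.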

General advice for the best web scraping practices

As mentioned before, we advise you to seek legal consultation before engaging in scraping activities of any kind. With that being said, here are some basic practical tips that may help ensure compliance when web scraping:

1. Sometimes, websites provide their API for data collection. If it is possible, use it instead of scraping data.

2. It is essential to respect the ToS for each website.

3. Respect the rules of robots.txt. If you really need the data from a specific website, but ToS or robots.txt forbids any automatic data collection, you can try to ask permission from the site owner.

4. Do not use scraped data without making sure that this information is not copyrighted. If it is necessary to publish this data, you should ask for written permission from the copyright holder.
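The robots.txt tip above can be checked programmatically. The following is a minimal Python sketch using the standard library's `urllib.robotparser`; the robots.txt content, the `example-bot` user agent, and the `example.com` URLs are made up for illustration. Passing a robots.txt check says nothing about a website's ToS, which still has to be read separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs without network access.
# For a live site, one would instead call parser.set_url(".../robots.txt") and parser.read().
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "example-bot" is an assumed user agent name for this sketch.
blog_allowed = parser.can_fetch("example-bot", "https://example.com/blog/post")
private_allowed = parser.can_fetch("example-bot", "https://example.com/private/data")
delay = parser.crawl_delay("example-bot")

print(blog_allowed, private_allowed, delay)
```

If `can_fetch` returns `False` for a URL, the scraper should skip it; a `Crawl-delay` value, when present, can be used as the minimum pause between requests.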

If you want to learn more about the best web scraping practices or ethical data collection, we have covered these topics in detail in our blog.

Web scraping cases

To better grasp the legality of web scraping, it can be helpful to look at some real-life scenarios. This can help us better understand the current state of the industry and its future directions.

Below, we will examine some of the most famous cases (some of which we already discussed back in OxyCon 2019). However, keep in mind that these cases are only examples, and you should get appropriate professional advice regarding your specific situation.

Ryanair v. PR Aviation (2018)

Ryanair’s dispute with a flight price comparison company, PR Aviation, provided a glimpse of how scraping could be interpreted in European courts. Ryanair’s website subjects its visitors to Terms of Use (ToU), which explicitly prohibit scraping. PR Aviation was scraping Ryanair’s website, and Ryanair took the company to court in the Netherlands for breach of contract.

Ryanair came out second best in the dispute, as the Dutch court said that no valid contract had been formed between the companies. It drew an interesting analogy, stating that anyone putting up a poster in a shop window visible from the public road, which reads: “Whoever reads further, must pay €5,” cannot assume that the person reading it wants to commit to such a condition.

Still, this does not mean that ToU would not be applicable in a different scenario, as there were a lot of circumstances unfavorable to Ryanair here. Namely, at the time of the scraping, Ryanair was presenting its ToU in a browsewrap, which is not generally accepted as legally binding by courts, as well as the fact that the scraped data was free and accessible to everyone.

Ryanair v. Expedia (2019)

Expedia, a U.S. flight comparison company, was scraping Ryanair’s data and continued doing so after receiving a cease and desist (C&D) letter. Consequently, Ryanair sued it for breaching the Computer Fraud and Abuse Act (CFAA). Expedia argued that because Ryanair is an Irish company, the CFAA, a U.S. statute, should not be applicable.

The courts established that the CFAA might indeed apply to U.S. companies acting internationally. After this, Ryanair and Expedia settled the case, with the details being confidential. With that being said, as of this day, there are no Ryanair flights being offered via Expedia’s website.

HiQ Labs v. LinkedIn (2019)

HiQ Labs, a now-defunct workforce data analytics company, scraped data from LinkedIn profiles to provide tools and insights on employees to businesses. After allowing HiQ to collect data for several years, in 2017, LinkedIn issued a C&D letter to HiQ and launched a tool similar to HiQ’s functionality. HiQ sought an injunction in court, which was granted, leading to LinkedIn being asked to withdraw the C&D letter and stop applying any blocking measures against HiQ.

LinkedIn appealed the decision, arguing that HiQ’s scraping was breaching the CFAA. The court decided that HiQ was not acting in breach of the CFAA, as the data scraped from LinkedIn was public (profiles containing user-generated content, not put behind a password wall). The court said that companies should not be able to revoke authorization where one is not needed in the first place, as well as that allowing companies like LinkedIn to decide who can collect and use publicly available data would be contrary to the public interest.

The decision in the HiQ Labs v. LinkedIn case was favorable to scraping companies and reconsidered some of the much-criticized previous court practices regarding the applicability of the CFAA (e.g., Facebook v. Power Ventures, Craigslist v. 3Taps), narrowing the relevance of this act concerning public data. With that being said, if not done with caution, scraping activities might still breach the CFAA (e.g., under different case circumstances) or give rise to claims on other grounds, such as trespass to chattels, copyright infringement, or breach of contract.

However, later in 2022, the Court stated that HiQ’s creation of fake accounts (“Turkers”) to scrape LinkedIn’s data violated LinkedIn’s User Agreement. Subsequently, in December 2022, LinkedIn and HiQ reached a settlement in which HiQ agreed to a permanent injunction, requiring HiQ to stop scraping LinkedIn.

If you are interested in the legal aspects of web scraping, watch a recording of our webinar, Web Scraping From a Legal Perspective. During the webinar, an expert panel discusses web scraping laws, cease and desist letters, and ongoing court cases that are relevant to the web scraping community.

Wrapping up

The legality of web scraping is not easily defined, as it depends on various factors, such as the type of data being scraped or the applicable laws, making it a complex issue to navigate.

With that in mind, we urge you to take this article as informational and educational only. It does not replace independent professional advice and judgment. Statements of fact and opinions expressed are those of the presenters only and, unless expressly stated to the contrary, are not the opinion or position of Oxylabs.

If you wish to learn more about scraping, see our step-by-step guide on how to scrape in Python and try our general-purpose web scraper for free. Also, find out more about ethical Residential Proxy procurement and the 5 best websites to scrape for practice.

Frequently asked questions

Is web scraping for commercial use legal?

Web scraping for commercial use may be legal if it complies with applicable laws surrounding the target website or gathered data.

Is web scraping job postings legal?

Scraping job postings can be legal if the data is publicly available and no applicable laws surrounding the target website or gathered data are violated.

Is web scraping legal in the USA?

Web scraping may be legal in the USA when done responsibly and in compliance with laws like the Computer Fraud and Abuse Act (CFAA) and other laws that may be applicable.

Is web scraping legal in Europe?

Web scraping may be legal in Europe if it adheres to regulations like the GDPR and other laws that may be applicable.

About the author

Gabija Fatenaite

Former Director of Product & Event Marketing

Gabija Fatenaite was a Director of Product & Event Marketing at Oxylabs. Having grown up on video games and the internet, she grew to find the tech side of things more and more interesting over the years. So if you ever find yourself wanting to learn more about proxies (or video games), feel free to contact her - she’ll be more than happy to answer you.

Learn more about Gabija Fatenaite

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.