Is web scraping legal?
Gabija Fatenaite

Sep 10, 2020 12 min read

Demand for big data has risen significantly in recent years. According to Statista, big data market revenue is growing every year. This is why web scraping is also gaining popularity: it is one of the most common data collection methods. The legality of web scraping is a much-debated topic among developers and others who work in the data gathering field.

In this article, we will cover the important questions about the legality of web scraping and the legal issues you can encounter when scraping certain websites.

Last year, at our two-day event OxyCon, our legal counsels Denas and Nerijus went over some of the legal issues surrounding web scraping. This article summarizes their presentation, focusing on the landmark scraping cases that set the tone for future legal claims based on copyright infringement or the Computer Fraud and Abuse Act (CFAA).

Before delving into this complex topic, we want to note that this article is for informational purposes only and that any information contained herein does not constitute legal advice. Accordingly, before engaging in any scraping activities, you should get appropriate professional legal advice regarding your specific situation.

Web scraping may be legal when it is done without breaching any laws regarding the source targets or the data itself. Here are some specific examples of web scraping that may be unlawful:

1. Your web scraper should not log in to websites and then download data. By logging in to a website, users have to agree to its Terms of Service (ToS), which may forbid activities like automated data collection. Note that it is sometimes possible to accept the ToS just by browsing the website.

2. There is a misconception that you can do whatever you want with publicly accessible data. There may be fewer restrictions on scraping publicly available information than on private information, but you still have to make sure that you are not breaching laws that apply to such data, for example by downloading copyrighted material. Copyrighted material usually includes designs, layouts, articles, videos and anything else that can be considered creative work.

3. Even if the data is needed only for personal use, the Terms of Service may forbid any kind of automatic data collection. In this case, it is not the use of the data but the scraping activity itself that may be illegal.

Why is web scraping sometimes viewed negatively?

Web scraping may be legal where you scrape without breaking any rules or applicable laws surrounding the targeted websites or the gathered data. However, malicious actors sometimes abuse web scraping intentionally. Here are some thoughts on why web scraping is sometimes considered a suspicious activity:

1. When you are thinking about web scraping advantages and the importance of data for your business improvement, the public data gathering process does not sound offensive or unethical. On the other hand, if you find out that someone is scraping your website for these same reasons, you may have different thoughts.

2. There are situations where individuals or companies abuse web scraping and violate the Terms of Service (ToS), copyright rules or other applicable laws. In these cases, web scraping looks like a malicious and unethical activity. This is why it can be hard to explain and prove that the main purpose of web scraping for businesses is to make data-driven decisions from publicly available information.

3. While web scraping is in progress, a scraper sends many requests to a website to get the required information. Because this is done automatically, a web scraper can make far more requests than a regular user does. If the process is run without regard for the website, it can cause a heavy load on its servers. This is one of the main reasons websites have security measures.
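
One common way to limit the load described in the last point is to enforce a minimum pause between consecutive requests. Below is a minimal Python sketch of that idea, not a recommended configuration. The `Throttle` class, its `min_interval` parameter and the injectable `clock` and `sleep` arguments are hypothetical names introduced here for illustration only.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum pause between requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraper would call `throttle.wait()` immediately before sending each request, for example with `throttle = Throttle(min_interval=1.0)`. The right interval depends on the target website and is an assumption in this sketch.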

General advice on web scraping best practices

As mentioned before, we advise you to seek legal consultation before engaging in scraping activities of any kind. That said, here are some basic practical tips that may help ensure compliance when web scraping:

1. Sometimes, websites provide their API for data collection. If it is possible, use it instead of scraping data. Of course, using a provided API is not the same as web scraping. You can learn more about the differences between web scraping vs. API in our other blog posts. 

2. It is essential to respect the Terms of Service (ToS) for each website.

3. Respect the rules of robots.txt. If you really need the data from a specific website, but its ToS or robots.txt forbids automatic data collection, you can try asking the site owner for permission.

4. Do not use scraped data without making sure that it is not copyrighted. If it is necessary to publish this data, you should ask for written permission from the copyright holder.
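
The robots.txt tip above can be checked in code before a URL is requested. The following is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `example-bot` user agent, the `example.com` URLs and the `is_allowed` helper are all hypothetical and only serve the illustration. Passing this check says nothing about a website's ToS or about copyright.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The first path is not covered by any Disallow rule, the second one is.
print(is_allowed(ROBOTS_TXT, "example-bot", "https://example.com/products"))
print(is_allowed(ROBOTS_TXT, "example-bot", "https://example.com/private/data"))
```

In a real scraper, the rules would be loaded from the target site itself, for instance with the parser's `set_url()` and `read()` methods, instead of an inline string.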

If you want to learn more about the best web scraping practices, we have covered this topic in detail from the ethical and technical side.

Web scraping cases

Below is a summary of the OxyCon presentation by our legal counsels Denas and Nerijus. We concentrated on real scraping cases that set precedents for future scraping-related legal claims. This may help you answer the question: “Is web scraping legal in the US?”. However, do not forget that these cases are only examples meant to help you understand the situation around the legality of web scraping. You should get appropriate professional advice regarding your specific situation.

The legal framework of data scraping.

Feist Publications v. Rural Telephone Service Co. (1991)

The case concerns two U.S. telephone service companies, Feist and Rural. Feist was compiling telephone listings and, in doing so, copied entries from Rural’s directory, which led Rural to sue for copyright infringement.

The court decided that an element of creativity is needed for a set of information to be copyrightable. It found no creativity in Rural’s alphabetical list of phone numbers and denied it copyright protection.

The decision set the tone for future scraping copyright claims, as it established that compilations of factual information were not protectable by copyright. Still, the creative parts of it (e.g., author’s comments, order or style of presentation, etc.) might be.

eBay v. Bidder’s Edge (2000)

Bidder’s Edge, an online auction listing aggregator, was scraping eBay’s auction data and continued to do so after receiving a C&D (Cease and Desist) letter and having its IP addresses blocked. eBay sued Bidder’s Edge under the U.S. legal doctrine of trespass to chattels, which forbids intentional interference with another person’s movable personal property.

Bidder’s Edge’s activities amounted to only approximately 100,000 hits per day (1.5% of eBay’s total daily traffic). Although these activities were minor in scale, the court found them sufficient for trespass to chattels to apply and ordered Bidder’s Edge to stop scraping eBay.

The decision was criticized and deconstructed by other courts in future cases. Some of them stated that actual harm would need to be shown to prove “interference” within the context of the trespass to chattels rule.

The Legal Framework of Data Scraping #2
Denas Grybauskas & Nerijus Sveistys, Oxylabs Legal Counsels

Facebook v. Power Ventures (2009)

Power Ventures operated a website that aggregated information from different social networks on a single page. Because of its scraping activities, it was sued by Facebook for allegedly breaching the U.S. Computer Fraud and Abuse Act (the CFAA).

The CFAA forbids obtaining information from a protected computer (or network) after intentionally accessing it without authorization or in excess of authorization. The court decided that continuing to access a network after receiving a C&D letter referencing the CFAA can lead to a violation of the act.

The decision was heavily scrutinized because the question of whether Facebook’s user data should be considered publicly available was not analyzed. Critics also feared that big internet companies could use the CFAA as a tool to shut down competition. Thankfully for scraping companies, a case with similar circumstances yielded a different outcome in a later landmark decision (see below: hiQ Labs v. LinkedIn).

Craigslist v. 3Taps (2013)

3Taps was scraping Craigslist to aggregate user-submitted Craigslist advertisements. After issuing a C&D letter and blocking its IP addresses, Craigslist sued 3Taps for breaching the CFAA as well as for infringing its copyright.

The court applied a theory similar to the one in Facebook v. Power Ventures. It said that there was a breach of the CFAA because 3Taps’ authorization to access Craigslist’s network had been revoked via the C&D letter and the IP address block. Concerning the copyright claim, the court found that Craigslist owned the intellectual property rights to the advertisements because, for a period of two weeks, Craigslist’s Terms of Use (ToU) gave it full ownership of those rights.

It was also found that these advertisements were protected by copyright, as Craigslist fulfilled the required creativity condition by categorizing the advertisements.

The decision sparked active discussion about the legal “weight” of C&D letters, which are unilateral lists of demands issued by the sender, and about the fact that Craigslist was able to claim exclusive intellectual property rights to its advertisers’ ad copy, even if only temporarily.

QVC v. Resultly (2014)

A dispute between QVC, an online and TV retailer, and Resultly, which scraped QVC’s website, provided a few interesting insights as far as scraping goes.

QVC tried to invoke a different CFAA ground, which prohibits intentionally causing damage. Resultly’s scraping activities (500-600 requests/s) did overload QVC’s servers, but the argument that the damage was intentional was rejected, as Resultly’s business directly benefited from QVC’s website running without interruption. Further, QVC’s ToU did not prohibit scraping, and its robots.txt file did not put a limit on crawl rates.

The court found no infringement of the CFAA in this case.
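
The crawl-rate point in this case can be illustrated with a short sketch. Some websites declare a crawl rate in robots.txt through the `Crawl-delay` directive, which is widely seen but non-standard. Python's standard `urllib.robotparser` module can read it. The robots.txt content, the `example-bot` user agent and the `declared_crawl_delay` helper below are hypothetical, and the sketch does not reproduce QVC's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that declares a crawl delay of 5 seconds.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow:
"""


def declared_crawl_delay(robots_txt, user_agent):
    """Return the Crawl-delay declared for user_agent, or None if there is none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


delay = declared_crawl_delay(ROBOTS_TXT, "example-bot")
print("declared crawl delay:", delay)
```

When the function returns `None`, the site has not declared a delay in robots.txt. That is not the same as permission to send requests at any rate.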

Ryanair v. PR Aviation (2015)

Ryanair’s dispute with PR Aviation, a flight price comparison company, provided a glimpse of how scraping could be interpreted in European courts. Ryanair’s website subjects its visitors to ToU that explicitly prohibit scraping. PR Aviation was scraping Ryanair, which took it to court in the Netherlands for breach of contract.

Ryanair came out second best from the dispute, as the Dutch court said that no valid contract had been formed between the companies. It drew an interesting analogy, stating that anyone who puts up a poster in a shop window visible from the public road that reads “Whoever reads further, must pay € 5” cannot assume that the person reading it wants to commit to such a condition.

Still, this does not mean that ToU would not be applicable in a different scenario, as many of the circumstances here were unfavorable to Ryanair. Namely, at the time of the scraping, Ryanair was presenting its ToU as a browsewrap, which courts generally do not accept as legally binding, and the scraped data was free and accessible to everyone.

Ryanair v. Expedia (2019)

Expedia, a U.S. flight comparison company, was scraping Ryanair’s data and continued doing so after receiving a C&D letter. Consequently, it was sued by Ryanair for breaching the CFAA. Expedia argued that because Ryanair is an Irish company, the CFAA, a U.S. statute, should not be applicable.

The courts established that the CFAA might indeed apply to U.S. companies acting internationally. After this, Ryanair and Expedia settled the case, with the details being confidential. With that being said, as of this day, there are no Ryanair flights being offered via Expedia’s website.

hiQ Labs v. LinkedIn (2019)

hiQ Labs is a company that scrapes data from public LinkedIn profiles to provide businesses with tools and insights on employees. After allowing hiQ to scrape for several years, LinkedIn issued a C&D letter to hiQ in 2017 and launched its own tool with functionality similar to hiQ’s. hiQ sought an injunction in court, which was granted, requiring LinkedIn to withdraw the C&D letter and stop applying any blocking measures against hiQ.

LinkedIn appealed the decision, arguing that hiQ’s scraping was breaching the CFAA. The court decided that hiQ was not acting in breach of the CFAA, as the data scraped from LinkedIn was public (profiles containing user-generated content that were not behind a password wall). The court said that companies should not be able to revoke authorization where none is needed in the first place, and that allowing companies like LinkedIn to decide who can collect and use public data would be contrary to the public interest.

The decision was favorable to scraping companies and reconsidered some of the much-criticized earlier court practice regarding the applicability of the CFAA (e.g., Facebook v. Power Ventures, Craigslist v. 3Taps), narrowing the relevance of the act with regard to public data. That said, if not done with caution, scraping activities might still lead to breaches of the CFAA (e.g., under different circumstances) as well as claims on other grounds such as trespass to chattels, copyright infringement or breach of contract.

Wrapping up

There is no simple answer to the question “Is web scraping legal?”, as the answer depends on whether the scraping in question breaches any laws surrounding the targeted data.

So please, take this article as informational and educational only. It does not replace independent professional advice and judgement. Statements of fact and opinions expressed are those of the presenters only, and unless expressly stated to the contrary, are not the opinion or position of Oxylabs.

About Gabija Fatenaite

Gabija Fatenaite is a Senior Content Manager at Oxylabs. Having grown up on video games and the internet, she grew to find the tech side of things more and more interesting over the years. So if you ever find yourself wanting to learn more about proxies (or video games), feel free to contact her - she’ll be more than happy to answer you.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.