
How to Scrape YouTube Data: Step-by-Step Guide


Vytenis Kaubrė

Last updated on 2025-04-02

6 min read

YouTube is one of the largest content-sharing platforms in the world, with more than 500 hours of content uploaded every minute. In November 2022, YouTube even ranked as the second most visited website globally, with 74.8 billion monthly visits, according to Statista.

The sheer volume of public data and traffic on YouTube unlocks various research opportunities for businesses and individuals. Web scraping, often paired with a free proxy server or a premium proxy solution, is the go-to method for extracting data from publicly available YouTube pages, such as video details, comments, channel information, and search results.

This guide will show you how to scrape YouTube videos by leveraging Python, the yt-dlp library, proxies, and Oxylabs' YouTube Scraper API. We also have this tutorial available in video format.

What data can you scrape from YouTube?

YouTube hosts plenty of valuable information, including but not limited to:

🎬 Video title
📝 Video description
👁️ View count
⏱️ Video length
📜 Video transcript
🔗 Video link
🗂️ Video metadata
💬 Comments
📺 Channel name
🔢 Subscriber count
📖 Channel description
🧭 Channel link

1. Prepare the environment

First, install the latest version of Python, which you can download from the official Python website.

1.1 Install the dependencies

Next, run the following command in your terminal to install the necessary modules:

pip install yt-dlp requests

1.2 Obtain YouTube Scraper API credentials

To use Oxylabs’ YouTube Scraper API, you’ll need an Oxylabs account. Head to the Oxylabs dashboard and sign up to create a new account. Once you create your account, you’ll get a one-week free trial together with your user credentials. You’ll later need these credentials to extract video transcripts, channel information, subscriber counts, and search results.


2. Download YouTube videos

Please note that all information provided herein is for informational purposes only and does not grant you any rights with regard to the described data, videos, or images, which may be protected by copyright, intellectual property, or other rights. Before engaging in scraping activities, you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

Let's see how to download YouTube videos using the yt-dlp library, which is popular for YouTube video scraping. For this example, you can use this video as your target URL.

To download this video, you’ll first need to import the library. Then, use the download() method as shown below:

from yt_dlp import YoutubeDL


video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"
opts = dict()

with YoutubeDL(opts) as yt:
    yt.download([video_url])

Note: Before running the code, make sure you have ffmpeg installed on your system. On macOS, for example, you can install it using Homebrew: brew install ffmpeg.

When you run this code, the script will download the video and store it in the current folder of your project.
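By default, yt-dlp names the downloaded file after the video title and ID. If you want to control where the file goes and what it's called, you can pass an output template via the outtmpl option, as in this minimal sketch:

from yt_dlp import YoutubeDL


video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"

# Save the video to a "downloads" folder, named after its title
opts = {"outtmpl": "downloads/%(title)s.%(ext)s"}

with YoutubeDL(opts) as yt:
    yt.download([video_url])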

You can also download multiple YouTube videos simultaneously with YouTube Scraper API. Head to the documentation to learn more.
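If you'd rather stay with yt-dlp, note that its download() method also accepts a list of URLs, processing them one after another rather than simultaneously:

from yt_dlp import YoutubeDL


video_urls = [
    "https://www.youtube.com/watch?v=mDveiNIpqyw",
    # Add more video URLs here
]

with YoutubeDL(dict()) as yt:
    yt.download(video_urls)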

3. Scrape YouTube video transcript

Scraping YouTube subtitles provides a great source of natural language data for LLM training and other use cases.

Finding YouTube video transcript/subtitles

For this, you can leverage YouTube Scraper API to gather video transcripts on a large scale:

import requests
import json


payload = {
    "source": "youtube_transcript",
    "query": "mDveiNIpqyw",
    "context": [{"key": "language_code", "value": "en"}]
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=("USERNAME", "PASSWORD"),
    json=payload
)

print(response.status_code)
print(response.json())

with open("yt_transcript.json", "w") as file:
    json.dump(response.json(), file, indent=4)

Make sure to replace USERNAME and PASSWORD with your API user credentials. Visit our documentation to learn more about the language and origin parameters supported by the API.
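The structure of the returned JSON can vary, so before writing any parsing logic, it helps to inspect the file you just saved. Here's a small sketch that loads yt_transcript.json and prints its top-level shape:

import json


# Load the saved transcript response and inspect its top-level structure
with open("yt_transcript.json") as file:
    data = json.load(file)

print(type(data))
if isinstance(data, dict):
    print(list(data.keys()))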

4. Scrape YouTube video data

Scraping YouTube videos is also possible with the yt-dlp library. You can extract public video data like the title, video dimensions, and the language used.

Finding YouTube video information

Let’s extract video details from the video we’ve downloaded previously. For this task, you can use the extract_info() method with the download=False parameter so that it doesn’t download the video file again. This method will return a dictionary with all the video-related info:

from yt_dlp import YoutubeDL


video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"
opts = dict()

with YoutubeDL(opts) as yt:
    info = yt.extract_info(video_url, download=False)
    video_title = info.get("title", "")
    width = info.get("width", "")
    height = info.get("height", "")
    language = info.get("language", "")
    print(video_url, video_title, width, height, language)

YouTube Scraper API also supports scraping video metadata; check out the documentation to learn more.
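For instance, if you'd like to keep the complete yt-dlp metadata for later analysis, the library offers a sanitize_info() helper that turns the info dictionary into JSON-serializable data:

import json
from yt_dlp import YoutubeDL


video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"

with YoutubeDL(dict()) as yt:
    info = yt.extract_info(video_url, download=False)
    # sanitize_info() strips internals that can't be serialized to JSON
    with open("yt_video_info.json", "w") as file:
        json.dump(yt.sanitize_info(info), file, indent=4)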

5. Scrape YouTube comments

Please note that all information provided herein is for informational purposes only and does not grant you any rights with regard to the described data, which may be protected by corresponding privacy rights or other rights. Before engaging in scraping activities, you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

To extract all the video comments, you’ll need to pass an additional option getcomments while initializing the yt-dlp library.

Finding YouTube video comments

Once you set getcomments to True, the extract_info() method will fetch all the comment threads along with the other information about the video. You can then extract just the comments from the info dictionary, as shown below:

from pprint import pprint
from yt_dlp import YoutubeDL


video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"
opts = {"getcomments": True}

with YoutubeDL(opts) as yt:
    info = yt.extract_info(video_url, download=False)
    comments = info["comments"]
    comment_count = info["comment_count"]
    print("Number of comments: {}".format(comment_count))
    pprint(comments)
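Each comment in the list is a dictionary; in current yt-dlp versions, it typically carries fields like text, author, and like_count, though the exact set may vary. Here's a minimal sketch that keeps just those essentials and saves them to a file:

import json
from yt_dlp import YoutubeDL


video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"
opts = {"getcomments": True}

with YoutubeDL(opts) as yt:
    info = yt.extract_info(video_url, download=False)

# Keep only the fields you need from each comment
trimmed = [
    {
        "author": comment.get("author"),
        "text": comment.get("text"),
        "like_count": comment.get("like_count")
    }
    for comment in (info.get("comments") or [])
]

with open("yt_comments.json", "w") as file:
    json.dump(trimmed, file, indent=4)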

6. Scrape YouTube channel information

For this example, let’s use the Oxylabs channel's “About” section to extract the channel name and description. Here, you’ll have to use your Oxylabs API user credentials to authenticate with the API.

6.1 Inspect elements

The first step is to find the necessary XPath selectors to extract the channel name and description. If you want to use CSS selectors, visit our Custom Parser documentation for more information.

So, open the “About” page in a web browser and use the Developer Tools to inspect elements. You can simply press CTRL + SHIFT + I on Windows or Option + Command + I on macOS to open the Developer Tools:

Inspecting HTML elements using Developer Tools

By inspecting the elements, you can easily construct the relative XPath selector using the IDs associated with the elements. Thus, the XPath selectors are:

Channel name XPath

//h2[@id="title"]/yt-formatted-string/text()

Description XPath

//yt-attributed-string[contains(@id, "description")]/span/text()
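Before wiring these selectors into the API, you can sanity-check them locally with lxml (pip install lxml) against a saved copy of the page. This assumes you've saved the fully rendered HTML, e.g., via Developer Tools, to a file named about_page.html, since the raw page source won't contain these elements before JavaScript runs:

from lxml import html


# Parse a locally saved, fully rendered copy of the "About" page
with open("about_page.html", encoding="utf-8") as file:
    tree = html.fromstring(file.read())

print(tree.xpath('//h2[@id="title"]/yt-formatted-string/text()'))
print(tree.xpath('//yt-attributed-string[contains(@id, "description")]/span/text()'))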

6.2 Prepare parsing instructions

Now, using the XPath selectors, you can prepare the parsing instructions for the YouTube data scraper. It’s a dictionary that lists all the functions to execute when parsing the data from the HTML content. Let’s begin by importing the requests module and defining the instructions variable that'll contain the parsing instructions:

import requests


url = "https://www.youtube.com/@oxylabs/about"

instructions = {
    "Channel Name": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//h2[@id="title"]/yt-formatted-string/text()']
        }]
    },
    "Description": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//yt-attributed-string[contains(@id, "description")]/span/text()']
        }]
    }
}

Note the xpath_one function, which tells the API to select only the first matched element when parsing. If needed, you can also extract other channel data, such as social links and more info.
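If a selector matches several elements and you need all of them, the Custom Parser also provides an xpath function that returns every match instead of just the first. As a variation on the instructions above (not used in the main example), you could collect all description text nodes like this:

multi_instructions = {
    "Description Lines": {
        "_fns": [{
            "_fn": "xpath",  # returns all matches, unlike xpath_one
            "_args": ['//yt-attributed-string[contains(@id, "description")]/span/text()']
        }]
    }
}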

6.3 Prepare payload

Create a new variable payload that'll contain the scraping parameters and parsing instructions that you’ll send to the API:

payload = {
    "source": "universal",
    "render": "html",
    "parse": True,
    "parsing_instructions": instructions,
    "url": url
}

The render parameter is set to html, so the API will execute JavaScript to render all dynamic content. parse is also set to True to tell the API that the payload includes parsing_instructions.

6.4 Make a POST request to the API

To POST the payload to the API, you’ll have to use the credentials that you’ve obtained from the Oxylabs dashboard:

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=("USERNAME", "PASSWORD"),
    json=payload
)
print(response.status_code)

Replace the USERNAME and PASSWORD with your credentials, run the code, and if everything works as expected, you’ll get a status code of 200.

6.5 Extract the channel info

YouTube Scraper API sends a JSON response from which you can extract the parsed channel name and description, as showcased below:

channel_name = response.json()["results"][0]["content"]["Channel Name"]
description = response.json()["results"][0]["content"]["Description"]

print(channel_name)
print(description)

Here’s the complete code:

import requests


url = "https://www.youtube.com/@oxylabs/about"

instructions = {
    "Channel Name": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//h2[@id="title"]/yt-formatted-string/text()']
        }]
    },
    "Description": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//yt-attributed-string[contains(@id, "description")]/span/text()']
        }]
    }
}

payload = {
    "source": "universal",
    "render": "html",
    "parse": True,
    "parsing_instructions": instructions,
    "url": url,
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=("USERNAME", "PASSWORD"),
    json=payload
)
print(response.status_code)

channel_name = response.json()["results"][0]["content"]["Channel Name"]
description = response.json()["results"][0]["content"]["Description"]

print(channel_name)
print(description)

7. Scrape YouTube channel subscribers

You can extract the subscriber count of a YouTube channel using the same approach. Let’s again use the Oxylabs channel’s “About” page.

By inspecting elements with Developer Tools, you can build the XPath as follows: //td[contains(@class, "ytd-about-channel-renderer") and contains(text(), "subscribers")]/text(). With this information, you can create parsing instructions as shown below:

instructions = {
    "subscribers": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": [
                '//td[contains(@class, "ytd-about-channel-renderer") '
                'and contains(text(), "subscribers")]/text()'
            ]
        }]
    }
}

And, just like before, the xpath_one function picks only the first match. The rest of the code is almost the same. Use the previously shown API code, replace the instructions, and extract the data from the JSON response:

subscribers = response.json()["results"][0]["content"]["subscribers"]
print(subscribers)

As the data is in the JSON response, you can extract the parsed subscriber count and print it as an output.
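Note that the value arrives as display text rather than a number. Assuming it follows the usual abbreviated format (e.g., "12.3K subscribers"), a small helper like this can normalize it to an integer:

def parse_subscribers(text):
    # Convert display text like "12.3K subscribers" to an integer
    value = text.split()[0].upper()
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    if value[-1] in multipliers:
        return int(float(value[:-1]) * multipliers[value[-1]])
    return int(float(value))

print(parse_subscribers("12.3K subscribers"))  # 12300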

8. Scrape YouTube search results

You can utilize the yt-dlp module as well as YouTube Scraper API to scrape search results from YouTube. Let's see both methods in action.

Finding YouTube search results for the keyword "Oxylabs"

8.1 Import modules and define parameters

First, import the json and yt_dlp modules:

import json
from yt_dlp import YoutubeDL

Then, define the number of results you'd like to scrape as well as the specific search query.

MAX_RESULTS = 20
QUERY = "oxylabs"

To overcome IP blocks and localize results for different countries, let's set up Oxylabs Residential Proxies. The following code will ensure you send requests through proxies located in the United States:

USER = "USERNAME"
PASSW = "PASSWORD"

opts = {
    "proxy": f"https://customer-{USER}:{PASSW}@us-pr.oxylabs.io:10000",
    "default_search": "ytsearch",
    "noplaylist": True,
    "extract_flat": True,
    "quiet": True
}

8.2 Send a request

Next, pass the opts dictionary to YoutubeDL and initiate a request using the previously defined parameters:

with YoutubeDL(opts) as yt:
    search_query = f"ytsearch{MAX_RESULTS}:{QUERY}"
    info = yt.extract_info(search_query, download=False)
    entries = info.get("entries", [])

8.3 Extract YouTube search results

Inside the yt context manager, add your data extraction logic. You should loop over the entries and define the specific data points you want to scrape:

    data = []
    for video in entries:
        data.append({
            "title": video.get("title"),
            "channel": video.get("channel"),
            "view_count": video.get("view_count"),
            "duration": video.get("duration"),
            "url": video.get("url"),
            "channel_url": video.get("channel_url")
        })

Here, you can extract additional video data, including video ID, channel ID, thumbnail links, and more. If you want to view the available data points, simply add a print(video) statement inside the for loop and run the code as is.

8.4 Save results to JSON

Finally, use the json module to easily dump all results to a JSON file:

with open("yt_search.json", "w") as file:
    json.dump(data, file, indent=4)

Putting everything together, your final Python script should look as follows:

import json
from yt_dlp import YoutubeDL


MAX_RESULTS = 20
QUERY = "oxylabs"

USER = "USERNAME"
PASSW = "PASSWORD"

opts = {
    "proxy": f"https://customer-{USER}:{PASSW}@us-pr.oxylabs.io:10000",
    "default_search": "ytsearch",
    "noplaylist": True,
    "extract_flat": True,
    "quiet": True
}

with YoutubeDL(opts) as yt:
    search_query = f"ytsearch{MAX_RESULTS}:{QUERY}"
    info = yt.extract_info(search_query, download=False)
    entries = info.get("entries", [])

    data = []
    for video in entries:
        data.append({
            "title": video.get("title"),
            "channel": video.get("channel"),
            "view_count": video.get("view_count"),
            "duration": video.get("duration"),
            "url": video.get("url"),
            "channel_url": video.get("channel_url")
        })

with open("yt_search.json", "w") as file:
    json.dump(data, file, indent=4)

Scraping with the API

Oxylabs' YouTube video scraper automatically extracts data from YouTube and uses a built-in proxy pool to bypass blocks. Check out the documentation to learn more about filtering search results.

import json
import requests


payload = {
    "source": "youtube_search",
    "query": "oxylabs"
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=("USERNAME", "PASSWORD"),
    json=payload
)

print(response.status_code)
print(response.json())

with open("yt_search_api.json", "w") as file:
    json.dump(response.json(), file, indent=4)

Comparing different scraping methods

• No Proxies
  Pros: very simple setup, no extra proxy fees.
  Cons: high chance of IP blocks and CAPTCHAs, no geo-targeting, limited scalability.

• With Proxies
  Pros: fewer IP bans, access to geo-restricted data, better reliability at scale.
  Cons: extra costs, proxy pool management if not offered by the provider.

• Using an API
  Pros: built-in IP rotation and CAPTCHA handling, highly scalable, headless browser support, quick to integrate.
  Cons: subscription costs, dependence on the provider, possible limitations on specific data requests.

• Other Tools (e.g., Selenium)
  Pros: full control, ideal for dynamic sites, no ongoing fees if fully self-managed.
  Cons: steep learning curve, must manually handle anti-scraping measures, can be slower than direct or API setups.

Wrap up

Feel free to expand the source code with additional functionality and adjust the target URLs for your YouTube data needs. If you want to store your scraped public data in a CSV or Excel file, check out this in-depth Python web scraping guide for more details. Additionally, visit our YouTube web scraper documentation to find more information about the payload parameters and other code examples.

In case you prefer visual tutorials, take a look at this extensive playlist of Oxylabs’ video guides to get an even easier head-start into web scraping, integration of various types of proxies, such as residential proxies, as well as recordings from webinars.

Need to collect data from other sources or buy a proxy server for better YouTube scraping performance? See these detailed guides on how to scrape Google Search Results, Bing Search Results, Google News, Google Shopping, as well as Amazon data.

Frequently asked questions

Is it legal to scrape YouTube videos?

The legality of web scraping YouTube videos largely depends on what data you gather and how you use it. It’s important to follow all the regulations and laws that govern online data, including privacy laws and copyright. In addition, it’s always best to seek professional legal advice before engaging in scraping activities.

It’s also recommended to adhere to the website’s terms of use and follow web scraping best practices. To better understand this topic, we recommend reading this article about the legal frameworks behind web scraping.

Does YouTube block scrapers?

Yes, YouTube may block suspicious requests coming from web scrapers. It uses various anti-scraping measures and constantly monitors incoming web requests for any indication of bot-like behavior. Commonly, you may receive a 429 error, so if that's the case in your situation, check out this page on how to fix the YouTube 429 error.
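Beyond proxies, a common way to handle occasional 429 responses is to retry with an exponential backoff. Here's a generic, minimal sketch using the requests library (the URL is just a placeholder for your target):

import time
import requests


def fetch_with_backoff(url, max_retries=5):
    delay = 1
    for _ in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        # Too many requests: wait, then double the delay before retrying
        time.sleep(delay)
        delay *= 2
    return response

response = fetch_with_backoff("https://example.com")
print(response.status_code)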

If you want to learn more about web scraping and bot detection systems, check out this great article on 13 tips for block-free scraping and hear about the bypassing methods from our scraping expert in this free webinar.

How do I scrape YouTube Shorts?

If you want to scrape YouTube data from Shorts, you can use a web scraper and instruct it to access a channel's Shorts section, for example: https://www.youtube.com/@oxylabs/shorts. This way, you can extract the URL, title, and view count of each Shorts video.
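As one approach, yt-dlp can treat a channel's Shorts tab as a playlist. With extract_flat enabled, extract_info() returns lightweight entries carrying the URL, title, and, where available, the view count of each Short:

from yt_dlp import YoutubeDL


opts = {"extract_flat": True, "quiet": True}

with YoutubeDL(opts) as yt:
    info = yt.extract_info("https://www.youtube.com/@oxylabs/shorts", download=False)
    for short in info.get("entries", []):
        print(short.get("url"), short.get("title"), short.get("view_count"))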

How do I scrape YouTube emails?

Scraping YouTube channel emails is a bit tricky, but not impossible. You should use a headless browser to access a channel's about page, for instance: https://www.youtube.com/@oxylabs/about. Then, under the More info section, instruct your browser to click the View email address button, complete a basic reCAPTCHA challenge, and click Submit. After that, the email address will appear, and you can extract it with your scraper.

How to scrape YouTube comments?

You can easily scrape comments by utilizing the yt-dlp library in Python and setting the additional option {"getcomments": True}. Alternatively, you can use YouTube Scraper API with built-in proxies and a headless browser to gather comment details. Check this blog post for a complete guide on scraping YouTube comment threads.

About the author


Vytenis Kaubrė

Technical Copywriter

Vytenis Kaubrė is a Technical Copywriter at Oxylabs. His love for creative writing and a growing interest in technology fuels his daily work, where he crafts technical content and web scrapers with Oxylabs’ solutions. Off duty, you might catch him working on personal projects, coding with Python, or jamming on his electric guitar.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
