
How to Scrape YouTube Data: Step-by-Step Guide

Vytenis Kaubrė

2023-09-12 · 5 min read

YouTube is one of the largest content-sharing platforms in the world, with more than 500 hours of content uploaded each minute. In November 2022, YouTube even secured the second position as the most visited website globally, with 74.8 billion monthly visits, according to Statista.

The sheer volume of public data and traffic on YouTube unlocks various research opportunities for businesses and individuals. Web scraping is the go-to method for extracting data from publicly available YouTube pages, such as video details, comments, channel information, and search results. Hence, in this guide, you’ll learn how to leverage Python, Oxylabs’ YouTube Scraper API, and Custom Parser to scrape YouTube videos and harness the potential of YouTube data.

1. Prepare the environment

First, install the latest version of Python, which you can download from the official Python website.

1.1 Install the dependencies

Next, run the following command in your terminal to install the necessary modules:

pip install yt-dlp requests

1.2 Obtain YouTube Scraper API credentials

To use Oxylabs’ YouTube Scraper API, you’ll need an Oxylabs account. Head to the Oxylabs dashboard and sign up to create a new account. Once you create your account, you’ll get a one-week free trial together with your user credentials. You’ll later need these credentials to extract channel information, subscriber counts, and search results.

2. Download YouTube videos

    Please note that all information provided herein is for informational purposes only and does not grant you any rights with regard to the described data, videos, or images, which may be protected by copyright, intellectual property, or other rights. Before engaging in scraping activities, you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

    Now, let’s download a YouTube video using the yt-dlp library, which is popular for downloading YouTube videos. For this example, you can use this video as your target URL.

    To download this video, you’ll first need to import the library. Then, use the download() method as shown below:

    from yt_dlp import YoutubeDL
    
    
    video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"
    opts = dict()
    
    with YoutubeDL(opts) as yt:
        yt.download([video_url])

    When you run this code, the script will download the video and store it in the current folder of your project.

    3. Scrape YouTube video data

    Scraping YouTube videos is also possible with the yt-dlp library. You can extract public video data like the title, video dimensions, and the language used.

    Finding YouTube video information

    Let’s extract video details from the video we’ve downloaded previously. For this task, you can use the extract_info() method with the download=False parameter so that it doesn’t download the video file again. This method will return a dictionary with all the video-related info:

    from yt_dlp import YoutubeDL
    
    
    video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"
    opts = dict()
    
    with YoutubeDL(opts) as yt:
        info = yt.extract_info(video_url, download=False)
        video_title = info.get("title", "")
        width = info.get("width", "")
        height = info.get("height", "")
        language = info.get("language", "")
        print(video_url, video_title, width, height, language)

    4. Scrape YouTube Comments

    Please note that all information provided herein is for informational purposes only and does not grant you any rights with regard to the described data, which may be protected by corresponding privacy rights or other rights. Before engaging in scraping activities, you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

    To extract all the video comments, you’ll need to pass an additional option, getcomments, when initializing YoutubeDL.

    Finding YouTube video comments

    Once you set getcomments to True, the extract_info() method will fetch all the comment threads along with the other information about the video. You can then extract just the comments from the info dictionary, as shown below:

    from yt_dlp import YoutubeDL
    from pprint import pprint
    
    
    video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"
    opts = {
        "getcomments": True
    }
    
    with YoutubeDL(opts) as yt:
        info = yt.extract_info(video_url, download=False)
        comments = info["comments"]
        comment_count = info["comment_count"]
        print("Number of comments: {}".format(comment_count))
        pprint(comments)
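Each element of the comments list is a dictionary; keys such as "author", "text", and "like_count" match what yt-dlp's comment extractor returns. As a small post-processing sketch (the sample data below is illustrative, not live data), you can rank comments by likes:

```python
def top_comments(comments, n=3):
    """Return the n comments with the highest like counts.

    Assumes each comment is a dict carrying "author", "text", and
    "like_count" keys, as produced by yt-dlp's comment extractor.
    """
    ranked = sorted(comments, key=lambda c: c.get("like_count") or 0, reverse=True)
    return ranked[:n]


# Illustrative data in the same shape yt-dlp returns:
sample = [
    {"author": "@viewer1", "text": "First!", "like_count": 2},
    {"author": "@viewer2", "text": "Great walkthrough", "like_count": 10},
    {"author": "@viewer3", "text": "Thanks", "like_count": 5},
]

for comment in top_comments(sample, n=2):
    print(comment["author"], comment["like_count"])
```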

    5. Scrape YouTube channel information

    Finding YouTube channel information

    For this example, let’s use the Oxylabs channel's “About” section to extract the channel name and description. Here, you’ll have to use your YouTube Scraper API credentials to authenticate with the API.

    5.1 Inspect elements

    The first step is to find the necessary XPath selectors to extract the channel name and description. If you want to use CSS selectors, visit our Custom Parser documentation for more information.

    So, open the “About” page in a web browser and use the Developer Tools to inspect elements. You can simply press CTRL + SHIFT + I on Windows or Option + Command + I on macOS to open the Developer Tools:

    Inspecting HTML elements using Developer Tools

    By inspecting the elements, you can easily construct the relative XPath selector using the IDs associated with the elements. Thus, the XPath selectors are:

    Channel name XPath

    //ytd-channel-name[@id="channel-name"]/div/div/yt-formatted-string[@id="text"]

    Description XPath

    //yt-attributed-string[contains(@id, "description")]/span/text()
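Before sending these selectors to the API, you can sanity-check them locally with lxml (pip install lxml). The snippet below is a trimmed, illustrative imitation of the channel-name markup, not the live page, and it's parsed as XML here purely for a quick selector test:

```python
from lxml import etree

# Trimmed, illustrative snippet mimicking YouTube's channel-name markup.
snippet = """<ytd-channel-name id="channel-name">
  <div><div><yt-formatted-string id="text">Oxylabs</yt-formatted-string></div></div>
</ytd-channel-name>"""

tree = etree.fromstring(snippet)
# Run the same XPath you'll hand to the Custom Parser:
names = tree.xpath(
    '//ytd-channel-name[@id="channel-name"]/div/div/yt-formatted-string[@id="text"]/text()'
)
print(names)
```

If the selector matches, you'll see the expected text in the printed list; an empty list means the XPath needs adjusting before you spend API requests on it.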

    5.2 Prepare parsing instructions

    Now, using the XPath selectors, you can prepare the parsing instructions for YouTube Scraper API. It’s a dictionary that lists all the functions to execute when parsing the data from the HTML content. Let’s begin by importing the requests module and defining the variable instructions that'll contain the parsing instructions:

    import requests
    
    
    url = "https://www.youtube.com/@oxylabs/about"
    
    instructions = {
        "Channel Name": {
            "_fns": [{
                "_fn": "xpath_one",
                "_args": ['//ytd-channel-name[@id="channel-name"]/div/div/yt-formatted-string[@id="text"]/text()']
                }]
        },
        "Description": {
                "_fns": [{
                    "_fn": "xpath_one",
                    "_args": ['//yt-attributed-string[contains(@id, "description")]/span/text()']
                }]
        }
    }

    Note the xpath_one function, which tells the API to select only the first matched element when parsing. 

    5.3 Prepare payload

    Create a new variable payload that'll contain the scraping parameters and parsing instructions that you’ll send to the API:

    payload = {
        "source": "universal",
        "render": "html",
        "parse": "true",
        "parsing_instructions": instructions,
        "url": url,
    }

    The render parameter is set to html, so the API will execute JavaScript to render all dynamic content. parse is also set to true to tell the API that the payload includes parsing_instructions.

    5.4 Make a POST request to the API

    To POST the payload to the API, you’ll have to use the credentials that you’ve obtained from the Oxylabs dashboard:

    credentials = ("USERNAME", "PASSWORD")
    
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=credentials,
        json=payload,
    )
    
    print(response.status_code)

    Replace USERNAME and PASSWORD with your credentials and run the code. If everything works as expected, you’ll get a status_code of 200.

    5.5 Extract the channel info

    YouTube Scraper API sends a JSON response from which you can extract the parsed channel name and description, as showcased below:

    channel_name = response.json()["results"][0]["content"]["Channel Name"]
    description = response.json()["results"][0]["content"]["Description"]
    
    print(channel_name)
    print(description)

    Here’s the complete code:

    import requests
    
    
    url = "https://www.youtube.com/@oxylabs/about"
    
    instructions = {
        "Channel Name": {
            "_fns": [{
                "_fn": "xpath_one",
                "_args": ['//ytd-channel-name[@id="channel-name"]/div/div/yt-formatted-string[@id="text"]/text()']
                }]
        },
        "Description": {
                "_fns": [{
                    "_fn": "xpath_one",
                    "_args": ['//yt-attributed-string[contains(@id, "description")]/span/text()']
                }]
        }
    }
    
    payload = {
        "source": "universal",
        "render": "html",
        "parse": "true",
        "parsing_instructions": instructions,
        "url": url,
    }
    
    credentials = ("USERNAME", "PASSWORD")
    
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=credentials,
        json=payload,
    )
    
    print(response.status_code)
    
    channel_name = response.json()["results"][0]["content"]["Channel Name"]
    description = response.json()["results"][0]["content"]["Description"]
    
    print(channel_name)
    print(description)

    6. Scrape YouTube channel subscribers

    You can extract the subscriber count of a YouTube channel using the same approach. Let’s again use the Oxylabs channel’s “About” page:

    Finding the count of YouTube channel subscribers

    By inspecting elements with Developer Tools, you can see the element has the ID subscriber-count, so building the XPath selector is relatively easy: //*[@id="subscriber-count"]. With this information, you can create parsing instructions as follows:

    instructions = {
        "subscribers": {
            "_fns": [{
                "_fn": "xpath_one",
                "_args": ['//*[@id="subscriber-count"]/text()'],
            }]
        },
    }

    And, just like before, the xpath_one function picks only the first match. The rest of the code is almost the same. Here’s the full source code:

    import requests
    
    
    url = "https://www.youtube.com/@oxylabs/about"
    instructions = {
        "subscribers": {
            "_fns": [{
                "_fn": "xpath_one",
                "_args": ['//*[@id="subscriber-count"]/text()'],
            }]
        },
    }
    
    payload = {
        "source": "universal",
        "render": "html",
        "parse": "true",
        "parsing_instructions": instructions,
        "url": url,
    }
    
    credentials = ("USERNAME", "PASSWORD")
    
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=credentials,
        json=payload,
    )
    
    print(response.status_code)
    
    subscribers = response.json()["results"][0]["content"]["subscribers"]
    print(subscribers)

    As the data is in the JSON response, you can extract the parsed subscriber count from the response and print it as an output.
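Note that the API returns the count as display text (e.g. "141K subscribers") rather than a number. A small helper can normalize it for analysis; the suffix handling below follows YouTube's K/M/B display format, and the sample strings are illustrative values, not live data:

```python
def parse_subscriber_count(text):
    """Convert a display string like "141K subscribers" to an integer.

    Handles YouTube's K/M/B suffixes; plain numbers pass through as-is.
    """
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    number = text.split()[0]          # e.g. "141K" or "1.2M"
    suffix = number[-1].upper()
    if suffix in multipliers:
        # round() avoids float truncation artifacts (e.g. 1.2 * 1e6).
        return round(float(number[:-1]) * multipliers[suffix])
    return round(float(number))

print(parse_subscriber_count("141K subscribers"))   # → 141000
print(parse_subscriber_count("1.2M subscribers"))   # → 1200000
```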

    7. Scrape YouTube search results

    You can also use YouTube Scraper API to scrape public data from search results.

    Finding YouTube search results for the keyword "Oxylabs"

    To scrape the video titles and links of every search result, you first need to find the related XPath selectors; then you can modify the instructions as shown below:

    instructions = {
        "titles": {
            "_fns": [{
                "_fn": "xpath",
                "_args": ['//*[@id="video-title"]/yt-formatted-string/text()']
                }]
        },
        "links": {
                "_fns": [{
                    "_fn": "xpath",
                    "_args": ['//*[@id="video-title"]/@href']
                }]
        }
    }

    In this instance, we’re using xpath instead of xpath_one because there are multiple search results, and we want to extract all of them. The complete code for scraping the search page looks like this:

    import requests
    
    
    url = "https://www.youtube.com/results?search_query=oxylabs"
    
    instructions = {
        "titles": {
            "_fns": [{
                "_fn": "xpath",
                "_args": ['//*[@id="video-title"]/yt-formatted-string/text()']
                }]
        },
        "links": {
                "_fns": [{
                    "_fn": "xpath",
                    "_args": ['//*[@id="video-title"]/@href']
                }]
        }
    }
    
    payload = {
        "source": "universal",
        "render": "html",
        "parse": "true",
        "parsing_instructions": instructions,
        "url": url,
    }
    
    credentials = ("USERNAME", "PASSWORD")
    
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=credentials,
        json=payload,
    )
    
    print(response.status_code)
    
    titles = response.json()["results"][0]["content"]["titles"]
    links = response.json()["results"][0]["content"]["links"]
    base_url = "https://www.youtube.com"
    for title, link in zip(titles, links):
        full_url = f"{base_url}{link}"
        print(title, full_url)

    Since both the titles and links variables are Python lists, you can simply use the zip() function to pair each title with its corresponding link.
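If you'd rather persist the results than print them, the standard csv module is enough. The titles and links below are illustrative placeholders in the same shape the scraper produces:

```python
import csv

# Illustrative placeholders in the shape the search scraper returns:
titles = ["Oxylabs intro", "Web scraping basics"]
links = ["/watch?v=abc123", "/watch?v=def456"]
base_url = "https://www.youtube.com"

with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "url"])          # header row
    for title, link in zip(titles, links):
        writer.writerow([title, f"{base_url}{link}"])
```

Swap the placeholder lists for the titles and links variables extracted from the API response to save real results.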

    Wrap up

    Feel free to expand the source code with additional functionality and adjust the target URLs for your YouTube data needs. If you want to store your scraped public data in a CSV or Excel file, check out this in-depth Python web scraping guide for more details. Additionally, visit our API documentation to find more information about the payload parameters and other code examples.

    In case you prefer visual tutorials, take a look at this extensive playlist of Oxylabs’ video guides to get an even easier head-start into web scraping.

    Need to collect data from other sources? See these detailed guides on how to scrape Google Search Results, Bing Search Results, Google News, Google Shopping, as well as Amazon data.

    Frequently asked questions

    Is it legal to scrape YouTube videos?

    The legality of scraping YouTube videos depends on what data you gather and how you use it. It’s important to follow all the regulations and laws that govern online data, including privacy laws and copyright. In addition, it’s always best to seek professional legal advice before engaging in scraping activities.

    It’s also recommended to adhere to the website’s terms of use and follow web scraping best practices. To better understand this topic, we recommend reading this article about the legal frameworks behind web scraping.

    Does YouTube block scrapers?

    Yes, YouTube may block suspicious requests coming from web scrapers. It uses various anti-scraping measures and constantly monitors incoming web requests for any indication of bot-like behavior. Commonly, you may receive a 429 error, so if that's the case in your situation, check out this page on how to fix the YouTube 429 error.
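A common way to soften 429 responses when scraping directly is retrying with exponential backoff. This is a generic sketch around the requests library, not part of the original guide; production code may also want to honor the Retry-After response header and cap the total wait time:

```python
import time

import requests


def post_with_retries(url, max_retries=3, backoff=2, **kwargs):
    """POST with exponential backoff on HTTP 429 responses.

    Waits backoff**0, backoff**1, ... seconds between attempts and
    returns the last response if all retries are exhausted.
    """
    for attempt in range(max_retries):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            return response
        time.sleep(backoff ** attempt)  # wait 1s, 2s, 4s, ...
    return response

# Usage: post_with_retries("https://realtime.oxylabs.io/v1/queries",
#                          auth=credentials, json=payload)
```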

    If you want to learn more about web scraping and bot detection systems, check out this great article on 13 tips for block-free scraping and hear about the bypassing methods from our scraping expert in this free webinar.

    About the author

    Vytenis Kaubrė

    Junior Technical Copywriter

    Vytenis Kaubrė is a Junior Technical Copywriter at Oxylabs. His love for creative writing and a growing interest in technology fuels his daily work, where he crafts technical content and web scrapers with Oxylabs’ solutions. Off duty, you might catch him working on personal projects, coding with Python, or jamming on his electric guitar.

    All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
