cURL is a powerful command-line tool for transferring data over various network protocols, including HTTP, HTTPS, FTP, and more. It’s possible to utilize the cURL command within Python code, as well.
While Python has built-in libraries for handling some of these tasks, third-party libraries like Requests and PycURL can provide more advanced features. PycURL in particular is a Python interface to libcurl, the library behind cURL, so it brings cURL's functionality and performance into your code.
In today’s article, you’ll learn how to use cURL functionality in Python code. We’ll dive deep into the steps for using cURL with Python through the PycURL library, covering installation, GET and POST requests, HTTP headers, and JSON handling.
Let’s get started.
First, you need to install the PycURL library, which you’ll then use to make GET requests and more. You can install it using pip, the package installer for Python:
pip install pycurl
This command will download and install PycURL with its dependencies, allowing you to use the Python cURL commands.
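A quick way to confirm that the installation worked is to import the module and print its version string. This is a minimal sketch: the `status` variable is only an illustrative name, and the exact version text depends on how PycURL and libcurl were built on your system.

```python
# Try to import PycURL and report what it is linked against.
try:
    import pycurl
except ImportError as exc:
    # Raised when PycURL is missing or could not be loaded correctly.
    status = f"PycURL is not usable: {exc}"
else:
    # pycurl.version is a string naming the PycURL, libcurl, and SSL library versions.
    status = pycurl.version

print(status)
```

If the import fails, the printed message usually says whether the package is missing or whether it was built against the wrong SSL backend.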
GET is a rather common request type. For example, when you enter a website, you are, in fact, sending a GET request. In turn, that page may send more GET requests to load images, stylesheets, and other elements.
For this tutorial, we’ll be using https://httpbin.org, a website that returns data in JSON with all the headers, data, form, and files found within the request. Note that the https://httpbin.org/post endpoint accepts only POST requests, while https://httpbin.org/get accepts GET requests.
Side note: you’ll notice an X-Amzn header (such as X-Amzn-Trace-Id) among the headers in the responses. The website adds it internally, so you can ignore it.
Executing a GET request with PycURL is a rather straightforward process. If you use the cURL command and don’t provide the -X option, a GET request is sent by default.
$ curl https://httpbin.org/get
You can do the same thing using the PycURL library:
import pycurl
from io import BytesIO
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/get')
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
body = buffer.getvalue()
print(body.decode('utf-8'))
In this example, we’re creating a PycURL object, setting the URL option, and providing a buffer to store the response data. For comparison, see how GET requests can be sent via cURL in the terminal.
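Because httpbin answers in JSON, the decoded body can be turned into a Python dictionary with the standard `json` module. The sketch below uses a shortened, hand-written `sample_body` in place of `buffer.getvalue()`, so it runs without a network connection; a real response contains more fields, and `parse_json_body` is a hypothetical helper name.

```python
import json

# Illustrative stand-in for buffer.getvalue(); a real response carries more fields.
sample_body = b'{"args": {}, "headers": {"Host": "httpbin.org"}, "url": "https://httpbin.org/get"}'


def parse_json_body(body: bytes) -> dict:
    """Decode a UTF-8 response body and parse it as JSON."""
    return json.loads(body.decode("utf-8"))


parsed = parse_json_body(sample_body)
print(parsed["url"])              # the URL that the server saw
print(parsed["headers"]["Host"])  # one of the echoed request headers
```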
POST requests send data to a server, typically to create or update a resource. To send a POST request with PycURL, use the following code:
import pycurl
from io import BytesIO
data = {"field1": "value1", "field2": "value2"}
post_data = "&".join([f"{k}={v}" for k, v in data.items()])
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "https://httpbin.org/post")
c.setopt(c.POSTFIELDS, post_data)
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
response = buffer.getvalue()
print(response.decode("utf-8"))
Here, we’re creating a dictionary with the data we want to send, converting it to a query string format, and setting the POSTFIELDS option to the prepared data. If you're interested in running cURL via your terminal, see this post on how to send POST requests with cURL.
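Joining the pairs by hand works for simple values, but it would break as soon as a value contains a space, `&`, or `=`. One way to avoid that, sketched here with made-up field values, is the standard library's `urllib.parse.urlencode`, which percent-encodes every key and value before joining them.

```python
from urllib.parse import urlencode

# Values with spaces or reserved characters would corrupt a hand-joined string.
data = {"field1": "value 1", "field2": "a&b=c"}

# urlencode escapes each key and value, then joins the pairs with "&".
post_data = urlencode(data)
print(post_data)  # field1=value+1&field2=a%26b%3Dc
```

The resulting string can be passed to the POSTFIELDS option in the same way as the hand-built one.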
HTTP headers are used to provide additional information about a request or a response. Custom headers can also be included in GET requests, depending on your requirements.
To send custom HTTP headers with a PycURL GET request, use the following code:
import pycurl
from io import BytesIO
headers = ["User-Agent: Python-PycURL", "Accept: application/json"]
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "https://httpbin.org/headers")
c.setopt(c.HTTPHEADER, headers)
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
response = buffer.getvalue()
print(response.decode("utf-8"))
In this example, we’re creating a list of custom headers and setting the HTTPHEADER option to this list. After executing the request, we close the PycURL object and print the response. Sending HTTP headers with cURL via the terminal works in much the same way.
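If your headers live in a dictionary, they need to be converted to the list of "Name: value" strings that HTTPHEADER expects. The helper below is a hypothetical convenience function, not part of PycURL; it also rejects line breaks so that a value cannot inject an extra header.

```python
def build_header_list(headers: dict) -> list:
    """Turn {"Name": "value"} pairs into "Name: value" strings for HTTPHEADER."""
    lines = []
    for name, value in headers.items():
        combined = f"{name}{value}"
        # A line break inside a name or value could smuggle in another header.
        if "\r" in combined or "\n" in combined:
            raise ValueError(f"Invalid header: {name!r}")
        lines.append(f"{name}: {value}")
    return lines


header_list = build_header_list(
    {"User-Agent": "Python-PycURL", "Accept": "application/json"}
)
print(header_list)
```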
JSON is a popular data format for exchanging data between clients and servers. To send JSON data in a POST request using PycURL, see the following example:
import pycurl
import json
from io import BytesIO
data = {'field1': 'value1', 'field2': 'value2'}
post_data = json.dumps(data)
headers = ['Content-Type: application/json']
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/post')
c.setopt(c.POSTFIELDS, post_data)
c.setopt(c.HTTPHEADER, headers)
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
response = buffer.getvalue()
print(response.decode('utf-8'))
In this example, we’re converting the data dictionary to a JSON-formatted string and setting the POSTFIELDS option to the JSON string. We’re also setting the content-type header with the intention of informing the server that we’re sending JSON data.
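The body and the header belong together, so it can help to prepare them in one place. This is a small sketch with a hypothetical helper, `build_json_request`; it encodes the payload to UTF-8 bytes, which POSTFIELDS accepts, and shows that parsing the bytes gives back the original data.

```python
import json


def build_json_request(data: dict):
    """Return the encoded body and the header list for a JSON POST."""
    body = json.dumps(data).encode("utf-8")
    headers = ["Content-Type: application/json"]
    return body, headers


payload = {"field1": "value1", "field2": "value2"}
body, headers = build_json_request(payload)
print(body)
print(headers)

# Round trip: parsing the bytes returns the original dictionary.
print(json.loads(body) == payload)
```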
PycURL can automatically follow HTTP redirects when you set the FOLLOWLOCATION option:
import pycurl
from io import BytesIO
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "http://httpbin.org")
c.setopt(c.FOLLOWLOCATION, 1)
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
response = buffer.getvalue()
print(response.decode("utf-8"))
This example demonstrates how to follow redirects by setting the FOLLOWLOCATION option to 1 (True).
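When following redirects, it's often useful to cap how many are allowed and to check where the request ended up. The function below is a sketch under a few assumptions: `fetch_following_redirects` and its default limit of 5 are illustrative choices, and PycURL is imported inside the function only so that the snippet loads on machines where the library isn't installed yet.

```python
from io import BytesIO


def fetch_following_redirects(url: str, max_redirects: int = 5):
    """Fetch url, following at most max_redirects redirects.

    Returns a tuple of (final_url, redirect_count, body_bytes).
    """
    import pycurl  # imported here so the snippet loads even without PycURL

    buffer = BytesIO()
    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.FOLLOWLOCATION, True)
        # Give up instead of looping forever on circular redirects.
        c.setopt(pycurl.MAXREDIRS, max_redirects)
        c.setopt(pycurl.WRITEDATA, buffer)
        c.perform()
        # Both values are only meaningful after perform() has finished.
        final_url = c.getinfo(pycurl.EFFECTIVE_URL)
        redirect_count = c.getinfo(pycurl.REDIRECT_COUNT)
    finally:
        c.close()
    return final_url, redirect_count, buffer.getvalue()
```

Calling it with a URL that redirects, for example `fetch_following_redirects("https://httpbin.org/redirect/2")` (assuming that endpoint is available), should return the final URL together with the number of redirects that were followed.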
To get only the HTTP headers, set the NOBODY option so that the response body isn't downloaded, and set the HEADERFUNCTION option to a custom function that processes each received header line:
import pycurl
def process_header(header_line):
    print(header_line.decode('utf-8').strip())
c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/headers')
c.setopt(c.HEADERFUNCTION, process_header)
c.setopt(c.NOBODY, 1)
c.perform()
c.close()
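Printing is fine for a quick look, but you'll often want the headers in a dictionary. The sketch below builds such a callback; `make_header_collector` is a hypothetical helper name, and the sample lines stand in for what libcurl would pass to the callback one line at a time.

```python
def make_header_collector(headers: dict):
    """Return a HEADERFUNCTION callback that stores parsed headers in `headers`."""

    def collect(header_line: bytes) -> None:
        # HTTP headers are conventionally decoded as ISO-8859-1.
        line = header_line.decode("iso-8859-1").strip()
        # Skip the status line and the blank line that ends the header block.
        if ":" not in line:
            return
        name, value = line.split(":", 1)
        # Header names are case-insensitive, so normalize them to lower case.
        headers[name.strip().lower()] = value.strip()

    return collect


# Feed the callback sample lines, the way libcurl would deliver them.
collected = {}
callback = make_header_collector(collected)
for raw in (b"HTTP/1.1 200 OK\r\n", b"Content-Type: application/json\r\n", b"\r\n"):
    callback(raw)

print(collected)  # {'content-type': 'application/json'}
```

In a real request, you would pass the callback with `c.setopt(c.HEADERFUNCTION, callback)` and read `collected` after `c.perform()` returns.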
When it comes to choosing between PycURL and Requests, each library has its own strengths and weaknesses. Let’s take a closer look at both:
| | PycURL | Requests |
|---|---|---|
| Pros | Faster than Requests, powerful, flexible, supports multiple protocols. | Easier to learn and use, more readable syntax, better suited for simple tasks. |
| Cons | Steeper learning curve, more verbose syntax. | Slower than PycURL, supports only the HTTP and HTTPS protocols. |
If you prioritize performance and flexibility, PycURL might be a better choice. However, if you’re looking for a simpler and more user-friendly library, you should probably go with Requests.
Web scraping is a technique for extracting information from websites by parsing the HTML content. To perform web scraping tasks, you’ll need additional libraries like BeautifulSoup or lxml. Also, PycURL is particularly useful for web scraping tasks that require handling redirects, cookies, or custom headers.
Typically, web scraping begins with a GET request for retrieving the HTML content of the target webpage. Here's an example of web scraping with PycURL and BeautifulSoup:
import pycurl
from io import BytesIO
from bs4 import BeautifulSoup
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "https://books.toscrape.com")
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
html = buffer.getvalue().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
# Extract data from the parsed HTML
title = soup.find("title")
print(title.text)
In this example, we’re using PycURL to fetch the HTML content. Then, we parse it with BeautifulSoup to extract the desired data.
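For simple pages, the extraction step can also be sketched with the standard library's `html.parser` instead of BeautifulSoup, which avoids the extra dependency. Note that this swaps in a different parser from the one used above. The HTML fragment and the `TitleAttrParser` class are hypothetical and only stand in for a decoded PycURL response; the real page's markup may differ.

```python
from html.parser import HTMLParser


class TitleAttrParser(HTMLParser):
    """Collect the title attribute of every <a> tag nested inside an <h3>."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            title = dict(attrs).get("title")
            if title:
                self.titles.append(title)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False


# Hypothetical HTML standing in for buffer.getvalue().decode("utf-8").
sample_html = """
<h3><a href="book-1.html" title="First Sample Book">First Sample...</a></h3>
<h3><a href="book-2.html" title="Second Sample Book">Second Sample...</a></h3>
<a href="next.html" title="Next page">next</a>
"""

parser = TitleAttrParser()
parser.feed(sample_html)
print(parser.titles)  # ['First Sample Book', 'Second Sample Book']
```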
1. ImportError for pycurl and openssl
In some cases, you may run into an error when running the code, caused by the way PycURL was built against the libcurl library. It would look something like this:
ImportError: pycurl: libcurl link-time ssl backends (secure-transport, openssl) do not include compile-time ssl backend (none/other)
This error means that PycURL was built without detecting the SSL backend that libcurl uses, which typically happens when the OpenSSL headers are missing from your system. To fix this, use the following commands depending on your operating system.
On macOS, install OpenSSL 1.1 with Homebrew:
brew install openssl@1.1
export LDFLAGS="-L/usr/local/opt/openssl@1.1/lib"
export CPPFLAGS="-I/usr/local/opt/openssl@1.1/include"
Afterwards, reinstall PycURL:
pip uninstall pycurl
pip install pycurl --no-cache-dir
On Windows, download and install the OpenSSL 1.1.x binaries. After that, add the following environment variables:
PYCURL_SSL_LIBRARY with the value openssl
LIB with the value C:\OpenSSL-Win64\lib (replace C:\OpenSSL-Win64 with the actual installation path if different)
INCLUDE with the value C:\OpenSSL-Win64\include
Reinstall the Python library PycURL, and your code should now work.
2. UnicodeEncodeError when sending non-ASCII data
This error occurs when you try to send non-ASCII characters in a PycURL request without properly encoding the data.
To resolve this issue, make sure to encode the data using the appropriate character encoding (usually 'utf-8') before sending it with PycURL:
import pycurl
from io import BytesIO
data = {"field1": "value1", "field2": "valüe2"}
post_data = "&".join([f"{k}={v}" for k, v in data.items()]).encode('utf-8')
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "https://httpbin.org/post")
c.setopt(c.POSTFIELDS, post_data)
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
response = buffer.getvalue()
print(response.decode("utf-8"))
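Another option, shown here as a sketch, is to let `urllib.parse.urlencode` handle the encoding. It converts non-ASCII characters to UTF-8 and percent-escapes the resulting bytes, so the string that reaches PycURL contains only ASCII characters.

```python
from urllib.parse import urlencode

data = {"field1": "value1", "field2": "valüe2"}

# Non-ASCII characters are encoded as UTF-8 and then percent-escaped,
# so the result is plain ASCII.
post_data = urlencode(data)
print(post_data)            # field1=value1&field2=val%C3%BCe2
print(post_data.isascii())  # True
```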
Using cURL with Python through the PycURL library offers a range of powerful features for interacting with web resources and APIs. Following the examples in this guide, you can perform tasks such as sending GET and POST requests, handling HTTP headers and form data, and even web scraping.
We hope that you found this guide helpful. If you have any questions related to the matter, feel free to contact us at support@oxylabs.io, and our professionals will get back to you within a day. We also have an easy-to-use cURL converter tool to transform cURL commands into your preferred programming languages. If you're curious to learn more about the topic, check out our articles on How to Use cURL With Proxy?, cURL with Python, and cURL with APIs.
cURL is short for client URL and it is an open-source command-line tool designed for creating network requests to transfer data. To read more about cURL, check out our blog post here.
In Python, PycURL is used as a cURL tool for testing REST APIs, downloading files, and transferring data between servers. PycURL supports several protocols, including FILE, FTPS, HTTPS, IMAP, SMB, and SCP.
About the author
Roberta Aukstikalnyte
Senior Content Manager
Roberta Aukstikalnyte is a Senior Content Manager at Oxylabs. Having worked various jobs in the tech industry, she especially enjoys finding ways to express complex ideas in simple ways through content. In her free time, Roberta unwinds by reading Ottessa Moshfegh's novels, going to boxing classes, and playing around with makeup.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
oxylabs.io © 2024 All Rights Reserved