AliExpress is a global e-commerce platform offering a wide array of products ranging from electronics to fashion items at competitive prices. As such, the available public data is a jackpot for anyone looking to gain competitive intelligence or insights into product pricing, availability, customer sentiment, and similar e-commerce metrics!
In this Python guide, you’ll learn how to scrape AliExpress data like search results, product details, reviews, and top-selling products. Furthermore, Oxylabs’ AliExpress Scraper API will help overcome any anti-scraping blocks that may come along the way.
Web scraping AliExpress can easily boost your web intelligence efforts. With Python and Oxylabs' solutions in your hands, you can gather the following public details:
| 💰 Current prices | 💵 Original prices |
| --- | --- |
| ℹ️ Product data | 🛒 Number of sold items |
| ⭐ Ratings | 📝 Reviews |
| 🖼️ Images | 🚚 Delivery information |
| 🔗 Related products | 👤 Seller information and products |
As highlighted above, Oxylabs’ AliExpress data scraper will help to navigate any possible blocks, such as CAPTCHAs or IP bans. It’ll also ease the entire scraping process with the use of two heavy-duty API features: a Headless Browser combined with a Custom Parser.
If you don’t have Python set up on your computer, you can download it from the official Python website. Once it’s ready, open up your terminal and install the requests library via pip:
python -m pip install requests
The requests module will help with sending HTTP requests to the API, while additional built-in libraries, like re, json, and csv, will be used to process returned results.
Web scraping AliExpress search results is extremely valuable since each listing exposes plenty of useful product information, like prices, ratings, and the number of items sold. Additionally, the retrieved AliExpress product page URLs can be used later to extract more specific product data. So, begin by creating a new .py file, and let’s get to writing Python code.
Inside your file, import the requests and csv libraries:
import requests, csv
Then, create a variable to store your API username and password:
API_credentials = ('USERNAME', 'PASSWORD')
Next, define your search keyword, which will be used to search the AliExpress website:
keyword = 'desktop cpu'
Oxylabs’ APIs take instructions in JSON format, so let’s create the payload dictionary that’ll store the scraping and parsing instructions for our AliExpress target page:
payload = {
'source': 'universal',
'url': None,
'geo_location': 'United States',
'locale': 'en-us',
'user_agent_type': 'desktop',
}
For now, the url value is set to None, as later on, you’ll form the AliExpress listing page URLs dynamically using pagination and pass each URL to the payload.
One thing to note about AliExpress search pages is that only 12 product listings are loaded by default. The remaining listings load only when you scroll the page, a technique commonly called “lazy loading”. If you inspect the page’s HTML, you’ll find the product cards nested inside a div element with the id card-list.
As a result, you must simulate page scrolling in order to scrape all product listings from an AliExpress search page. For this scenario, you can easily instruct the API’s Headless Browser to scroll the page by a specific number of pixels:
payload = {
'source': 'universal',
'url': None,
'geo_location': 'United States',
'locale': 'en-us',
'user_agent_type': 'desktop',
'render': 'html',
'browser_instructions': [
{
'type': 'scroll',
'x': 0,
'y': 3000,
'wait_time_s': 2
}
] * 3,
}
As you can see, it’s enough to create a single scroll instruction and multiply it by 3 so that the headless browser scrolls down 3,000 pixels three times, waiting 2 seconds after each scroll.
Once you have all the AliExpress products rendered on the page, you can run the scraper as is to return the raw HTML file or use the built-in Custom Parser. Let’s utilize the latter option by defining our own parsing logic and retrieving only the relevant data. Start by adding the 'parse': True parameter and creating the parsing_instructions key:
payload = {
# Previous parameters...
'parse': True,
'parsing_instructions': {}
}
Now, inside the parsing_instructions, you can start nesting specific parsing commands. The idea is first to define the selector (XPath or CSS) that selects a product listing and then use the _items iterator to process each listing individually. For more information, please see our documentation and check out this GitHub repository on how to write custom parsing instructions. So far, your parsing instructions should look like this:
'parsing_instructions': {
'products': {
'_fns': [
{
'_fn': 'xpath',
'_args': ['//div[@id="card-list"]/div']
}
],
'_items': {}
}
}
}
Title selector:
Next, within the _items iterator, create the Title property to extract product titles from each listing:
'_items': {
'Title': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//h3/text()']
}
]
},
Note the use of . within the XPath expression. It tells the API to select h3 elements inside the current node, which you’ve defined previously (//div[@id="card-list"]/div).
Current price selector:
The current price is split into separate span elements, making scraping a little inconvenient. The good thing is you can easily overcome this issue by iterating over each span with the _items iterator. Furthermore, it’s better to use XPath expressions that don’t rely on full class names when possible, as such selectors can break with minor website changes. For this reason, you may want to use XPath’s contains() function:
'Price current': {
'_fns': [
{
'_fn': 'xpath',
'_args': ['.//div[contains(@class, "price-sale")]']
}
],
'_items': {
'_fns': [
{'_fn': 'xpath', '_args': ['.//span/text()']},
{'_fn': 'join', '_args': ''}
]
}
},
Note the join function, which, as the name suggests, joins the scraped text into one string.
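Conceptually, this chain of functions behaves like the following pure-Python snippet, where the span values are hypothetical examples of what a price container may hold:

# Hypothetical text nodes extracted from the nested span elements of one listing's price.
spans = ['US $', '85', '.', '99']
# The join function with an empty-string argument merges them into a single value.
price = ''.join(spans)
print(price)  # US $85.99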
Original price selector:
To scrape the product’s original price, you can use a single function since the entire price amount sits inside one span element:
'Price original': {
'_fns': [
{
'_fn': 'xpath',
'_args': ['.//div[contains(@class, "price-original")]/span/text()']
}
]
},
URL selector:
You can grab the product URL from the href attribute value, which is inside the a element. Since the full URL has various query parameters and fragments, you can clean it up with a regular expression by utilizing the regex_find_all function:
'URL': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//a/@href']
},
{
'_fn': 'regex_find_all',
'_args': [r'^\/\/(.*?)(?=\?)']
}
]
}
}
}
}
}
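To see exactly what this regular expression does, here’s a quick standalone test you can run locally. The href value is a hypothetical example of a protocol-relative AliExpress link:

import re

# A hypothetical protocol-relative href scraped from a product listing.
href = '//www.aliexpress.us/item/3256806291837346.html?spm=a2g0o.productlist&gatewayAdapt=glo2usa'
# Capture everything between the leading "//" and the first "?".
print(re.findall(r'^\/\/(.*?)(?=\?)', href))
# Output: ['www.aliexpress.us/item/3256806291837346.html']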
As mentioned previously, let’s dynamically form the AliExpress search page URLs. You can easily do so by adding the ?page={page number} parameter to the end of the URL. For example, to access the second page of the search results, you would add ?page=2. The following code achieves this result by utilizing the number_of_pages variable:
number_of_pages = 3  # Set how many search pages you want to scrape.
page = f'https://www.aliexpress.us/w/wholesale-{keyword.replace(" ", "-")}.html'
urls = [f'{page}?page={page_num}' for page_num in range(1, number_of_pages + 1)]
The next step is to assign the formed URL to the payload, then send the payload to AliExpress Scraper API for each URL, and finally add the returned results to the data list. Note that the API returns a JSON response with information about the submitted job, while the scraped content is nested inside results > content keys. Since the payload contains parsing instructions inside the products key, the parsed data can then be gathered from products:
data = []
for url in urls:
payload['url'] = url
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=API_credentials,
json=payload
)
data.extend(response.json()['results'][0]['content']['products'])
After sending the request and adding the scraped products to the data list, you can save results using the built-in CSV module. However, the tricky part is that sometimes the keys may be empty, and some data points, like the URLs and prices, will be wrapped with additional square brackets. You can clean the parsed data by filtering out empty keys and joining the elements from a list into a string:
# Create header names from the keys of the first parsed item,
# filtering out any empty keys along the way.
fieldnames = [key for key in data[0].keys() if key]
with open(f'search_{keyword.replace(" ", "-")}.csv', 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for item in data:
# Remove the square brackets if the value is in a list.
cleaned_item = {key: ', '.join(map(str, value)) if isinstance(value, list) else value for key, value in item.items()}
# Filter out empty keys.
filtered_item = {key: value for key, value in cleaned_item.items() if key}
writer.writerow(filtered_item)
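To illustrate the cleanup logic, here’s how a single hypothetical parsed item would be transformed:

# A hypothetical parsed item with a list-wrapped URL and an empty placeholder key.
item = {
    'Title': 'Desktop CPU 6-Core Processor',
    'Price current': 'US $85.99',
    'URL': ['www.aliexpress.us/item/3256806291837346.html'],
    '': None
}
cleaned_item = {key: ', '.join(map(str, value)) if isinstance(value, list) else value for key, value in item.items()}
filtered_item = {key: value for key, value in cleaned_item.items() if key}
print(filtered_item)
# {'Title': 'Desktop CPU 6-Core Processor', 'Price current': 'US $85.99', 'URL': 'www.aliexpress.us/item/3256806291837346.html'}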
### scrape_search_pages.py
import requests, csv
# Replace with your API username and password.
API_credentials = ('USERNAME', 'PASSWORD')
# Enter the search keyword and the number of pages you want to scrape.
keyword = 'desktop cpu'
number_of_pages = 3
# Define your scraping and parsing parameters.
payload = {
'source': 'universal',
'url': None,
'geo_location': 'United States',
'locale': 'en-us',
'user_agent_type': 'desktop',
'render': 'html',
'browser_instructions': [
{
'type': 'scroll',
'x': 0,
'y': 3000,
'wait_time_s': 2
}
] * 3,
'parse': True,
'parsing_instructions': {
'products': {
'_fns': [
{
'_fn': 'xpath',
'_args': ['//div[@id="card-list"]/div']
}
],
'_items': {
'Title': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//h3/text()']
}
]
},
'Price current': {
'_fns': [
{
'_fn': 'xpath',
'_args': ['.//div[contains(@class, "price-sale")]']
}
],
'_items': {
'_fns': [
{'_fn': 'xpath', '_args': ['.//span/text()']},
{'_fn': 'join', '_args': ''}
]
}
},
'Price original': {
'_fns': [
{
'_fn': 'xpath',
'_args': ['.//div[contains(@class, "price-original")]/span/text()']
}
]
},
'URL': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//a/@href']
},
{
'_fn': 'regex_find_all',
'_args': [r'^\/\/(.*?)(?=\?)']
}
]
}
}
}
}
}
# Form the URLs for each number of pages you want to scrape.
page = f'https://www.aliexpress.us/w/wholesale-{keyword.replace(" ", "-")}.html'
urls = [f'{page}?page={page_num}' for page_num in range(1, number_of_pages + 1)]
# Send a request to the API.
data = []
for url in urls:
payload['url'] = url
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=API_credentials,
json=payload
)
data.extend(response.json()['results'][0]['content']['products'])
# Create header names from the keys of the first parsed item,
# filtering out any empty keys along the way.
fieldnames = [key for key in data[0].keys() if key]
# Save the parsed search results to a CSV file.
with open(f'search_{keyword.replace(" ", "-")}.csv', 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for item in data:
# Remove the square brackets if the value is in a list.
cleaned_item = {key: ', '.join(map(str, value)) if isinstance(value, list) else value for key, value in item.items()}
# Filter out empty keys.
filtered_item = {key: value for key, value in cleaned_item.items() if key}
writer.writerow(filtered_item)
While scraping AliExpress search result pages may suffice, as mentioned previously, product pages boast even more product data that can be significant for analysis. For example, you can gather all images, product variations, related items, product specifications, reviews, and much more.
As you did earlier, start by importing the requests and csv libraries:
import requests, csv
Then, assign your AliExpress Scraper API username and password to a variable:
API_credentials = ('USERNAME', 'PASSWORD')
Next, save your AliExpress product URLs in a list:
products = [
'https://www.aliexpress.us/item/3256806291837346.html',
'https://www.aliexpress.us/item/2251832704771713.html',
'https://www.aliexpress.us/item/3256805974680622.html'
]
Instead of the above URLs, you can also pass the product URLs you’ve scraped earlier from AliExpress search pages.
The beginning of the payload for scraping products doesn’t differ much from the search results scraper. The only differences are that you don’t need to scroll the page, and you need to instruct the API to click the “View more” button so that all of the product specifications are loaded.
Here’s what the payload should look like so far:
payload = {
'source': 'universal',
'url': None,
'geo_location': 'United States',
'locale': 'en-us',
'user_agent_type': 'desktop',
'render': 'html',
'browser_instructions': [
{
'type': 'click',
'selector': {
'type': 'xpath',
'value': '//div[@data-pl="product-specs"]//button'
}
}
],
'parse': True,
'parsing_instructions': {}
}
Now, you can start adding parsing commands inside the parsing_instructions parameter.
Title selector:
You can extract the title with a simple XPath expression:
'parsing_instructions': {
'Title': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['//h1[@data-pl="product-title"]/text()']
}]
},
Current price selector:
The current price is split between multiple span elements, just like on the search results pages. Here, you can use the _items iterator to process all the span elements and then use the join function to merge all the scraped text into a single string:
'Price current': {
'_fns': [{
'_fn': 'xpath',
'_args': ['//div[contains(@class, "product-price-current")]']
}],
'_items': {
'_fns': [
{'_fn': 'xpath', '_args': ['.//span/text()']},
{'_fn': 'join', '_args': ''}
]
}
},
Original price selector:
Thankfully, the original price is already a single string of text, so you can extract it with the following instruction:
'Price original': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['//span[contains(@class, "price--original")]/text()']
}]
},
Discount selector:
The discount amount sits in its own span element. Since it’s nested similarly to the original price element, you can use a slightly adjusted selector:
'Discount': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['//span[contains(@class, "price--discount")]/text()']
}]
},
Sold amount selector:
The number of sold items sits in a span element that’s nested inside a div element. You can form your XPath selector by searching for span elements that contain the text “sold” and then extract the numeric value with the amount_from_string function:
'Sold': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['//div[@data-pl="product-reviewer"]//span[contains(text(), "sold")]/text()']
},
{'_fn': 'amount_from_string'}
]
},
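If you’re curious how amount_from_string works under the hood, a rough local approximation might look like the sketch below. This is only an assumption for illustration; see the API documentation for the function’s exact behavior:

import re

def amount_from_string_approx(text):
    # Pull the first numeric amount from a string like '5,000+ sold'.
    match = re.search(r'\d[\d,]*(?:\.\d+)?', text)
    return float(match.group().replace(',', '')) if match else None

print(amount_from_string_approx('5,000+ sold'))  # 5000.0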
Rating number selector:
You can get the product rating number from the strong element that’s inside the div element with the data-pl="product-reviewer" attribute:
'Rating': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['//div[@data-pl="product-reviewer"]//strong/text()']
},
{'_fn': 'amount_from_string'}
]
},
Reviews count selector:
The number of product reviews can be extracted from the a element that links to the review section:
'Reviews count': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['//a[@href="#nav-review"]/text()']
},
{'_fn': 'amount_from_string'}
]
},
Delivery selector:
You can easily get the delivery information by utilizing the contains() function of XPath:
'Delivery': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['//div[contains(@class, "dynamic-shipping")]//strong/text()']
}]
},
Specifications selectors:
Lastly, the product specifications table requires the _items iterator to fetch each field. Once again, XPath’s contains() function helps a lot here to simplify the selector expressions:
'Specifications': {
'_fns': [{
'_fn': 'xpath',
'_args': ['//ul[contains(@class, "specification--list")]//li/div']
}],
'_items': {
'Title': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['.//div[contains(@class, "title")]//text()']
}]
},
'Description': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['.//div[contains(@class, "desc")]//text()']
}]
}
}
}
}
}
The next step is to send a request to the API for each product URL in the products list:
data = []
for url in products:
payload['url'] = url
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=API_credentials,
json=payload
)
result = response.json()['results'][0]['content']
# Clean up scraped specifications.
specifications = []
for spec in result['Specifications']:
string = f'{spec["Title"]}: {spec["Description"]}'
specifications.append(string)
result['Specifications'] = ';\n '.join(specifications)
result['URL'] = url
data.append(result)
As you can see, the URL is appended to the payload dynamically. The code then flattens the list of specification dictionaries returned by the API into a single readable string, formatting each entry as Specification title: Specification description.
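For instance, a hypothetical raw Specifications value would be flattened like this:

# A hypothetical list of specification dictionaries returned by the Custom Parser.
specs = [
    {'Title': 'Brand Name', 'Description': 'AMD'},
    {'Title': 'Socket Type', 'Description': 'AM4'}
]
flattened = ';\n '.join(f'{spec["Title"]}: {spec["Description"]}' for spec in specs)
print(flattened)
# Brand Name: AMD;
#  Socket Type: AM4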
Finally, you can use the CSV module to save all the gathered AliExpress product data to an easily readable CSV file. Note that you should also filter out empty keys and remove the square brackets from values:
# Create header names from the keys of the 'data' list.
fieldnames = [key for key in data[0].keys() if key]
# Save the parsed products to a CSV file.
with open('products.csv', 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for item in data:
# Remove the square brackets if the value is in a list.
cleaned_item = {key: ', '.join(map(str, value)) if isinstance(value, list) else value for key, value in item.items()}
writer.writerow(cleaned_item)
### scrape_products.py
import requests, csv
# Replace with your API username and password.
API_credentials = ('USERNAME', 'PASSWORD')
# Store your AliExpress product pages in a list.
products = [
'https://www.aliexpress.us/item/3256806291837346.html',
'https://www.aliexpress.us/item/2251832704771713.html',
'https://www.aliexpress.us/item/3256805974680622.html'
]
# Define your scraping and parsing parameters.
payload = {
'source': 'universal',
'url': None,
'geo_location': 'United States',
'locale': 'en-us',
'user_agent_type': 'desktop',
'render': 'html',
'browser_instructions': [
{
'type': 'click',
'selector': {
'type': 'xpath',
'value': '//div[@data-pl="product-specs"]//button'
}
}
],
'parse': True,
'parsing_instructions': {
'Title': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['//h1[@data-pl="product-title"]/text()']
}]
},
'Price current': {
'_fns': [{
'_fn': 'xpath',
'_args': ['//div[contains(@class, "product-price-current")]']
}],
'_items': {
'_fns': [
{'_fn': 'xpath', '_args': ['.//span/text()']},
{'_fn': 'join', '_args': ''}
]
}
},
'Price original': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['//span[contains(@class, "price--original")]/text()']
}]
},
'Discount': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['//span[contains(@class, "price--discount")]/text()']
}]
},
'Sold': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['//div[@data-pl="product-reviewer"]//span[contains(text(), "sold")]/text()']
},
{'_fn': 'amount_from_string'}
]
},
'Rating': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['//div[@data-pl="product-reviewer"]//strong/text()']
},
{'_fn': 'amount_from_string'}
]
},
'Reviews count': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['//a[@href="#nav-review"]/text()']
},
{'_fn': 'amount_from_string'}
]
},
'Delivery': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['//div[contains(@class, "dynamic-shipping")]//strong/text()']
}]
},
'Specifications': {
'_fns': [{
'_fn': 'xpath',
'_args': ['//ul[contains(@class, "specification--list")]//li/div']
}],
'_items': {
'Title': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['.//div[contains(@class, "title")]//text()']
}]
},
'Description': {
'_fns': [{
'_fn': 'xpath_one',
'_args': ['.//div[contains(@class, "desc")]//text()']
}]
}
}
}
}
}
# Send a request to the API.
data = []
for url in products:
payload['url'] = url
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=API_credentials,
json=payload
)
result = response.json()['results'][0]['content']
# Clean up scraped specifications.
specifications = []
for spec in result['Specifications']:
string = f'{spec["Title"]}: {spec["Description"]}'
specifications.append(string)
result['Specifications'] = ';\n '.join(specifications)
result['URL'] = url
data.append(result)
# Create header names from the keys of the 'data' list.
fieldnames = [key for key in data[0].keys() if key]
# Save the parsed products to a CSV file.
with open('products.csv', 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for item in data:
# Remove the square brackets if the value is in a list.
cleaned_item = {key: ', '.join(map(str, value)) if isinstance(value, list) else value for key, value in item.items()}
writer.writerow(cleaned_item)
In this section, you’ll learn how to gather all product reviews from any AliExpress product page. For this, you don’t need to scrape each data point from the reviews section. Instead, you can use the https://feedback.aliexpress.com/ resource, which serves a set number of reviews in JSON format, making the entire scraping process much easier.
As usual, begin by importing the required Python libraries. This time, you’ll need the regular expressions module as well:
import requests, re, json, csv
Next, store your API username and password as a variable:
API_credentials = ('USERNAME', 'PASSWORD')
As mentioned earlier, you can directly scrape the JSON resource that contains product reviews, so let’s see how to access the resource in question.
Visit this AliExpress product page, then open your browser’s Developer Tools by pressing the following keyboard keys:
- F12 or Control + Shift + I on Windows
- Command + Option + I on macOS
Then, head to the Network tab and filter for Fetch/XHR resources. The target resource shouldn’t be visible yet, so trigger a request to it by scrolling to the reviews section on the page and pressing the “View more” button to open a new window with reviews. Now, you should be able to see a resource whose name starts with searchEvaluation.do.
In the Response tab, you can see the JSON data of the product reviews. Next, head to the Headers tab of this resource and copy the Request URL.
You can use this URL to access any number of product reviews by adjusting the productId={product_id} and pageSize={max_reviews} parameters. For example, to access a maximum of 100 reviews for the product with ID 3256805974680622, you can form the URL like this:
https://feedback.aliexpress.com/pc/searchEvaluation.do?productId=3256805974680622&lang=en_US&country=US&pageSize=100&filter=all&sort=complex_default
This is where the regular expressions library comes in handy – you can automatically extract the product ID from the product URL and then pass it over to the reviews URL:
url = 'https://www.aliexpress.us/item/3256805974680622.html'
max_reviews = 100
product_id = re.match(r'.*/(\d+)\.html$', url).group(1)
reviews_url = f'https://feedback.aliexpress.com/pc/searchEvaluation.do?productId={product_id}&lang=en_US&country=US&pageSize={max_reviews}&filter=all&sort=complex_default'
Now, form a very simple payload and pass the reviews_url variable to the payload:
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=API_credentials,
json={
'source': 'universal',
'url': reviews_url, # Pass the processed reviews URL.
'geo_location': 'United States',
'user_agent_type': 'desktop'
}
)
results = response.json()['results'][0]['content']
data = json.loads(results)
After loading the results with a JSON module, you can parse each review to extract only the data that’s useful to you and then append it to the parsed_reviews list:
parsed_reviews = []
for review in data['data']['evaViewList']:
parsed_review = {
'Rating': review.get('buyerEval', ''),
'Date': review.get('evalDate', ''),
'Feedback_translated': review.get('buyerTranslationFeedback', ''),
'Feedback': review.get('buyerFeedback', ''),
review.get('reviewLabel1', ''): review.get('reviewLabelValue1', ''),
review.get('reviewLabel2', ''): review.get('reviewLabelValue2', ''),
review.get('reviewLabel3', ''): review.get('reviewLabelValue3', ''),
'Name': review.get('buyerName', ''),
'Country': review.get('buyerCountry', ''),
'Upvotes': review.get('upVoteCount', ''),
'Downvotes': review.get('downVoteCount', '')
}
parsed_reviews.append(parsed_review)
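Note that three of the dictionary keys come from the review data itself (reviewLabel1 through reviewLabel3, which typically hold attributes like color or size). When a label is missing, the key collapses to an empty string, which is exactly why the CSV step below filters out empty keys. A quick hypothetical example:

# A hypothetical review where only the first label is present.
review = {'buyerEval': 100, 'reviewLabel1': 'Color', 'reviewLabelValue1': 'Black'}
parsed = {
    'Rating': review.get('buyerEval', ''),
    review.get('reviewLabel1', ''): review.get('reviewLabelValue1', ''),
    review.get('reviewLabel2', ''): review.get('reviewLabelValue2', '')
}
print(parsed)  # {'Rating': 100, 'Color': 'Black', '': ''}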
Next, save the parsed AliExpress product reviews to a CSV file and make sure to filter out empty keys:
# Create header names from the keys of the first parsed review,
# filtering out any empty keys along the way.
fieldnames = [key for key in parsed_reviews[0].keys() if key]
# Save the parsed reviews to a CSV file.
with open('reviews.csv', 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for item in parsed_reviews:
# Filter out empty keys.
filtered_item = {key: value for key, value in item.items() if key}
writer.writerow(filtered_item)
### scrape_reviews.py
import requests, re, json, csv
# Use your API username and password.
API_credentials = ('USERNAME', 'PASSWORD')
url = 'https://www.aliexpress.us/item/3256805974680622.html'
# Specify the maximum number of reviews to extract.
max_reviews = 100
# Get the product ID from the URL
product_id = re.match(r'.*/(\d+)\.html$', url).group(1)
reviews_url = f'https://feedback.aliexpress.com/pc/searchEvaluation.do?productId={product_id}&lang=en_US&country=US&pageSize={max_reviews}&filter=all&sort=complex_default'
# Send a request to the API.
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=API_credentials,
json={
'source': 'universal',
'url': reviews_url, # Pass the processed reviews URL.
'geo_location': 'United States',
'user_agent_type': 'desktop'
}
)
results = response.json()['results'][0]['content']
data = json.loads(results)
# Parse each review and append the results to a list.
parsed_reviews = []
for review in data['data']['evaViewList']:
parsed_review = {
'Rating': review.get('buyerEval', ''),
'Date': review.get('evalDate', ''),
'Feedback_translated': review.get('buyerTranslationFeedback', ''),
'Feedback': review.get('buyerFeedback', ''),
review.get('reviewLabel1', ''): review.get('reviewLabelValue1', ''),
review.get('reviewLabel2', ''): review.get('reviewLabelValue2', ''),
review.get('reviewLabel3', ''): review.get('reviewLabelValue3', ''),
'Name': review.get('buyerName', ''),
'Country': review.get('buyerCountry', ''),
'Upvotes': review.get('upVoteCount', ''),
'Downvotes': review.get('downVoteCount', '')
}
parsed_reviews.append(parsed_review)
# Create header names from the keys of the first parsed review,
# filtering out any empty keys along the way.
fieldnames = [key for key in parsed_reviews[0].keys() if key]
# Save the parsed reviews to a CSV file.
with open('reviews.csv', 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for item in parsed_reviews:
# Filter out empty keys.
filtered_item = {key: value for key, value in item.items() if key}
writer.writerow(filtered_item)
Scraping AliExpress top-selling product pages is quite similar to scraping search pages, yet the major difference is the infinite scroll feature. Hence, instead of forming the URLs for each page, you’ll need to scroll the page by a specific number of pixels to load a desired number of product listings. Let’s begin by importing the libraries:
import requests, csv
API_credentials = ('USERNAME', 'PASSWORD')
Next, specify your target URL in the payload; for example, let’s use the top-selling Phones & Telecommunications category URL. The page requires users to scroll about 2400 pixels for new items to load. You can easily simulate this action by adding the browser_instructions parameter with a scroll instruction and multiplying it 19 times, allowing you to load around 300 products in total:
payload = {
'source': 'universal',
'url': 'https://www.aliexpress.com/p/calp-plus/index.html?&categoryTab=us_phones_%2526_telecommunications',
'geo_location': 'United States',
'locale': 'en-us',
'user_agent_type': 'desktop',
'render': 'html',
'browser_instructions': [
{
"type": "scroll",
"x": 0,
"y": 2400,
"wait_time_s": 2
}
] * 19,
'parse': True,
'parsing_instructions': {}
}
Feel free to adjust the number of times the API scrolls the page. You may also want to adjust the number of pixels according to your results.
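If you’d rather tie the scroll count to a target number of products, you can compute it up front. The products-per-scroll figure below is an assumption derived from the numbers above (19 scrolls for roughly 300 products), so calibrate it against what you actually see returned:

import math

target_products = 300
products_per_scroll = 16  # Rough estimate; adjust based on your own results.
scroll_times = math.ceil(target_products / products_per_scroll)  # 19

payload['browser_instructions'] = [
    {'type': 'scroll', 'x': 0, 'y': 2400, 'wait_time_s': 2}
] * scroll_times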
Title selector:
The parsing_instructions for the top-selling page are quite similar to the search page; you only have to replace several XPath selectors. Start by specifying the product card selector and then use the _items iterator to parse each product. The product title can be extracted by targeting the h3 tag:
'parsing_instructions': {
'products': {
'_fns': [
{
'_fn': 'xpath',
'_args': ['//div[@data-spm="prodcutlist"]/div']
}
],
'_items': {
'Title': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//h3/text()']
}
]
},
Current price selector:
Once again, the current product price is split into several span elements, so you can use the _items iterator to easily retrieve and then join all the elements into a single string:
'Price current': {
'_fns': [
{
'_fn': 'xpath',
'_args': ['.//div[@class="U-S0j"]']
}
],
'_items': {
'_fns': [
{'_fn': 'xpath', '_args': ['.//span/text()']},
{'_fn': 'join', '_args': ''}
]
}
},
Original price selector:
You can grab the original price from the span element that’s nested inside a div with a class _1zEQq:
'Price original': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//div[@class="_1zEQq"]/span/text()']
}
]
},
Sales amount selector:
Then, you can parse the number of products sold as shown below:
'Sales amount': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//span[@class="Ktbl2"]/text()']
}
]
},
URL selector:
Next, you can grab the product URL from an a element with an href attribute and use the regex_find_all function to clean up the URL:
'URL': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//a/@href']
},
{
'_fn': 'regex_find_all',
'_args': [r'^\/\/(.*?)(?=\?)']
}
]
}
}
}
}
}
Afterward, make a request to the API and add the results to the data list:
data = []
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=API_credentials,
json=payload,
)
data.extend(response.json()['results'][0]['content']['products'])
Finally, save all the scraped and parsed product data to a new CSV file:
fieldnames = list(data[0].keys())
with open('top_selling.csv', 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for item in data:
cleaned_item = {key: ', '.join(map(str, value)) if isinstance(value, list) else value for key, value in item.items()}
writer.writerow(cleaned_item)
### scrape_top_selling.py
import requests, csv
# Replace with your API username and password.
API_credentials = ('USERNAME', 'PASSWORD')
# Define your scraping and parsing parameters.
payload = {
'source': 'universal',
'url': 'https://www.aliexpress.com/p/calp-plus/index.html?&categoryTab=us_phones_%2526_telecommunications',
'geo_location': 'United States',
'locale': 'en-us',
'user_agent_type': 'desktop',
'render': 'html',
'browser_instructions': [
{
"type": "scroll",
"x": 0,
"y": 2400,
"wait_time_s": 2
}
] * 19,
'parse': True,
'parsing_instructions': {
'products': {
'_fns': [
{
'_fn': 'xpath',
'_args': ['//div[@data-spm="prodcutlist"]/div']
}
],
'_items': {
'Title': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//h3/text()']
}
]
},
'Price current': {
'_fns': [
{
'_fn': 'xpath',
'_args': ['.//div[@class="U-S0j"]']
}
],
'_items': {
'_fns': [
{'_fn': 'xpath', '_args': ['.//span/text()']},
{'_fn': 'join', '_args': ''}
]
}
},
'Price original': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//div[@class="_1zEQq"]/span/text()']
}
]
},
'Sales amount': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//span[@class="Ktbl2"]/text()']
}
]
},
'URL': {
'_fns': [
{
'_fn': 'xpath_one',
'_args': ['.//a/@href']
},
{
'_fn': 'regex_find_all',
'_args': [r'^\/\/(.*?)(?=\?)']
}
]
}
}
}
}
}
# Send a request to the API.
data = []
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=API_credentials,
json=payload,
)
data.extend(response.json()['results'][0]['content']['products'])
# Create header names from the keys of the 'data' list.
fieldnames = list(data[0].keys())
# Save the parsed top-selling products to a CSV file.
with open('top_selling.csv', 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for item in data:
# Remove the square brackets if the value is in a list.
cleaned_item = {key: ', '.join(map(str, value)) if isinstance(value, list) else value for key, value in item.items()}
writer.writerow(cleaned_item)
Hopefully, this guide got you on the right track to scraping AliExpress product data successfully. While the shown code examples scrape large amounts of public information, you may want to further improve each scraper by utilizing the asynchronous web scraping method. In case you want to build a price monitoring tool, check out our tutorial on how to build a price tracker with Python.
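As a minimal sketch of that idea, the product-page loop could be parallelized with asyncio and aiohttp (assuming aiohttp is installed via pip; the payload, products, and API_credentials variables are the same as in the product scraper above):

import asyncio
import aiohttp

async def scrape(session, payload, url):
    # Submit one job to the API and return the parsed content.
    async with session.post(
        'https://realtime.oxylabs.io/v1/queries',
        auth=aiohttp.BasicAuth(*API_credentials),
        json={**payload, 'url': url}
    ) as response:
        result = await response.json()
        return result['results'][0]['content']

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [scrape(session, payload, url) for url in products]
        return await asyncio.gather(*tasks)

# data = asyncio.run(main())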
When building your own tools, proxies are essential for block-free web scraping. To make your requests resemble organic traffic, you can buy proxy solutions, most notably residential and datacenter IPs.
Don’t hesitate to check out Oxylabs’ Web Scraper API documentation for more information about integration methods and web scraping parameters. If you have further questions or need assistance, feel free to contact our 24/7 support via live chat or email.
Information on AliExpress is considered publicly available, so you should be able to collect it. To learn more about the legality of web scraping, please see our article here or contact a professional about your specific case. Also, check our AliExpress Scraper on GitHub.
The process of web scraping aliexpress.com is similar to gathering public data from any other website. First of all, we recommend using the Python programming language for an easier start with web scraping. Then, you need to import the necessary libraries, send a request to the AliExpress website, and parse the data to make it ready for analysis. Since web scraping almost always requires the ability to bypass blocks like CAPTCHAs and IP bans, you may want to consider using proxy servers or acquiring a ready-to-use tool like AliExpress Scraper API.
About the author
Vytenis Kaubrė
Technical Copywriter
Vytenis Kaubrė is a Technical Copywriter at Oxylabs. His love for creative writing and a growing interest in technology fuels his daily work, where he crafts technical content and web scrapers with Oxylabs’ solutions. Off duty, you might catch him working on personal projects, coding with Python, or jamming on his electric guitar.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.