Python Cache: How to Speed Up Your Code with Effective Caching
Vytenis Kaubrė
A smooth and seamless user experience is a critical factor for the success of any user-facing application. Developers often strive to minimize application latencies to improve the user experience. Usually, the root cause of these latencies is data access delays.
Developers can significantly reduce the delays by caching the data, which leads to faster load times and happier users. Web scraping is no exception – large-scale projects can also be considerably sped up.
But what is caching exactly, and how can it be implemented? This article will discuss caching, its purpose and uses, and how to use it to supercharge your web scraping code in Python.
Caching is a mechanism for improving the performance of virtually any application. In a technical sense, caching means storing data in a cache so it can be retrieved quickly later. But wait, what exactly is a cache?
A cache is a fast storage space (usually temporary) where frequently accessed data is kept to speed up the system’s performance and decrease the access times. For example, a computer's cache is a small but fast memory chip (usually an SRAM) between the CPU and the main memory chip (usually a DRAM).
The CPU first checks the cache when it needs to access the data. If it’s in the cache, a cache hit occurs, and the data is thereby read from the cache instead of a relatively slower main memory. It results in reduced access times and enhanced performance.
Caching can improve the performance of applications and systems in several ways. Here are the primary reasons to use caching:
A cache's main objective is to speed up access to frequently used data by keeping it in a temporary storage area that's faster to reach than the original data source. By decreasing access time, caching can significantly improve an application's or system's overall performance.
Caching can also reduce the load on the system. This is achieved by reducing the number of requests made to the external data source (e.g., a database).
A cache stores frequently used data, allowing applications to read it from the cache rather than repeatedly requesting it from the data source. This reduces the load on the external data source and ultimately improves the system's performance.
Caching allows users to get data more rapidly, supporting more natural interaction with the system or application. This is particularly important with real-time systems and web applications since users expect immediate responses. Thereby, caching can help improve the overall user experience of an application or a system.
Caching is a general concept with several prominent use cases. You can apply it in any scenario where data access follows a pattern and you can predict which data will be demanded next; prefetching that data into the cache store improves application performance.
We frequently need to access data from databases or external APIs in web applications. Caching can reduce the access time for databases or API requests and increase the performance of a web application.
In a web application, caching is used on both the client and the server side. On the client side, the web application stores static resources (e.g., images) and user preferences (e.g., theme settings) in the client’s browser cache.
On the server side, in-memory caching (storing data in the system’s memory) between the database and the web servers can significantly reduce request load to the database, resulting in faster load times.
For example, shopping websites display a list of products, and users move back and forth between multiple products. To avoid querying the database for the product list each time a visitor opens the page, we can cache the list of products using in-memory caching.
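To make this concrete, here's a minimal sketch of such an in-memory cache, where get_products_from_db() is a hypothetical stand-in for a real (slow) database query rather than part of any specific framework:

import time

# Hypothetical stand-in for a slow database query
def get_products_from_db(category):
    time.sleep(1)  # simulate database latency
    return [f'{category} product {i}' for i in range(3)]

product_cache = {}

def get_products(category):
    # Serve the product list from the in-memory cache when possible
    if category in product_cache:
        return product_cache[category]
    # Otherwise, query the database and cache the result for later visitors
    products = get_products_from_db(category)
    product_cache[category] = products
    return products

get_products('books')  # slow: queries the database
get_products('books')  # fast: served from the cache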
Machine learning applications frequently require massive datasets. Prefetching subsets of the dataset in the cache will decrease the data access time and, eventually, reduce the training time of the model.
After completing the training, the machine learning models often learn weight vectors. You can cache these weight vectors and then quickly access them to predict any new unseen samples, which is the most frequent operation.
Central processing units use dedicated caches (e.g., L1, L2, and L3) to improve their operations. The CPU prefetches data based on spatial and temporal access patterns, saving CPU cycles that would otherwise be spent reading from the main memory (RAM).
Different caching strategies can be devised based on specific spatial or temporal data access patterns.
Spatial caching is less popular in user-facing applications and more common in scientific or technical applications that deal with vast volumes of data and require high performance and computational power. It involves prefetching data that's spatially close (in the memory) to the data the application or system is currently processing. This idea exploits the fact that it’s more likely that a program or an application user will next demand the data units that are spatially close to the current demand. Thereby, prefetching can save time.
Temporal caching, a more common strategy, involves retaining data in the cache based on its frequency of use. Here are the most popular temporal caching strategies:
The FIFO caching approach operates on the principle that the first item added to the cache will be the first one to be removed. This method involves loading a predetermined number of items into the cache, and when the cache reaches total capacity, the oldest item is removed to accommodate a new one.
This approach is ideal for systems prioritizing the order of access, such as message processing or queue management systems.
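For illustration, here's one minimal way a FIFO cache could look in Python (a sketch, not code from this article), using an OrderedDict to remember insertion order:

from collections import OrderedDict

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        return self.items.get(key)

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            # Evict the item that was added first
            self.items.popitem(last=False)
        self.items[key] = value

cache = FIFOCache(capacity=2)
cache.put('a', 1)
cache.put('b', 2)
cache.put('c', 3)         # 'a' is evicted because it was added first
print(list(cache.items))  # ['b', 'c']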
LIFO caching works the other way around: the last item added to the cache is the first one to be removed. The cache is loaded with a set number of items, and once it's full, the most recently added item is evicted to make space for a new one, while the oldest items remain in the cache.
This method is suitable for applications prioritizing recent data over older data, such as stack-based data structures or real-time streaming.
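A LIFO cache can be sketched in much the same way; the only difference from the FIFO example above is that eviction happens at the most recent end of the OrderedDict:

from collections import OrderedDict

class LIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        return self.items.get(key)

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            # Evict the item that was added most recently
            self.items.popitem(last=True)
        self.items[key] = value

cache = LIFOCache(capacity=2)
cache.put('a', 1)
cache.put('b', 2)
cache.put('c', 3)         # 'b' is evicted because it was the most recent addition
print(list(cache.items))  # ['a', 'c']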
LRU caching involves storing frequently used items while removing the least recently used ones to free up space. This method is particularly useful in scenarios such as web applications or database systems, where frequently accessed data matters more than older data.
To understand this concept, let’s imagine that we’re hosting a movie website and need to cache movie information. Assume we have a cache with a capacity of four units, and the information for each movie takes up one unit. Thereby, we can cache information for at most four movies.
Now, suppose we have the following list of movies to host:
Movie A
Movie B
Movie C
Movie D
Movie E
Movie F
Let’s say the cache was first populated with these four movies, along with their request times:
(1:50) Movie B
(1:43) Movie C
(1:30) Movie A
(1:59) Movie F
Our cache is now full. If there's a request for a new movie (say, Movie D at 2:30), we must remove one of the movies to add the new one. Under the LRU caching strategy, the least recently watched movie is removed first. This means that Movie A will be replaced by Movie D with a new timestamp:
(1:50) Movie B
(1:43) Movie C
(2:30) Movie D
(1:59) Movie F
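To see how this eviction might look in code, here's a small LRU cache sketch (illustrative only) that mirrors the movie example, using an OrderedDict whose front always holds the least recently used entry:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        # Accessing a key makes it the most recently used
        self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        elif len(self.items) >= self.capacity:
            # Evict the least recently used entry (the front of the OrderedDict)
            self.items.popitem(last=False)
        self.items[key] = value

cache = LRUCache(capacity=4)
# Insert the movies in the order they were last requested (oldest first)
for movie in ['Movie A', 'Movie C', 'Movie B', 'Movie F']:
    cache.put(movie, 'movie info')
cache.put('Movie D', 'movie info')  # Movie A, the least recently used, is evicted
print(list(cache.items))            # ['Movie C', 'Movie B', 'Movie F', 'Movie D']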
MRU caching eliminates items based on their most recent use. This differs from LRU caching, which removes the least recently used items first.
To illustrate, in our movie hosting example, the movie with the newest time will be replaced. Let's reset the cache to the time it initially became full:
(1:50) Movie B
(1:43) Movie C
(1:30) Movie A
(1:59) Movie F
Our cache is now full. If there’s a request for a new movie (Movie D), we must remove one of the movies to add the new one. Under the MRU strategy, the movie with the latest timestamp is replaced with the new movie. This means Movie D at 2:30 will replace Movie F:
(1:50) Movie B
(1:43) Movie C
(1:30) Movie A
(2:30) Movie D
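A matching MRU sketch only needs the eviction to happen at the other end, removing the most recently used entry instead:

from collections import OrderedDict

class MRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        elif len(self.items) >= self.capacity:
            # Evict the most recently used entry (the back of the OrderedDict)
            self.items.popitem(last=True)
        self.items[key] = value

cache = MRUCache(capacity=4)
for movie in ['Movie A', 'Movie C', 'Movie B', 'Movie F']:
    cache.put(movie, 'movie info')
cache.put('Movie D', 'movie info')  # Movie F, the most recently used, is evicted
print(list(cache.items))            # ['Movie A', 'Movie C', 'Movie B', 'Movie D']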
MRU is helpful in situations where the longer something goes without being used, the more probable it will be used again later.
The Least Frequently Used (LFU) policy removes the cache item used the least number of times since it was first added. LFU, unlike LRU and MRU, doesn’t need to store the access times. It simply keeps track of the number of times an item has been accessed since its addition.
Let's use the movie example again. This time we have maintained a count of how often a movie is watched:
(2) Movie B
(2) Movie C
(1) Movie A
(3) Movie F
Our cache is now full. If there’s a request for a new movie (Movie D), we must remove one of the movies to add the new one. LFU replaces the movie with the lowest watch count with the new movie:
(2) Movie B
(2) Movie C
(1) Movie D
(3) Movie F
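As with the other strategies, an LFU cache can be sketched with plain dictionaries (illustrative only), keeping one dictionary for the values and another for the access counts:

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.counts = {}

    def get(self, key):
        if key not in self.values:
            return None
        self.counts[key] += 1  # track how often each item is accessed
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the entry with the lowest access count
            least_used = min(self.counts, key=self.counts.get)
            del self.values[least_used]
            del self.counts[least_used]
        self.values[key] = value
        self.counts.setdefault(key, 0)

cache = LFUCache(capacity=4)
for movie, views in [('Movie B', 2), ('Movie C', 2), ('Movie A', 1), ('Movie F', 3)]:
    cache.put(movie, 'movie info')
    cache.counts[movie] = views     # set the watch counts from the example above
cache.put('Movie D', 'movie info')  # Movie A, with the lowest watch count, is evicted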
There are different ways to implement caching in Python for different caching strategies. Here we’ll see two methods of Python caching for a simple web scraping example. If you’re new to web scraping, take a look at our step-by-step Python web scraping guide.
We’ll use the requests library to make HTTP requests to a website. Install it with pip by entering the following command in your terminal:
python -m pip install requests
Other libraries we’ll use in this project, specifically time and functools, are part of Python’s standard library, so you don’t have to install them.
A decorator in Python is a function that accepts another function as an argument and outputs a new function. We can alter the behavior of the original function using a decorator without changing its source code.
One common use case for decorators is implementing caching: the decorator creates a dictionary to store the function's results and serves them from that cache on subsequent calls.
Let’s start by creating a simple function that takes a URL as a function argument, requests that URL, and returns the response text:
import requests
def get_html_data(url):
    response = requests.get(url)
    return response.text
Now, let's move toward creating a memoized version of this function:
def memoize(func):
    cache = {}

    def wrapper(*args):
        if args in cache:
            return cache[args]
        else:
            result = func(*args)
            cache[args] = result
            return result

    return wrapper
@memoize
def get_html_data_cached(url):
    response = requests.get(url)
    return response.text
Here, the memoize decorator generates a cache dictionary to hold the results of previous function calls. The wrapper function checks whether the current input arguments have been previously cached and, if so, returns the cached result. If not, it calls the original function and caches the result before returning it.
By adding @memoize above the function definition, we can use the memoize decorator to enhance the get_html_data function. This generates a new memoized function that we’ve called get_html_data_cached. It only makes a single network request for a URL and then stores the response in the cache for further requests.
Let’s use the time module to compare the execution speeds of the get_html_data function and the memoized get_html_data_cached function:
import time
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)
start_time = time.time()
get_html_data_cached('https://books.toscrape.com/')
print('Time taken (memoized function using manual decorator):', time.time() - start_time)
Here’s what the complete code looks like:
# Import the required modules
import time
import requests
# Function to get the HTML Content
def get_html_data(url):
    response = requests.get(url)
    return response.text
# Memoize function to cache the data
def memoize(func):
    cache = {}

    # Inner wrapper function to store the data in the cache
    def wrapper(*args):
        if args in cache:
            return cache[args]
        else:
            result = func(*args)
            cache[args] = result
            return result

    return wrapper
# Memoized function to get the HTML Content
@memoize
def get_html_data_cached(url):
    response = requests.get(url)
    return response.text
# Get the time it took for a normal function
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)
# Get the time it took for a memoized function (manual decorator)
start_time = time.time()
get_html_data_cached('https://books.toscrape.com/')
print('Time taken (memoized function using manual decorator):', time.time() - start_time)
And here’s the output:
Notice the time difference between the two functions. Both take almost the same time here, because the real advantage of caching only shows up when the data is accessed again.
Since we’re making only one request, the memoized function still has to fetch the data over the network on its first call. Therefore, a significant time difference in execution isn’t expected in our example. However, if you increase the number of calls to these functions, the time difference will grow substantially (see the performance comparison section below).
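To see this effect yourself, you can call each function several times in a loop: the memoized version fetches the page once and then returns the cached response, while the normal function downloads it on every call (a small illustrative addition to the code above):

urls = ['https://books.toscrape.com/'] * 10

start_time = time.time()
for url in urls:
    get_html_data(url)          # fetches the page every time
print('Normal function, 10 calls:', time.time() - start_time)

start_time = time.time()
for url in urls:
    get_html_data_cached(url)   # fetches once, then reads from the cache
print('Memoized function, 10 calls:', time.time() - start_time)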
Another method to implement caching in Python is to use the built-in @lru_cache decorator from functools. It implements a cache using the least recently used (LRU) strategy: the cache has a fixed size, and when it fills up, the data that hasn’t been used recently is discarded.
To use the @lru_cache decorator, we can create a new function for extracting HTML content and place the decorator name at the top. Make sure to import the functools module before using the decorator:
from functools import lru_cache
@lru_cache(maxsize=None)
def get_html_data_lru(url):
    response = requests.get(url)
    return response.text
In the above example, the get_html_data_lru function is memoized using the @lru_cache decorator. The cache can grow indefinitely when the maxsize option is set to None.
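Functions wrapped with @lru_cache also expose helper methods such as cache_info() and cache_clear(), which you can use to check whether your calls are actually hitting the cache or to empty it:

get_html_data_lru('https://books.toscrape.com/')
get_html_data_lru('https://books.toscrape.com/')

# Inspect cache statistics: the second call should register as a hit
print(get_html_data_lru.cache_info())
# For example: CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)

# Clear the cache if you need fresh responses
get_html_data_lru.cache_clear()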
To use the @lru_cache decorator, just add it above the get_html_data_lru function. Here’s the complete code sample:
# Import the required modules
from functools import lru_cache
import time
import requests
# Function to get the HTML Content
def get_html_data(url):
    response = requests.get(url)
    return response.text
# Memoized using LRU Cache
@lru_cache(maxsize=None)
def get_html_data_lru(url):
    response = requests.get(url)
    return response.text
# Get the time it took for a normal function
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)
# Get the time it took for a memoized function (LRU cache)
start_time = time.time()
get_html_data_lru('https://books.toscrape.com/')
print('Time taken (memoized function with LRU cache):', time.time() - start_time)
This produced the following output:
In the following table, we’ve determined the execution times of all three functions for different numbers of requests to these functions:
| No. of requests | Time taken by normal function | Time taken by memoized function (manual decorator) | Time taken by memoized function (lru_cache decorator) |
|---|---|---|---|
| 1 | 2.1 seconds | 2.0 seconds | 1.7 seconds |
| 10 | 17.3 seconds | 2.1 seconds | 1.8 seconds |
| 20 | 32.2 seconds | 2.2 seconds | 2.1 seconds |
| 30 | 57.3 seconds | 2.22 seconds | 2.12 seconds |
As the number of requests to the functions increases, you can see a significant reduction in execution times using the caching strategy. The following comparison chart depicts these results:
The comparison results clearly show that using a caching strategy in your code can significantly improve overall performance and speed.
If you want to speed up your code, caching the data can help. There are many use cases where caching is a game changer, and web scraping is one of them. When dealing with large-scale projects, consider caching frequently used data in your code to massively speed up your data extraction efforts and improve overall performance.
In case you have questions about our products, feel free to contact our 24/7 support via live chat or email.
Caching and data replication are two different methods used to improve the performance and availability of data.
Caching involves storing frequently used data in a nearby cache for quick access. Whenever a user or application requests data, it’s first checked in the cache, and if it's available, it's returned from there instead of the primary data source. This reduces the load on the main data source and speeds up data retrieval.
On the other hand, data replication involves creating multiple copies of data and storing them in various servers or locations. Rather than relying on a single primary data source, the system can retrieve data from any available copies when a user or application requests. This ensures that data remains accessible even if one server or location goes offline, enhancing data availability and fault tolerance.
Memoization and caching serve different goals. Caching optimizes data access and retrieval by keeping frequently accessed data in a local cache, which pays off when the same data is requested repeatedly or there's a high degree of locality in the access pattern. Memoization, on the other hand, improves the performance of recursive functions or functions that perform demanding calculations by storing their results for repeated inputs.
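As a quick illustration of memoization, the classic example is a recursive function such as the Fibonacci sequence, where caching intermediate results avoids recomputing the same values over and over:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each fib(n) is computed once and then served from the cache
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))  # 354224848179261915075, computed almost instantly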
About the author
Vytenis Kaubrė
Technical Copywriter
Vytenis Kaubrė is a Technical Copywriter at Oxylabs. His love for creative writing and a growing interest in technology fuels his daily work, where he crafts technical content and web scrapers with Oxylabs’ solutions. Off duty, you might catch him working on personal projects, coding with Python, or jamming on his electric guitar.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.