Reading & Parsing JSON Data With Python: Tutorial

Monika Maslauskaite


2021-08-30, 8 min read

JSON is a common standard used by websites and APIs, and even natively supported by modern databases such as PostgreSQL. In this article, we’ll present a tutorial on how to handle JSON data with Python. Now, let’s start with the definition of JSON. 

What is JSON?

JSON, or JavaScript Object Notation, is a format that uses text to store data objects. In other words, it is a way of representing data structures as text. Even though it’s derived from JavaScript, it has become a de facto standard for transferring objects.

This format is supported by most popular programming languages, including Python. Most commonly, JSON is used to transfer data objects by APIs. Here is an example of a JSON string:

{
   "name": "United States",
   "population": 331002651,
   "capital": "Washington D.C.",
   "languages": [
      "English",
      "Spanish"
   ]
}

In this example, JSON data looks like a Python dictionary. Just like dictionaries, JSON objects contain data in key-value pairs. However, JSON data can also be a string, a number, a boolean, null, or a list (called an array in JSON).

It is worth noting that JSON data is often nested: objects and arrays can contain other objects and arrays. Parsing nested JSON allows you to manipulate data elements that are buried within complex data structures. Luckily, Python has a built-in json module, so parsing nested structures becomes a significantly more manageable task.
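As a minimal sketch of working with nested JSON, the snippet below parses a small payload and reads values from its inner levels. The payload and its key names are made up for illustration; only json.loads() and standard dictionary and list access are used.

```python
import json

# Hypothetical nested payload, made up for this illustration.
payload = '''
{
   "country": {
      "name": "United States",
      "capital": {"name": "Washington D.C."},
      "languages": ["English", "Spanish"]
   }
}
'''

data = json.loads(payload)

# Nested JSON objects become nested dictionaries; arrays become lists.
capital_name = data["country"]["capital"]["name"]
first_language = data["country"]["languages"][0]

# .get() with a default avoids a KeyError when an optional key is absent.
currency = data["country"].get("currency", "unknown")

print(capital_name)    # Washington D.C.
print(first_language)  # English
print(currency)        # unknown
```

Chained indexing raises a KeyError or IndexError if any level is missing, so .get() is a reasonable choice for keys that may not be present.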

Before JSON became popular, XML had been the common choice to represent data objects in a text format. Here is an example of the same information in the XML format:

<?xml version="1.0" encoding="UTF-8"?>
<country>
   <name>United States</name>
   <population>331002651</population>
   <capital>Washington D.C.</capital>
   <languages>
       <language>English</language>
       <language>Spanish</language>
   </languages>
</country>

As the comparison shows, the JSON version is more compact than the XML one. This is one of the primary reasons why JSON is so popular and why it is such an efficient data exchange format. If you want to read more about the JSON standard, head over to the official JSON website.

JSON in Python

Python supports JSON data natively. The json module is part of the Standard Library, so no separate installation is needed. It converts JSON data into the equivalent Python objects, such as dictionaries and lists, and it can also convert Python objects into the JSON format. Crucially, the json module also allows you to write custom encoders and decoders. As a result, it becomes easy to parse JSON responses from APIs.

You can find the official documentation for the Python JSON module here.

In the remainder of this tutorial, we will explore this package. We’re going to convert JSON to dictionary and list and the other way round. We’ll also explore how to handle custom classes.

Converting JSON string to Python object

JSON data is frequently stored in strings. This is a common scenario when working with APIs. The JSON data would be stored in string variables before it can be parsed. As a result, the most common task related to JSON is to parse a JSON string into a Python dictionary. The json module takes care of this task, which makes parsing JSON in Python rather simple.

1. Importing JSON. The first step is to import the Python json module. This module contains two important functions: loads and load.

2. Using the loads function. The method for parsing JSON data from strings is loads. Note that the name looks like a plural form, but it is not: it is read as ‘load-s’, and the ‘s’ stands for ‘string’. The other method, load, is used when the data comes from a file object. This is covered at length in a later section.

Let’s start with a simple example. The instance of JSON data is as follows:

{
   "name": "United States",
   "population": 331002651
}

3. Storing JSON string. JSON data can be stored as a Python string before it is parsed. We could use Python’s triple-quote convention to keep the multi-line layout, but for brevity the JSON is written on a single line here.

# JSON string
country = '{"name": "United States", "population": 331002651}'
print(type(country))

The output of this snippet will confirm that the variable is indeed a string:

<class 'str'>

We can call the json.loads() method and provide this string as a parameter.

import json

country = '{"name": "United States", "population": 331002651}'
country_dict = json.loads(country)

print(type(country))
print(type(country_dict))

The output of this snippet will confirm that the JSON data, which was a string, is now a Python dictionary.

<class 'str'>
<class 'dict'>

This dictionary can be accessed as usual:

print(country_dict['name'])
# OUTPUT:   United States

It is important to note here that the json.loads() method will not always return a dictionary. The data type that is returned will depend on the input string. For example, this JSON string will return a list, not a dictionary.

countries = '["United States", "Canada"]'
countries_list = json.loads(countries)

print(type(countries_list))
# OUTPUT:  <class 'list'>

Similarly, if the JSON string contains true, it will be converted to the equivalent Python boolean value, which is True.

import json
 
bool_string = 'true'
bool_type = json.loads(bool_string)
print(bool_type)
# OUTPUT:  True

The following table shows JSON types and the Python data types they are converted to. For more details, see Python docs.

JSON               Python
object             dict
array              list
string             str
number (integer)   int
number (real)      float
true               True
false              False
null               None
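The conversions in the table can be checked directly. The following is a small sketch that parses one sample value per JSON type and records the resulting Python type; the sample values are arbitrary and chosen only for illustration.

```python
import json

# One arbitrary sample per JSON type from the table.
samples = ['{"a": 1}', '[1, 2]', '"text"', '42', '3.14', 'true', 'false', 'null']

# Map each JSON text to the name of the Python type json.loads() returns.
results = {text: type(json.loads(text)).__name__ for text in samples}

for text, type_name in results.items():
    print(f"{text:10} -> {type_name}")
```

Note that null becomes None, whose type name is printed as NoneType.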

Now, let’s move on to the next topic: parsing a JSON file into a Python object.

Converting JSON file to Python object

Making Python read JSON files and parse their contents into Python data is very similar to parsing JSON data stored in strings. Apart from the json module, Python’s native open() function will also be required.

Instead of the loads() method, which reads JSON strings, the method used to read JSON data in files is load().

The load() method takes a file object and returns the JSON data parsed into a Python object.

To get the file object from a file path, Python’s open() function can be used.

Save the following JSON data as a new file and name it united_states.json:

{
   "name": "United States",
   "population": 331002651,
   "capital": "Washington D.C.",
   "languages": [
      "English",
      "Spanish"
   ]
}

Enter this Python script in a new file:

import json

with open('united_states.json') as f:
    data = json.load(f)

print(type(data))

Running this Python file prints the following:

<class 'dict'>

In this example, the open function returns a file handle, which is supplied to the load method.

The variable data now contains the JSON as a Python dictionary. This means that the dictionary keys can be checked as follows:

print(data.keys())
# OUTPUT:  dict_keys(['name', 'population', 'capital', 'languages'])

Using this information, the value of name can be printed as follows:

print(data['name'])
# OUTPUT:  United States

In the previous two sections, we examined how JSON can be converted to Python objects. Now, it’s time to explore how to convert Python objects to JSON.

Converting Python object to JSON string

Converting Python objects to JSON objects is also known as serialization or JSON encoding. It can be achieved by using the function dumps(). It is read as dump-s and the letter S stands for string.

Here is a simple example.  Save this code in a new file as a Python script:

import json

languages = ["English","French"]
country = {
    "name": "Canada",
    "population": 37742154,
    "languages": languages,
    "president": None,
}

country_string = json.dumps(country)
print(country_string)

When this file is run with Python, the following output is printed:

{"name": "Canada", "population": 37742154, "languages": ["English", "French"], "president": null}

The Python object is now a JSON string. This simple example demonstrates how easy it is to serialize a Python object to JSON. Note that the Python object was a dictionary. That’s the reason it was converted into a JSON object type. Lists can be converted to JSON as well. Here is the Python script and its output:

import json

languages = ["English", "French"]

languages_string = json.dumps(languages)
print(languages_string)
# OUTPUT:   ["English", "French"]

It’s not just limited to a dictionary and a list. Values of type str, int, float, bool, and even None can be converted to JSON.

Refer to the conversion table below for details. As you can see, only the dictionary is converted to the JSON object type. For the official documentation, see this link.

Python                                   JSON
dict                                     object
list, tuple                              array
str                                      string
int, float, int- & float-derived Enums   number
True                                     true
False                                    false
None                                     null
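One consequence of this table is that a round trip through JSON does not always return exactly the same Python types. The sketch below, using made-up sample data, shows that a tuple comes back as a list and that a non-string dictionary key (here the integer 1) is written as a JSON string, which is how the json module handles such keys.

```python
import json

# Made-up sample data: a tuple value, an integer key, and a None value.
original = {"languages": ("English", "French"), 1: "one", "president": None}

encoded = json.dumps(original)
decoded = json.loads(encoded)

print(encoded)
# {"languages": ["English", "French"], "1": "one", "president": null}

print(type(decoded["languages"]))  # <class 'list'>
print(list(decoded.keys()))        # ['languages', '1', 'president']
```

If the distinction between tuples and lists, or between integer and string keys, matters to your program, it has to be restored after loading.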

Writing Python object to a JSON file

The method used to write a JSON file is dump(). This method is very similar to the method dumps(). The only difference is that while dumps() returns a string, dump() writes to a file. 

Here is a simple demonstration. This will open the file in writing mode and write the data in JSON format. Save this Python script in a file and run it.

import json

# Tuple is encoded to JSON array.
languages = ("English", "French")
# Dictionary is encoded to JSON object.
country = {
    "name": "Canada",
    "population": 37742154,
    "languages": languages,
    "president": None,
}

with open('countries_exported.json', 'w') as f:
    json.dump(country, f)

When this code is executed using Python, the countries_exported.json file is created (or overwritten) and contains the JSON data.

However, you will notice that the entire JSON is in one line. To make it more readable, we can pass one more parameter to the dump() function as follows:

json.dump(country, f, indent=4)

This time when you run the code, the output will be nicely formatted with an indentation of 4 spaces:

{
    "name": "Canada",
    "population": 37742154,
    "languages": [
        "English",
        "French"
    ],
    "president": null
}

Note that this indent parameter is also available for the json.dumps() method. The only difference between the signatures of json.dump() and json.dumps() is that dump() needs a file object.
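To illustrate the formatting options with dumps(), here is a short sketch that produces both a pretty-printed and a compact string. Besides indent, it uses sort_keys and separators, which are also standard parameters of json.dumps(); the choice of values is only an example.

```python
import json

country = {
    "name": "Canada",
    "population": 37742154,
    "languages": ["English", "French"],
}

# Pretty-printed: 2-space indentation, keys sorted alphabetically.
pretty = json.dumps(country, indent=2, sort_keys=True)
print(pretty)

# Compact: no spaces after the item and key separators.
compact = json.dumps(country, separators=(",", ":"))
print(compact)
# {"name":"Canada","population":37742154,"languages":["English","French"]}
```

The pretty form is easier to read and to compare between runs, while the compact form is slightly smaller when the JSON is sent over a network.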

Converting custom Python objects to JSON objects

Let’s examine the signature of dump() method:

dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Let’s focus on the parameter cls.

If no class is supplied while calling the dump method, both the dump() and dumps() methods default to the JSONEncoder class. This class supports the standard Python types: dict, list, tuple, str, int, float, bool (True and False), and None.

If we try to call the json.dumps() method on any other type, this method will raise a TypeError with a message: Object of type <your_type> is not JSON serializable.

Save the following code as a Python script and run it:

import json

class Country:
    def __init__(self, name, population, languages):
        self.name = name    
        self.population = population
        self.languages = languages

    
canada = Country("Canada", 37742154, ["English", "French"])

print(json.dumps(canada))
# OUTPUT:   TypeError: Object of type Country is not JSON serializable

To convert the objects to JSON, we need to write a new class that extends JSONEncoder. In this class, the method default() should be implemented. This method will have the custom code to return the JSON.

Here is the example Encoder for the Country class. This class will help converting a Python object to a JSON object:

import json

class CountryEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Country):
            # JSON object would be a dictionary.
            return {
                "name": o.name,
                "population": o.population,
                "languages": o.languages,
            }
        # Base class will raise the TypeError.
        return super().default(o)

This code is simply returning a dictionary, after confirming that the supplied object is an instance of Country class, or calling the parent to handle the rest of the cases.

This class can now be supplied to the json.dump() as well as json.dumps() methods.

print(json.dumps(canada, cls=CountryEncoder))
# OUTPUT:  {"name": "Canada", "population": 37742154, "languages": ["English", "French"]}
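For simple cases, a lighter alternative to subclassing JSONEncoder is the default parameter that appears in the dump() signature shown earlier. The sketch below is not the approach used in this tutorial; it assumes a helper function, here given the hypothetical name encode_country, that is called for any object the standard encoder cannot handle.

```python
import json

class Country:
    def __init__(self, name, population, languages):
        self.name = name
        self.population = population
        self.languages = languages

def encode_country(o):
    # Hypothetical helper: called only for objects json cannot serialize itself.
    if isinstance(o, Country):
        return {
            "name": o.name,
            "population": o.population,
            "languages": o.languages,
        }
    # Mirror the standard behavior for unsupported types.
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

canada = Country("Canada", 37742154, ["English", "French"])

result = json.dumps(canada, default=encode_country)
print(result)
# {"name": "Canada", "population": 37742154, "languages": ["English", "French"]}
```

A function passed as default is convenient for one-off scripts, while an encoder class is easier to reuse and extend across a larger codebase.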

Creating Python class objects from JSON objects

So far, we have discussed how json.load() and json.loads() methods can create a dictionary, list, and more. What if we want to read a JSON object and create a custom class object?

In this section, we will create a custom JSON Decoder that will help us create custom objects. This custom decoder will allow us to use the json.load() and json.loads() methods, which will return a custom class object.

We will work with the same Country class that we used in the previous section. Using a custom encoder, we were able to write code like this:

# Create an object of class Country
canada = Country("Canada", 37742154, ["English", "French"])
# Use json.dump() to create a JSON file in writing mode
with open('canada.json','w') as f:
    json.dump(canada,f, cls=CountryEncoder)

If we try to parse this JSON file using the json.load() method, we will get a dictionary:

with open('canada.json', 'r') as f:
    country_object = json.load(f)

print(type(country_object))
# OUTPUT:  <class 'dict'>

To get an instance of the Country class instead of a dictionary, we need to create a custom decoder. This decoder class will extend JSONDecoder. In this class, we will write a method named object_hook. In this method, we will create an object of the Country class by reading the values from the dictionary.

Apart from writing this method, we would also need to call the __init__ method of the base class and set the value of the parameter object_hook to this method name. For simplicity, we can use the same name.

import json
 
class CountryDecoder(json.JSONDecoder):
    def __init__(self, object_hook=None, *args, **kwargs):
        super().__init__(object_hook=self.object_hook, *args, **kwargs)

    def object_hook(self, o):
        decoded_country =  Country(
            o.get('name'), 
            o.get('population'), 
            o.get('languages'),
        )
        return decoded_country

Note that we are using the .get() method to read dictionary keys. This will ensure that no errors are raised if a key is missing from the dictionary.

Finally, we can call the json.load() method and set the cls parameter to CountryDecoder class.

with open('canada.json', 'r') as f:
    country_object = json.load(f, cls=CountryDecoder)

print(type(country_object))
# OUTPUT (when run as a script):  <class '__main__.Country'>

That’s it! We now have a custom object created directly from JSON.
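One detail worth keeping in mind is that object_hook is called for every JSON object in the input, including nested ones. The sketch below is an alternative to the decoder class above: it passes a plain function (with the hypothetical name country_hook) through the object_hook parameter of json.loads() and only converts dictionaries that contain all three expected keys. The wrapper payload is made up for illustration.

```python
import json

class Country:
    def __init__(self, name, population, languages):
        self.name = name
        self.population = population
        self.languages = languages

def country_hook(d):
    # Called for every decoded JSON object, innermost first.
    # Convert only dictionaries that look like a country; pass others through.
    if {"name", "population", "languages"}.issubset(d):
        return Country(d["name"], d["population"], d["languages"])
    return d

# Made-up payload in which the country is nested inside another object.
text = (
    '{"country": {"name": "Canada", "population": 37742154, '
    '"languages": ["English", "French"]}, "source": "example"}'
)

result = json.loads(text, object_hook=country_hook)

print(type(result).__name__)             # dict
print(type(result["country"]).__name__)  # Country
print(result["country"].name)            # Canada
```

Checking the keys before constructing the object keeps unrelated dictionaries intact, which matters as soon as the JSON contains more than one kind of object.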

Loading vs dumping

The Python JSON module has four key functions: load(), loads(), dump(), and dumps(). It often becomes confusing to remember these functions. The most important thing to remember is that the letter ‘S’ stands for String. Also, read the letter ‘s’ separately in the functions loads() and dumps(), that is, read loads() as load-s and dumps() as dump-s.

Here is a quick table to help you remember these functions:

         File      String
Read     load()    loads()
Write    dump()    dumps()
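The four functions can be seen side by side in one short round trip. This sketch uses io.StringIO as an in-memory stand-in for a real file, which is an assumption made only to keep the example self-contained; a file opened with open() works the same way.

```python
import io
import json

country = {"name": "Canada", "population": 37742154}

# String-based pair: dumps() returns a str, loads() parses a str.
as_text = json.dumps(country)
from_text = json.loads(as_text)

# File-based pair: dump() writes to a file object, load() reads from one.
buffer = io.StringIO()   # in-memory stand-in for a file
json.dump(country, buffer)
buffer.seek(0)           # rewind before reading back
from_file = json.load(buffer)

print(from_text == country)  # True
print(from_file == country)  # True
```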

Conclusion

In this tutorial, we explored reading and writing JSON data using Python. Knowing how to work with JSON data is essential, especially when working with websites. JSON is used to transfer and store data everywhere, including APIs, web scrapers, and modern databases like PostgreSQL. You can click here to find the complete code used in this article for your convenience.

Understanding JSON is crucial if you are working on a web scraping project that involves dynamic websites. Head over to our blog post for a practical example of JSON being useful for pages with infinite scroll. And if you're searching for advanced web scraping solutions, check our Web Scraper API, designed to gather public data at scale from most websites hassle-free.

About the author

Monika Maslauskaite


Former Content Manager

Monika Maslauskaite is a former Content Manager at Oxylabs. A combination of tech-world and content creation is the thing she is super passionate about in her professional path. While free of work, you’ll find her watching mystery, psychological (basically, all kinds of mind-blowing) movies, dancing, or just making up choreographies in her head.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
