Monika Maslauskaite

Aug 30, 2021 12 min read

JSON is a common standard used by websites and APIs, and even natively supported by modern databases such as PostgreSQL. In this article, we’ll present a tutorial on how to handle JSON data with Python. Now, let’s start with the definition of JSON. 

What is JSON?

JSON, or JavaScript Object Notation, is a text-based format for storing and exchanging data objects. In other words, it represents data structures as text. Even though it’s derived from JavaScript, it has become a de facto standard for transferring objects between systems.

This format is supported by most popular programming languages, including Python. Most commonly, JSON is used to transfer data objects by APIs. Here is an example of a JSON string:

{
   "name": "United States",
   "population": 331002651,
   "capital": "Washington D.C.",
   "languages": [
      "English",
      "Spanish"
   ]
}

In this example, the JSON data looks like a Python dictionary. Just like dictionaries, JSON objects contain data in key-value pairs. However, JSON data does not have to be an object: it can also be a string, a number, a boolean, null, or an array (the counterpart of a Python list).

Before JSON became popular, XML had been the common choice to represent data objects in a text format. Here is an example of the same information in the XML format:

<?xml version="1.0" encoding="UTF-8"?>
<country>
   <name>United States</name>
   <population>331002651</population>
   <capital>Washington D.C.</capital>
   <languages>
       <language>English</language>
       <language>Spanish</language>
   </languages>
</country>

As evident here, JSON is more compact than XML, which is one of the primary reasons why it is so popular. If you want to read more about the JSON standard, head over to the official JSON website.

JSON in Python

Python supports JSON data natively. The Python json module is part of the Standard Library. It can convert JSON data into the equivalent Python objects, such as dictionaries and lists, and it can also convert Python objects into the JSON format.

The json module provides the functionality to write custom encoders and decoders, and there is no separate installation needed. You can find the official documentation for the Python JSON module here.

In the remainder of this tutorial, we will explore this module. We’re going to convert JSON to dictionaries and lists and the other way round. We’ll also explore how to handle custom classes.

Converting JSON string to Python object

JSON data is frequently stored in strings. This is a common scenario when working with APIs: the JSON data is held in a string variable before it can be parsed. As a result, the most common task related to JSON is parsing a JSON string into a Python dictionary. The json module can take care of this task easily.

The first step is to import the Python json module. This module contains two important functions for parsing: loads and load.

Note that loads looks like a plural form, but it is not. It is read as ‘load-s’, and the letter ‘s’ stands for ‘string’. This is the function to use when the JSON data is already in a string.

The other function, load, is used when the JSON data has to be read from a file object. This is covered at length in a later section.
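To make the difference between the two functions concrete, here is a minimal sketch. It uses io.StringIO as a stand-in for a real file, which is an assumption made only to keep the example self-contained; the variable names are illustrative.

```python
import io
import json

text = '{"name": "United States", "population": 331002651}'

# loads() parses JSON that is already held in a string.
from_string = json.loads(text)

# load() parses JSON read from a file-like object.
# StringIO stands in for an opened file in this sketch.
fake_file = io.StringIO(text)
from_file = json.load(fake_file)

# Both calls produce the same Python dictionary.
print(from_string == from_file)
```

Both calls go through the same parser; the only difference is where the text comes from.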

Let’s start with a simple example. The JSON data is as follows:

{
   "name": "United States",
   "population": 331002651
}

JSON data can be stored as a JSON string before it is parsed. We could use Python’s triple-quoted strings to keep the line breaks, but for brevity we put the whole JSON on a single line.

# JSON string
country = '{"name": "United States", "population": 331002651}'
print(type(country))

The output of this snippet will confirm that this is indeed a JSON string:

<class 'str'>

We can call the json.loads() method and provide this string as a parameter.

import json

country = '{"name": "United States", "population": 331002651}'
country_dict = json.loads(country)

print(type(country))
print(type(country_dict))

The output of this snippet will confirm that the JSON data, which was a string, is now a Python dictionary.

<class 'str'>
<class 'dict'>

This dictionary can be accessed as usual:

print(country_dict['name'])
# OUTPUT:   United States
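Parsing only works when the string is valid JSON. As a hedged illustration, the sketch below feeds json.loads() a deliberately malformed string (the trailing comma is our own hypothetical input, not taken from an API) and catches json.JSONDecodeError, the exception the json module raises for invalid input.

```python
import json

# Hypothetical malformed input: a trailing comma after the last
# member is not allowed in JSON.
bad_json = '{"name": "United States", "population": 331002651,}'

try:
    json.loads(bad_json)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    # err.msg describes the problem; err.lineno and err.colno locate it.
    print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
```

The exact wording of the error message differs between Python versions, so it is safer to rely on the exception type than on the message text.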

It is important to note here that the json.loads() method will not always return a dictionary. The data type that is returned will depend on the input string. For example, this JSON string will return a list, not a dictionary.

countries = '["United States", "Canada"]'
countries_list = json.loads(countries)

print(type(countries_list))
# OUTPUT:  <class 'list'>

Similarly, if the JSON string contains true, it will be converted to the equivalent Python boolean value, which is True.

import json
 
bool_string = 'true'
bool_type = json.loads(bool_string)
print(bool_type)
# OUTPUT:  True

The following table shows JSON types and the Python data types they become after conversion. For more details, see Python docs.

JSON                Python
object              dict
array               list
string              str
number (integer)    int
number (real)       float
true                True
false               False
null                None
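The table can be checked with a short loop. This is a small sketch with sample values of our own choosing, one for each row of the table:

```python
import json

# One sample JSON document per row of the conversion table.
samples = ['{"a": 1}', '[1, 2]', '"text"', '42', '3.14', 'true', 'false', 'null']

# Parse each sample and record the name of the resulting Python type.
type_names = [type(json.loads(s)).__name__ for s in samples]

for sample, name in zip(samples, type_names):
    print(f"{sample:>10} -> {name}")
```

Note that true and false both map to bool, and null maps to None, whose type name is NoneType.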

Now, let’s move on to the next topic: parsing a JSON file into a Python object.

Converting JSON file to Python object

Reading JSON files to parse JSON data into Python data is very similar to parsing JSON data stored in strings. Apart from the json module, Python’s built-in open() function is also required.

Instead of loads(), which reads JSON strings, the function used to read JSON data from files is load().

The load() function takes a file object and returns the JSON data parsed into a Python object.

To get the file object from a file path, Python’s open() function can be used.

Save the following JSON data as a new file and name it united_states.json:

{
   "name": "United States",
   "population": 331002651,
   "capital": "Washington D.C.",
   "languages": [
      "English",
      "Spanish"
   ]
}

Enter this Python script in a new file:

import json

with open('united_states.json') as f:
    data = json.load(f)

print(type(data))

Running this Python file prints the following:

<class 'dict'>

In this example, the open function returns a file handle, which is supplied to the load method.

The variable data now contains the JSON as a Python dictionary. This means that the dictionary keys can be checked as follows:

print(data.keys())
# OUTPUT:  dict_keys(['name', 'population', 'capital', 'languages'])

Using this information, the value of name can be printed as follows:

print(data['name'])
# OUTPUT:  United States
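Nested values work the same way: a JSON array inside the object becomes a Python list that can be indexed. The sketch below is self-contained, so it first writes a shortened version of the JSON to a temporary directory; the temporary file is an assumption made for the example, and in the tutorial the file would already exist on disk.

```python
import json
import tempfile
from pathlib import Path

json_text = '{"name": "United States", "languages": ["English", "Spanish"]}'

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create the JSON file so that the example can run anywhere.
    path = Path(tmp_dir) / "united_states.json"
    path.write_text(json_text, encoding="utf-8")

    # Read it back exactly as in the tutorial: open() plus json.load().
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

# The JSON array is now a regular Python list.
first_language = data["languages"][0]
print(first_language)
print(len(data["languages"]))
```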

In the previous two sections, we examined how JSON can be converted to Python objects. Now, it’s time to explore how to convert Python objects to JSON.

Converting Python object to JSON string

Converting Python objects to JSON objects is also known as serialization or JSON encoding. It can be achieved by using the function dumps(). It is read as dump-s and the letter S stands for string.

Here is a simple example.  Save this code in a new file as a Python script:

import json

languages = ["English","French"]
country = {
    "name": "Canada",
    "population": 37742154,
    "languages": languages,
    "president": None,
}

country_string = json.dumps(country)
print(country_string)

When this file is run with Python, the following output is printed:

{"name": "Canada", "population": 37742154, "languages": ["English", "French"], "president": null}

The Python object is now a JSON string. This simple example demonstrates how easy it is to serialize a Python object to JSON. Note that the Python object was a dictionary. That’s the reason it was converted into the JSON object type. Lists can be converted to JSON as well. Here is the Python script and its output:

import json

languages = ["English", "French"]

languages_string = json.dumps(languages)
print(languages_string)
# OUTPUT:   ["English", "French"]

Serialization is not limited to dictionaries and lists. Values of type str, int, float, bool, and even None can be converted to JSON.

Refer to the conversion table below for details. As you can see, only the dictionary is converted to the JSON object type. For the official documentation, see this link.

Python                                    JSON
dict                                      object
list, tuple                               array
str                                       string
int, float, int- & float-derived Enums    number
True                                      true
False                                     false
None                                      null
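Here is a quick sketch that runs json.dumps() over one sample value per row of the table. The sample values are our own, chosen only for illustration:

```python
import json

# One sample Python value per row of the conversion table.
values = [{"a": 1}, [1, 2], (1, 2), "text", 42, 3.14, True, False, None]

# Serialize each value to its JSON text.
encoded = [json.dumps(v) for v in values]

for value, text in zip(values, encoded):
    print(f"{value!r:>10} -> {text}")
```

Notice that the list and the tuple both produce the same JSON array, so a tuple will come back as a list if the JSON is parsed again.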

Writing Python object to a JSON file

The method used to write a JSON file is dump(). This method is very similar to the method dumps(). The only difference is that while dumps() returns a string, dump() writes to a file. 

Here is a simple demonstration. This will open the file in writing mode and write the data in JSON format. Save this Python script in a file and run it.

import json

# Tuple is encoded to JSON array.
languages = ("English", "French")
# Dictionary is encoded to JSON object.
country = {
    "name": "Canada",
    "population": 37742154,
    "languages": languages,
    "president": None,
}

with open('countries_exported.json', 'w') as f:
    json.dump(country, f)

When this code is executed using Python, the file countries_exported.json is created (or overwritten) and contains the JSON data.

However, you will notice that the entire JSON is in one line. To make it more readable, we can pass one more parameter to the dump() function as follows:

json.dump(country, f, indent=4)

This time when you run the code, it will be nicely formatted with indentation of 4 spaces:

{
    "name": "Canada",
    "population": 37742154,
    "languages": [
        "English",
        "French"
    ],
    "president": null
}

Note that this indent parameter is also available for JSON dumps() method. The only difference between the signatures of JSON dump() and JSON dumps() is that dump() needs a file object.
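As a small sketch of the same idea with dumps(), the snippet below formats a dictionary as an indented string. It also passes sort_keys=True, one of the parameters in the function signature, to order the keys alphabetically; the variable names are illustrative.

```python
import json

country = {
    "name": "Canada",
    "population": 37742154,
    "languages": ["English", "French"],
}

# indent=4 pretty-prints the output; sort_keys=True orders keys alphabetically.
pretty = json.dumps(country, indent=4, sort_keys=True)
print(pretty)
```

Pretty-printing only changes whitespace and key order, so parsing the string again gives back the original dictionary.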

Converting custom Python objects to JSON objects

Let’s examine the signature of dump() method:

dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Let’s focus on the parameter cls.

If no class is supplied through cls when calling dump() or dumps(), both functions default to the JSONEncoder class. This class supports the standard Python types: dict, list, tuple, str, int, float, True, False, and None.

If we try to call the json.dumps() method on any other type, it will raise a TypeError with the message: Object of type <your_type> is not JSON serializable.

Save the following code as a Python script and run it:

import json

class Country:
    def __init__(self, name, population, languages):
        self.name = name    
        self.population = population
        self.languages = languages

    
canada = Country("Canada", 37742154, ["English", "French"])

print(json.dumps(canada))
# OUTPUT:   TypeError: Object of type Country is not JSON serializable

To convert the objects to JSON, we need to write a new class that extends JSONEncoder. In this class, the method default() should be implemented. This method will have the custom code to return the JSON.

Here is an example encoder for the Country class. This class helps convert a Python object to a JSON object:

import json

# Assumes the Country class defined in the previous snippet.
class CountryEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Country):
            # JSON object would be a dictionary.
            return {
                "name": o.name,
                "population": o.population,
                "languages": o.languages
            }
        else:
            # Base class will raise the TypeError.
            return super().default(o)

This code is simply returning a dictionary, after confirming that the supplied object is an instance of Country class, or calling the parent to handle the rest of the cases.

This class can now be supplied to the json.dump() as well as json.dumps() methods.

print(json.dumps(canada, cls=CountryEncoder))
# OUTPUT:  {"name": "Canada", "population": 37742154, "languages": ["English", "French"]}
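Subclassing JSONEncoder is not the only option. The dump() signature shown earlier also has a default parameter, which accepts a plain function that is called for objects the encoder cannot serialize on its own. The following is a minimal sketch of that alternative; the function name encode_country is our own, hypothetical choice.

```python
import json

class Country:
    def __init__(self, name, population, languages):
        self.name = name
        self.population = population
        self.languages = languages

def encode_country(o):
    # Hypothetical helper: called only for objects json cannot serialize itself.
    if isinstance(o, Country):
        return {
            "name": o.name,
            "population": o.population,
            "languages": o.languages,
        }
    # Mirror the default behavior for any other unsupported type.
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

canada = Country("Canada", 37742154, ["English", "French"])
country_json = json.dumps(canada, default=encode_country)
print(country_json)
```

A function is often enough for a single class, while an encoder subclass may be easier to reuse across a larger code base.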

Creating Python class objects from JSON objects

So far, we have discussed how json.load() and json.loads() methods can create a dictionary, list, and more. What if we want to read a JSON object and create a custom class object?

In this section, we will create a custom JSON Decoder that will help us create custom objects. This custom decoder will allow us to use the json.load() and json.loads() methods, which will return a custom class object.

We will work with the same Country class that we used in the previous section. Using a custom encoder, we were able to write code like this:

# Create an object of class Country
canada = Country("Canada", 37742154, ["English", "French"])
# Use json.dump() to create a JSON file in writing mode
with open('canada.json','w') as f:
    json.dump(canada,f, cls=CountryEncoder)

If we try to parse this JSON file using the json.load() method, we will get a dictionary:

with open('canada.json','r') as f:
    country_object = json.load(f)

print(type(country_object))
# OUTPUT:  <class 'dict'>

To get an instance of the Country class instead of a dictionary, we need to create a custom decoder. This decoder class will extend JSONDecoder. In this class, we will write a method named object_hook, which creates an object of the Country class by reading the values from the dictionary.

Apart from writing this method, we would also need to call the __init__ method of the base class and set the value of the parameter object_hook to this method name. For simplicity, we can use the same name.

import json
 
class CountryDecoder(json.JSONDecoder):
    def __init__(self, object_hook=None, *args, **kwargs):
        super().__init__(object_hook=self.object_hook, *args, **kwargs)

    def object_hook(self, o):
        decoded_country =  Country(
            o.get('name'), 
            o.get('population'), 
            o.get('languages'),
        )
        return decoded_country

Note that we are using the .get() method to read dictionary keys. This ensures that no KeyError is raised if a key is missing from the dictionary; the corresponding attribute is simply set to None.

Finally, we can call the json.load() method and set the cls parameter to CountryDecoder class.

with open('canada.json','r') as f:
    country_object = json.load(f, cls=CountryDecoder)

print(type(country_object))
# OUTPUT:  <class '__main__.Country'>

That’s it! We now have a custom object created directly from JSON.
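One thing to keep in mind is that an object hook is called for every JSON object that gets decoded, not only for countries. As a hedged sketch, the example below passes a plain function through the object_hook parameter of json.loads() and converts a dictionary only when it has the expected keys. The function name country_hook and the key check are our own assumptions for illustration.

```python
import json

class Country:
    def __init__(self, name, population, languages):
        self.name = name
        self.population = population
        self.languages = languages

def country_hook(d):
    # Hypothetical hook: convert only dictionaries that look like a country.
    if d.keys() >= {"name", "population", "languages"}:
        return Country(d["name"], d["population"], d["languages"])
    # Leave every other JSON object as a regular dictionary.
    return d

text = '{"name": "Canada", "population": 37742154, "languages": ["English", "French"]}'
canada = json.loads(text, object_hook=country_hook)
other = json.loads('{"unrelated": 1}', object_hook=country_hook)

print(type(canada).__name__)
print(other)
```

The key check is a design choice: it avoids creating half-empty Country objects from unrelated JSON objects, at the cost of silently returning a dictionary when a key is missing.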

Loading vs dumping

The Python json module has four key functions: load(), loads(), dump(), and dumps(). It is easy to confuse these functions. The most important thing to remember is that the letter ‘s’ stands for string. Also, read the letter ‘s’ separately in the functions loads() and dumps(), that is, read loads() as load-s and dumps() as dump-s.

Here is a quick table to help you remember these functions:

         File      String
Read     load()    loads()
Write    dump()    dumps()
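To see all four functions side by side, here is a short round-trip sketch. The temporary directory and the file name country.json are assumptions made so that the example can run anywhere without leaving files behind.

```python
import json
import tempfile
from pathlib import Path

country = {"name": "Canada", "population": 37742154}

# String round trip: dumps() writes to a string, loads() reads it back.
as_string = json.dumps(country)
from_string = json.loads(as_string)

# File round trip: dump() writes to a file, load() reads it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "country.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(country, f)
    with open(path, encoding="utf-8") as f:
        from_file = json.load(f)

print(from_string == country, from_file == country)
```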

Conclusion

In this tutorial, we explored reading and writing JSON data using Python. Knowing how to work with JSON data is essential, especially when working with websites. JSON is used to transfer and store data everywhere, including APIs, web scrapers, and modern databases like PostgreSQL. 

Understanding JSON is crucial if you are working on a web scraping project that involves dynamic websites. Head over to our blog post for a practical example of JSON being useful for pages with infinite scroll.

About Monika Maslauskaite

Monika Maslauskaite is a Content Manager at Oxylabs. A combination of tech-world and content creation is the thing she is super passionate about in her professional path. While free of work, you’ll find her watching mystery, psychological (basically, all kinds of mind-blowing) movies, dancing, or just making up choreographies in her head.
