Flatten nested dictionaries in pandas using glom
Pandas is great! You can do pretty much everything with it, from data cleaning to quick data viz. But how about working with a nested dictionary from a JSON file?
pandas.json_normalize can do most of the work for you (most of the time). However, json_normalize gets slow when you want to flatten a large JSON file. In addition, it flattens the entire dictionary, when your goal might actually be to build your own dataframe from a few selected keys and values.
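To make the "flattens the entire dictionary" point concrete, here is a minimal sketch of what json_normalize does with one nested record. The record is hypothetical and invented purely for illustration; it is not the data used in this post.

```python
import pandas as pd

# Hypothetical record, invented for illustration only.
record = {
    "id": 1,
    "name": "Alice",
    "address": {"city": "Lisbon", "geo": {"lat": 38.72, "lng": -9.14}},
}

# json_normalize walks every nested level and turns each leaf into its own
# column, joining the nested key names with a dot.
flat = pd.json_normalize(record)
print(list(flat.columns))
```

Every leaf becomes a column whether you need it or not, which is exactly the behaviour you may want to avoid on a large file.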
How can you do that? How do we (i) import a nested dictionary from a JSON file into pandas and (ii) build a dataframe with a selected menu of information? The answer is: glom. Below is a step-by-step guide.
First, get the data. Here is a simple nested dictionary from a sample json file.
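The sample file itself is not reproduced here, so the snippet below uses a hypothetical stand-in with the same kind of shape: a top-level id plus a couple of nested levels. All keys and values are assumptions made for illustration.

```python
import json

# Hypothetical sample, invented for illustration; not the post's actual file.
sample_json = """
{
  "id": 1,
  "name": "Alice",
  "address": {
    "city": "Lisbon",
    "geo": {"lat": 38.72, "lng": -9.14}
  }
}
"""

# Parsing the text gives a plain Python dict with dicts nested inside it.
record = json.loads(sample_json)
print(record["address"]["geo"]["lat"])
```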
Second, import the Python libraries and load our data (i.e. the nested dictionary) into a single column of a pandas dataframe.
Note1: in addition to glom we also use literal_eval to convert a string representation of a dictionary to a (pure) Python dictionary.
Note2: the dictionary is stored as a string.
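A minimal sketch of this loading step could look like the following. The two rows and the column name json_element are assumptions made for illustration; in practice the strings would come from your own file. glom is not needed yet, since it only comes into play at the extraction step.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical rows: each dictionary arrives as a string (see Note2).
raw_rows = [
    "{'id': 1, 'name': 'Alice', 'address': {'city': 'Lisbon'}}",
    "{'id': 2, 'name': 'Bob', 'address': {'city': 'Porto'}}",
]

# A single column holding the string representation of each dictionary.
df = pd.DataFrame({"json_element": raw_rows})

# literal_eval parses each string into a real Python dict without running
# arbitrary code, unlike eval.
df["json_element"] = df["json_element"].apply(literal_eval)
print(type(df.loc[0, "json_element"]))
```

One caveat: literal_eval only understands Python literals, so strict JSON text containing true, false or null will raise an error. In that case json.loads is probably the safer choice.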
Third, extract selected keys and corresponding values from the nested dictionary using glom and build your own dataframe quickly and efficiently. In this example, we extract the id key and we save its value in a separate column with the same name in the pandas dataframe.
The basic use of glom requires two main arguments: (i) the data and (ii) the path to the keys whose values we want to extract and record in the dataframe. For more info and advanced use, see the glom tutorial in the docs ;)
Finally, the output looks like the following:
And this is it! (: