Parsing a JSON string which was loaded from a CSV using Pandas

People also ask

How do I read a JSON string in pandas?

If you have JSON in a string, you can load it into a pandas DataFrame with the read_json() function. By default, a JSON string read into a DataFrame is expected to be in the dict-like format {column -> {index -> value}}, which is called the "columns" orientation. Use the orient parameter to tell read_json which layout the JSON string uses.
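A minimal sketch of the default "columns" orientation, using a small made-up JSON string. The StringIO wrapper is there because recent pandas versions deprecate passing literal JSON text directly to read_json; wrapping it works in older versions too.

```python
from io import StringIO

import pandas as pd

# Made-up data in the {column -> {index -> value}} layout
json_str = '{"name": {"0": "John", "1": "Jane"}, "age": {"0": 30, "1": 25}}'

# orient="columns" is the default for DataFrames; it is spelled out here for clarity
df = pd.read_json(StringIO(json_str), orient="columns")
print(df)
```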

How do you parse a JSON response string?

Example: in JavaScript, use JSON.parse() to convert text into a JavaScript object: const obj = JSON.parse('{"name":"John", "age":30, "city":"New York"}'); Make sure the text is valid JSON, or else you will get a syntax error.

Which method is used to parse a string containing JSON data in Python: json.loads(), JSON.parse(), json.read(), or none of the above?

Parse JSON (convert from JSON to Python): if you have a JSON string, you can parse it with the json.loads() method. When the JSON text is an object, the result is a Python dictionary.

What is the method used to parse a string containing JSON data so that you can work with the data in Python?

You can parse a JSON string with the json.loads() method. For a JSON object, the method returns a dictionary.
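A short illustration of json.loads, reusing the sample object from the JavaScript example above. The second call shows that the return type follows the top-level JSON value, so an array comes back as a list rather than a dict.

```python
import json

text = '{"name": "John", "age": 30, "city": "New York"}'
obj = json.loads(text)
print(obj["city"])  # New York

# A top-level JSON array is parsed into a Python list
items = json.loads("[1, 2, 3]")
```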

I think applying json.loads is a good idea, but from there you can convert the result directly into DataFrame columns instead of writing it out and loading it again:

import json
import pandas as pd
stdf = df['stats'].apply(json.loads)  # Series of dicts
pd.DataFrame(stdf.tolist(), index=df.index)  # or stdf.apply(pd.Series)

or alternatively in one step:

df.join(df['stats'].apply(json.loads).apply(pd.Series))
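A runnable version of both approaches, assuming a hypothetical CSV with an id column and a stats column holding one JSON object per row (the column names and values are made up for illustration):

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical CSV; JSON quotes are doubled inside the quoted CSV field
csv_text = (
    'id,stats\n'
    '1,"{""weight"": 70, ""height"": 180}"\n'
    '2,"{""weight"": 55, ""height"": 165}"\n'
)
df = pd.read_csv(StringIO(csv_text))

# Two steps: parse the strings, then build a frame from the list of dicts
stdf = df['stats'].apply(json.loads)
two_step = pd.DataFrame(stdf.tolist(), index=df.index)

# One step: parse, expand each dict into columns, and join onto df
one_step = df.join(df['stats'].apply(json.loads).apply(pd.Series))
print(one_step)
```

Note that join keeps the original stats column; add .drop(columns='stats') if you no longer need the raw strings.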

There is a slightly easier way, but ultimately you'll still have to call json.loads. pandas.read_csv has a notion of a converter:

converters : dict, optional

Dict of functions for converting values in certain columns. Keys can either be integers or column labels.

So first define your custom parser. In this case the below should work:

import json

def CustomParser(data):
    # Parse one cell of the stats column from a JSON string into a dict
    return json.loads(data)

In your case you'll have something like:

df = pandas.read_csv(f1, converters={'stats':CustomParser},header=0)

We are telling read_csv to read the data in the standard way, but to run the stats column through our custom parser. Each value in the stats column will then be a dict.
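A self-contained check of the converter step, with hypothetical CSV text standing in for f1 from the question:

```python
import json
from io import StringIO

import pandas as pd

def CustomParser(data):
    # Equivalent to passing json.loads directly as the converter
    return json.loads(data)

# Hypothetical stand-in for the question's file
csv_text = (
    'id,stats\n'
    '1,"{""weight"": 70, ""height"": 180}"\n'
    '2,"{""weight"": 55, ""height"": 165}"\n'
)
df = pd.read_csv(StringIO(csv_text), converters={'stats': CustomParser}, header=0)
print(type(df.loc[0, 'stats']))  # <class 'dict'>
```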

From here, we can use a little hack to append these columns in one step with the appropriate column names. This only works for regular data: every JSON object needs to have the same three keys, or missing values need to be handled in our CustomParser.

df[sorted(df['stats'][0].keys())] = df['stats'].apply(pandas.Series)

On the left-hand side, we get the new column names from the keys of the first element of the stats column. Each element in the stats column is a dictionary, so this is a bulk assignment of several columns at once. On the right-hand side, apply(pandas.Series) breaks up the stats column by turning each dictionary into a row whose columns are its keys.

Paul's original answer was very nice but not correct in general, because there is no assurance that the ordering of columns is the same on the left-hand side and the right-hand side of the last line. (In fact, it does not seem to work on the test data in the question, instead erroneously switching the height and weight columns.)

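We can fix this by ensuring that the list of dict keys on the LHS is sorted. That relies on the apply on the RHS sorting the new columns by key, which older pandas versions did. Since pandas 0.23, however, a Series built from a dict keeps the dict's insertion order, so the sorted keys can again end up in a different order from the expanded columns. A more robust fix is to take the column names from the expanded frame itself:

import json
import pandas

def CustomParser(data):
    return json.loads(data)

df = pandas.read_csv(f1, converters={'stats': CustomParser}, header=0)
# Use the expanded frame's own column labels so names and values always match
expanded = df['stats'].apply(pandas.Series)
df[expanded.columns] = expanded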
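A quick check of why the sorted-keys approach can misalign, assuming pandas 0.23 or later and made-up data whose JSON keys are not in alphabetical order:

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical data: the keys are deliberately not in alphabetical order
csv_text = (
    'id,stats\n'
    '1,"{""weight"": 70, ""height"": 180}"\n'
    '2,"{""weight"": 55, ""height"": 165}"\n'
)
df = pd.read_csv(StringIO(csv_text), converters={'stats': json.loads})

expanded = df['stats'].apply(pd.Series)
print(sorted(df['stats'][0].keys()))  # ['height', 'weight']
print(list(expanded.columns))         # ['weight', 'height'] (dict order)

# Assigning with the expanded frame's own labels keeps names and values paired
df[expanded.columns] = expanded
```

Because the left-hand labels here come from the same object as the right-hand values, their order cannot disagree.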

Option 1

If you dumped the column with json.dumps before you wrote it to csv, you can read it back in with:

import json
import pandas as pd

df = pd.read_csv('data/file.csv', converters={'json_column_name': json.loads})
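A round-trip sketch of Option 1, using an in-memory CSV instead of data/file.csv and a made-up DataFrame: the dict column is serialized with json.dumps before writing and parsed back with json.loads on reading.

```python
import json
from io import StringIO

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'json_column_name': [{'a': 1, 'b': [1, 2]}, {'a': 2, 'b': []}],
})

# Serialize the dicts as JSON text before writing, so the CSV holds valid JSON
out = df.assign(json_column_name=df['json_column_name'].apply(json.dumps))
csv_text = out.to_csv(index=False)

restored = pd.read_csv(StringIO(csv_text), converters={'json_column_name': json.loads})
```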

Option 2

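If you didn't, the column probably holds Python reprs of the dicts (single quotes, True/None) rather than JSON, and json.loads will reject them. eval can read them, but it executes arbitrary code from the file; ast.literal_eval only evaluates Python literals and is the safer choice:

import ast
import pandas as pd

df = pd.read_csv('data/file.csv', converters={'json_column_name': ast.literal_eval})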
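A sketch of the non-JSON case, assuming made-up CSV text whose column holds Python dict reprs (single quotes, True/None), which json.loads would reject. It uses ast.literal_eval, which only evaluates Python literals:

```python
import ast
from io import StringIO

import pandas as pd

# Hypothetical CSV with Python-style dicts instead of JSON
csv_text = (
    'id,json_column_name\n'
    "1,\"{'a': 1, 'b': True}\"\n"
    "2,\"{'a': 2, 'b': None}\"\n"
)
df = pd.read_csv(StringIO(csv_text), converters={'json_column_name': ast.literal_eval})
```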

Option 3

For more complicated situations you can write a custom converter like this:

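import json
import pandas as pd

def parse_column(data):
    # Return the parsed value, or None if the cell is not valid JSON
    try:
        return json.loads(data)
    except json.JSONDecodeError as e:
        print(f"Could not parse {data!r}: {e}")
        return None


df = pd.read_csv('data/file.csv', converters={'json_column_name': parse_column})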
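A small demonstration of the custom converter on made-up CSV text where one row is not valid JSON. The bad cell is reported and comes back as missing instead of aborting the whole read:

```python
import json
from io import StringIO

import pandas as pd

def parse_column(data):
    # Return the parsed value, or None if the cell is not valid JSON
    try:
        return json.loads(data)
    except json.JSONDecodeError as e:
        print(f"Could not parse {data!r}: {e}")
        return None

# Hypothetical data: the second row holds plain text, not JSON
csv_text = (
    'id,json_column_name\n'
    '1,"{""a"": 1}"\n'
    '2,not json\n'
)
df = pd.read_csv(StringIO(csv_text), converters={'json_column_name': parse_column})
```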

The json_normalize function can do this without a custom function. Since pandas 1.0 it is available as pd.json_normalize; the old pandas.io.json import location is deprecated.

(assuming you are loading the data from a file)

import json  # stdlib json; ujson.loads also works if installed
import pandas as pd
df = pd.read_csv(file_path)  # assumes a header row that names the 'stats' column
stats_df = pd.json_normalize(df['stats'].apply(json.loads).tolist())
stats_df.index = df.index  # align rows with the original frame
df = df.join(stats_df).drop(columns='stats')
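Where json_normalize goes beyond apply(pd.Series) is nested objects: it flattens them into dotted column names (the default separator is "."). A small sketch with made-up records:

```python
import pandas as pd

# Hypothetical records with one nested object each
records = [
    {'weight': 70, 'size': {'height': 180, 'shoe': 43}},
    {'weight': 55, 'size': {'height': 165, 'shoe': 38}},
]
flat = pd.json_normalize(records)
print(list(flat.columns))  # weight, size.height, size.shoe
```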
