Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas dataframe to JSONL (JSON Lines) conversion

I need to convert pandas data frame to JSONL format. I couldn't find a good package to do it and tried to implement myself, but it looks a bit ugly and not efficient.

For example, given a pandas df:

        label      pattern
  0      DRUG      aspirin
  1      DRUG    trazodone
  2      DRUG   citalopram

I need to convert to txt file of the form:

{"label":"DRUG","pattern":[{"lower":"aspirin"}]}
{"label":"DRUG","pattern":[{"lower":"trazodone"}]}
{"label":"DRUG","pattern":[{"lower":"citalopram"}]}

I tried with to_dict('records'), but I'm missing [ ] and nested 'lower' key.

df.to_dict('record')

creates:

[{'label': 'DRUG', 'pattern': 'aspirin'},
 {'label': 'DRUG', 'pattern': 'trazodone'},
 {'label': 'DRUG', 'pattern': 'citalopram'}]

I thought about converting the 'pattern' columns and include nested 'lower'?

UPD

So far, I succeeded to convert 'pattern' into list:

df_new = pd.concat((df[['label']], df[['pattern']].apply(lambda x: x.tolist(), axis=1)), axis=1)
df_new.columns = ['label', 'pattern']
df_new.head()

The result:

    label   pattern
0   DRUG    [aspirin]
1   DRUG    [trazodone]
2   DRUG    [citalopram]

and then:

df_new.to_dict(orient='records')

[{'label': 'DRUG', 'pattern': ['aspirin']},
 {'label': 'DRUG', 'pattern': ['trazodone']},
 {'label': 'DRUG', 'pattern': ['citalopram']}]

UPD 2

Eventually, I managed to get what I want, but in the most non-pythonic way.

df_1 = pd.DataFrame(df[['pattern']].apply(lambda x: {'lower': x[0]}, axis=1))
df_1.columns = ['pattern']

df_fin = pd.concat((df[['label']], df_1[['pattern']].apply(lambda x: x.tolist(), axis=1)), axis=1)
df_fin.columns = ['label', 'pattern']
df_fin.to_json(orient='records')

 '{'label': 'DRUG', 'pattern': [{'lower': 'aspirin'}]}
  {'label': 'DRUG', 'pattern': [{'lower': 'trazodone'}]}
  {'label': 'DRUG', 'pattern': [{'lower': 'citalopram'}]}'

Any chance you can show a neat solution?

like image 913
Arnold Klein Avatar asked Aug 09 '18 20:08

Arnold Klein


People also ask

How do you convert a DataFrame to a JSON string in Python?

You can convert pandas DataFrame to JSON string by using DataFrame. to_json() method. This method takes a very important param orient which accepts values ' columns ', ' records ', ' index ', ' split ', ' table ', and ' values '.

How do I convert a DataFrame to JSON?

You can convert JSON to Pandas DataFrame by simply using read_json() . Just pass JSON string to the function. It takes multiple parameters, for our case I am using orient that specifies the format of JSON string. This function is also used to read JSON files into pandas DataFrame.

What is the difference between JSON and Jsonl?

JSONL is a text-based format using the . jsonl file extension that is basically the same as JSON format but implemented using newline characters to separate JSON values. It is also known as JSON Lines. Manifold can import and link JSONL files.

How do you convert a data frame to a string?

If you want to change the data type for all columns in the DataFrame to the string type, you can use df. applymap(str) or df. astype(str) methods.


1 Answers

In versions of Pandas > 0.19.0, DataFrame.to_json has a parameter, lines, that will write out JSONL format.

Given that, a more succinct version of your solution might look like this:

import pandas as pd

data = [{'label': 'DRUG', 'pattern': 'aspirin'},
        {'label': 'DRUG', 'pattern': 'trazodone'},
        {'label': 'DRUG', 'pattern': 'citalopram'}]
df = pd.DataFrame(data)

# Wrap pattern column in a dictionary
df["pattern"] = df.pattern.apply(lambda x: {"lower": x})

# Output in JSONL format
print(df.to_json(orient='records', lines=True))

Output:

{"label":"DRUG","pattern":{"lower":"aspirin"}}
{"label":"DRUG","pattern":{"lower":"trazodone"}}
{"label":"DRUG","pattern":{"lower":"citalopram"}}
like image 90
kmsquire Avatar answered Sep 18 '22 06:09

kmsquire