I have the sample dataframe below:
import pandas as pd

d = {'key': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'crow', 'crow', 'crow', 'crow'],
     'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
     'text': ["hello", "i", "am", "a", "piece", "of", "text", "have", "a", "nice", "day", "friends"],
}
df = pd.DataFrame(data=d)
df
output:
key count text
0 foo 12 hello
1 foo 3 i
2 foo 5 am
3 foo 5 a
4 bar 3 piece
5 bar 1 of
6 bar 4 text
7 bar 1 have
8 crow 7 a
9 crow 3 nice
10 crow 8 day
11 crow 2 friends
I stacked the dataframe with:
df.set_index("key").stack()
To get:
key
foo count 12
text hello
count 3
text i
count 5
text am
count 5
text a
bar count 3
text piece
count 1
text of
count 4
text text
count 1
text have
crow count 7
text a
count 3
text nice
count 8
text day
count 2
text friends
dtype: object
I am now trying to output the stacked df as a JSON file, but when I use to_json(), I get the error:
ValueError: Series index must be unique for orient='index'
The expected output would have text and count grouped by the key:
[
{
"key": "19",
"values": [
{
text: 'hello',
count: 12
},
{
content: 'i',
count: 3
},
{
content: 'am',
count: 5
},
...
]
]
As mentioned in the comment, your expected output is not a valid JSON string. You need "some_key":[...] at the same level as "key":"bar".
For example, with groupby:
import json
# One dict per key; each group's rows become a list of records
json_str = json.dumps([{'key': k, 'values': g.to_dict('records')}
                       for k, g in df.drop('key', axis=1).groupby(df['key'])],
                      indent=2)
print(json_str)
Output:
[
{
"key": "bar",
"values": [
{
"count": 3,
"text": "piece"
},
{
"count": 1,
"text": "of"
},
{
"count": 4,
"text": "text"
},
{
"count": 1,
"text": "have"
}
]
},
{
"key": "crow",
"values": [
{
"count": 7,
"text": "a"
},
{
"count": 3,
"text": "nice"
},
{
"count": 8,
"text": "day"
},
{
"count": 2,
"text": "friends"
}
]
},
{
"key": "foo",
"values": [
{
"count": 12,
"text": "hello"
},
{
"count": 3,
"text": "i"
},
{
"count": 5,
"text": "am"
},
{
"count": 5,
"text": "a"
}
]
}
]
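Two side notes. The ValueError in the question most likely comes from the stacked Series having repeated index labels (for example, ("foo", "count") appears once per original row), which to_json() cannot use as unique JSON keys. Also, groupby sorts the keys by default, which is why "bar" comes before "foo" above. The following is a minimal sketch, assuming a shortened copy of the question's data and a hypothetical helper named grouped_records (not part of pandas), that checks the index and keeps the keys in their original order with sort=False:

```python
import json
import pandas as pd

# Shortened copy of the question's data (two keys only, for illustration).
df = pd.DataFrame({
    "key": ["foo", "foo", "bar", "bar"],
    "count": [12, 3, 3, 1],
    "text": ["hello", "i", "piece", "of"],
})

# After stacking, labels such as ("foo", "count") occur more than once,
# so the index is not unique.
stacked = df.set_index("key").stack()
index_is_unique = bool(stacked.index.is_unique)

def grouped_records(frame, by="key"):
    """Hypothetical helper: one dict per key, in order of first appearance."""
    return [
        {by: k, "values": g.drop(columns=by).to_dict("records")}
        for k, g in frame.groupby(by, sort=False)
    ]

json_str = json.dumps(grouped_records(df), indent=2)
print(json_str)
```

To write a file instead of a string, the same list can be passed to json.dump together with an open file handle.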