Getting a JSON from a stacked pandas dataframe

I have the sample dataframe below:

import pandas as pd

d = {'key': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'crow', 'crow', 'crow', 'crow'],
     'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
     'text': ["hello", "i", "am", "a", "piece", "of", "text", "have", "a", "nice", "day", "friends"],
}
df = pd.DataFrame(data=d)
df

output:

    key count   text
0   foo    12   hello
1   foo     3   i
2   foo     5   am
3   foo     5   a
4   bar     3   piece
5   bar     1   of
6   bar     4   text
7   bar     1   have
8   crow    7   a
9   crow    3   nice
10  crow    8   day
11  crow    2   friends

I stacked the dataframe with: df.set_index("key").stack()

To get:

key        
foo   count         12
      text       hello
      count          3
      text           i
      count          5
      text          am
      count          5
      text           a
bar   count          3
      text       piece
      count          1
      text          of
      count          4
      text        text
      count          1
      text        have
crow  count          7
      text           a
      count          3
      text        nice
      count          8
      text         day
      count          2
      text     friends
dtype: object

I am now trying to output the stacked df as a JSON file, but when I use to_json(), I get the error:

ValueError: Series index must be unique for orient='index'

The expected output would have text and count grouped by key:

[
  {
    "key": "19",
    "values": [
        {
            text: 'hello',
            count: 12
        },
        {
            content: 'i',
            count: 3
        },
        {
            content: 'am',
            count: 5
        },
        ...
    ]
]
mehsheenman asked Oct 14 '22 22:10



1 Answer

As mentioned in the comments, your expected output is not valid JSON. The list of records needs a key of its own, e.g. "some_key": [...], at the same level as "key": "bar".
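On the error itself: the stacked Series repeats the same (key, column) index pair once per original row, and to_json() on a Series defaults to orient='index', which needs unique labels. The sketch below uses a shortened copy of the question's data (an assumption made for brevity) to show the duplicated index. The try/except is there because the exact exception behaviour may vary between pandas versions.

```python
import pandas as pd

# Shortened copy of the question's data, for illustration only
df = pd.DataFrame({
    "key": ["foo", "foo", "bar"],
    "count": [12, 3, 3],
    "text": ["hello", "i", "piece"],
})

stacked = df.set_index("key").stack()

# ("foo", "count") and ("foo", "text") each appear twice in the index
index_is_unique = stacked.index.is_unique
print(index_is_unique)

try:
    stacked.to_json()  # Series.to_json defaults to orient="index"
except ValueError as exc:
    print("to_json failed:", exc)
```

Since the index cannot serve as a set of JSON object keys, building the nested structure from the unstacked df is the simpler route.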

One way to build a valid structure is to group by key with groupby:

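import json
# Each group of non-key columns becomes a list of row records under 'values'
json_str = json.dumps([{'key': k, 'values': d.to_dict('records')}
                       for k, d in df.drop('key', axis=1).groupby(df['key'])
                      ], indent=2)
print(json_str)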

Output:

[
  {
    "key": "bar",
    "values": [
      {
        "count": 3,
        "text": "piece"
      },
      {
        "count": 1,
        "text": "of"
      },
      {
        "count": 4,
        "text": "text"
      },
      {
        "count": 1,
        "text": "have"
      }
    ]
  },
  {
    "key": "crow",
    "values": [
      {
        "count": 7,
        "text": "a"
      },
      {
        "count": 3,
        "text": "nice"
      },
      {
        "count": 8,
        "text": "day"
      },
      {
        "count": 2,
        "text": "friends"
      }
    ]
  },
  {
    "key": "foo",
    "values": [
      {
        "count": 12,
        "text": "hello"
      },
      {
        "count": 3,
        "text": "i"
      },
      {
        "count": 5,
        "text": "am"
      },
      {
        "count": 5,
        "text": "a"
      }
    ]
  }
]
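Note that the groups above come out in sorted key order (bar, crow, foo) rather than the order in the dataframe. If the original order matters, a possible variant is to pass sort=False to groupby, which keeps groups in order of first appearance. This is a minimal sketch under that assumption, not part of the original answer; the name records is introduced here for illustration.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "key": ["foo"] * 4 + ["bar"] * 4 + ["crow"] * 4,
    "count": [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
    "text": ["hello", "i", "am", "a", "piece", "of", "text", "have",
             "a", "nice", "day", "friends"],
})

# sort=False keeps the groups in order of first appearance: foo, bar, crow
records = [
    {"key": key, "values": group.drop(columns="key").to_dict("records")}
    for key, group in df.groupby("key", sort=False)
]

json_str = json.dumps(records, indent=2)
print(json_str)
```

To write a file instead of a string, records can be passed to json.dump together with an open file handle.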
Quang Hoang answered Oct 18 '22 14:10
