This is first time use stackoverflow to ask question. I have poor English,so if I affend you accidently in word, please don't mind.
I have a json file (access.json),format like:
[
{u'IP': u'aaaa1', u'Domain': u'bbbb1', u'Time': u'cccc1', ..... },
{u'IP': u'aaaa2', u'Domain': u'bbbb2', u'Time': u'cccc2', ..... },
{u'IP': u'aaaa3', u'Domain': u'bbbb3', u'Time': u'cccc3', ..... },
{u'IP': u'aaaa4', u'Domain': u'bbbb4', u'Time': u'cccc4', ..... },
{ ....... },
{ ....... }
]
When I use:
ipython
import pasdas as pd
data = pd.read_json('./access.json')
it return:
ValueError: Expected object or value
that is the result I want:
[out]
IP Domain Time ...
0 aaaa1 bbbb1 cccc1 ...
1 aaaa2 bbbb2 cccc2 ...
2 aaaa3 bbbb3 cccc3 ...
3 aaaa4 bbbb4 cccc4 ...
...and so on
How should I do to achieve this goal? Thank you for answer!
This isn't valid json which is why read_json
won't parse it.
{u'IP': u'aaaa1', u'Domain': u'bbbb1', u'Time': u'cccc1', ..... },
should be
{"IP": "aaaa1", "Domain": "bbbb1", "Time": "cccc1", ..... },
You could smash this (the entire file) with a regular expression to find these, for example:
In [11]: line
Out[11]: "{u'IP': u'aaaa1', u'Domain': u'bbbb1', u'Time': u'cccc1'},"
In [12]: re.sub("(?<=[\{ ,])u'|'(?=[:,\}])", '"', line)
Out[12]: '{"IP": "aaaa1", "Domain": "bbbb1", "Time": "cccc1"},'
Note: this will get tripped up by some strings, so use with caution.
A better "solution" would be to ensure you had valid json in the first place... It looks like this has come from python's str/unicode/repr rather than json.dumps
.
Note: json.dumps
produces valid json, so can be read by read_json
.
In [21]: repr({u'IP': u'aaa'})
Out[21]: "{u'IP': u'aaa'}"
In [22]: json.dumps({u'IP': u'aaa'})
Out[22]: '{"IP": "aaa"}'
If someone else created this "json", then complain! It's not json.
It is not a JSON format. It is a list of dictionaries. You can use ast.literal_eval()
to get the actual list from the file and pass it to the DataFrame
constructor:
from ast import literal_eval
import pandas as pd
with open('./access.log2.json') as f:
data = literal_eval(f.read())
df = pd.DataFrame(data)
print df
Output for the example data you've provided:
Domain IP Time
0 bbbb1 aaaa1 cccc1
1 bbbb2 aaaa2 cccc2
2 bbbb3 aaaa3 cccc3
3 bbbb4 aaaa4 cccc4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With