I have a massive GeoJSON file in this form:
{'features': [{'properties': {'MARKET': 'Albany',
'geometry': {'coordinates': [[[-74.264948, 42.419877, 0],
[-74.262041, 42.425856, 0],
[-74.261175, 42.427631, 0],
[-74.260384, 42.429253, 0]]],
'type': 'Polygon'}}},
{'properties': {'MARKET': 'Albany',
'geometry': {'coordinates': [[[-73.929627, 42.078788, 0],
[-73.929114, 42.081658, 0]]],
'type': 'Polygon'}}},
{'properties': {'MARKET': 'Albuquerque',
'geometry': {'coordinates': [[[-74.769198, 43.114089, 0],
[-74.76786, 43.114496, 0],
[-74.766474, 43.114656, 0]]],
'type': 'Polygon'}}}],
'type': 'FeatureCollection'}
After reading the json:
import json
with open('x.json') as f:
    data = json.load(f)
I read the values into a list and then into a dataframe:
from itertools import chain
import pandas as pd
# set of all distinct markets
mkt = {f['properties']['MARKET'] for f in data['features']}
# list of (market, coordinates) pairs, one per feature
markets = [(market, list(chain.from_iterable(f['geometry']['coordinates']))) for f in data['features'] for market in mkt if f['properties']['MARKET'] == market]
df = pd.DataFrame(markets, columns=['a', 'b'])
First few rows of df are:
a b
0 Albany [[-74.264948, 42.419877, 0], [-74.262041, 42.4...
1 Albany [[-73.929627, 42.078788, 0], [-73.929114, 42.0...
2 Albany [[-74.769198, 43.114089, 0], [-74.76786, 43.11...
Then, to unnest the nested list in column b, I used pandas concat:
df1 = pd.concat([df.iloc[:,0:1], df['b'].apply(pd.Series)], axis=1)
But this creates 8070 columns with many NaNs. Is there a way to group all the latitudes and longitudes by market (column a)? The desired result is a dataframe of about a million rows with two coordinate columns.
Desired output is:
mkt lat long
Albany 42.419877 -74.264948
Albany 42.078788 -73.929627
..
Albuquerque 35.105361 -106.640342
Please note that the zero in each list element (e.g. [-74.769198, 43.114089, 0]) needs to be ignored.
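For context, here is a minimal sketch of why the concat step goes wide. Applying pd.Series to a column of lists creates one column per list position, so the polygon with the most points sets the column count and every shorter polygon is padded with NaN. The two rows below use made-up coordinates, not values from the real file.

```python
import pandas as pd

# Made-up rows: one polygon with 2 points, one with 4 points.
df = pd.DataFrame({
    "a": ["Albany", "Albuquerque"],
    "b": [
        [[-74.26, 42.41, 0], [-74.25, 42.42, 0]],
        [[-74.76, 43.11, 0], [-74.75, 43.12, 0],
         [-74.74, 43.13, 0], [-74.73, 43.14, 0]],
    ],
})

# One column per point position; the longest list decides the width.
wide = df["b"].apply(pd.Series)
print(wide.shape)         # (2, 4)
print(wide.isna().sum())  # the 2-point row is NaN in the last two columns
```

With thousands of points in the largest polygon, the same mechanism would account for the thousands of mostly empty columns.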
Something like this?
import pandas as pd
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(geojson["features"])
coords = 'properties.geometry.coordinates'
# keep only (longitude, latitude) from each point of the outer ring, one row per point
df2 = (df[coords].apply(lambda r: [(i[0],i[1]) for i in r[0]])
       .apply(pd.Series).stack()
       .reset_index(level=1).rename(columns={0:coords,"level_1":"point"})
       .join(df.drop(columns=coords), how='left')).reset_index(level=0)
# GeoJSON positions are ordered longitude, latitude
df2[['long','lat']] = df2[coords].apply(pd.Series)
df2
Outputs:
index point properties.geometry.coordinates properties.MARKET \
0 0 0 (-74.264948, 42.419877) Albany
1 0 1 (-74.262041, 42.425856) Albany
2 0 2 (-74.261175, 42.427631) Albany
3 0 3 (-74.260384, 42.429253) Albany
4 1 0 (-73.929627, 42.078788) Albany
5 1 1 (-73.929114, 42.081658) Albany
6 2 0 (-74.769198, 43.114089) Albuquerque
7 2 1 (-74.76786, 43.114496) Albuquerque
8 2 2 (-74.766474, 43.114656) Albuquerque
properties.geometry.type long lat
0 Polygon -74.264948 42.419877
1 Polygon -74.262041 42.425856
2 Polygon -74.261175 42.427631
3 Polygon -74.260384 42.429253
4 Polygon -73.929627 42.078788
5 Polygon -73.929114 42.081658
6 Polygon -74.769198 43.114089
7 Polygon -74.767860 43.114496
8 Polygon -74.766474 43.114656
Given this input:
geojson = {'features': [{'properties': {'MARKET': 'Albany',
'geometry': {'coordinates': [[[-74.264948, 42.419877, 0],
[-74.262041, 42.425856, 0],
[-74.261175, 42.427631, 0],
[-74.260384, 42.429253, 0]]],
'type': 'Polygon'}}},
{'properties': {'MARKET': 'Albany',
'geometry': {'coordinates': [[[-73.929627, 42.078788, 0],
[-73.929114, 42.081658, 0]]],
'type': 'Polygon'}}},
{'properties': {'MARKET': 'Albuquerque',
'geometry': {'coordinates': [[[-74.769198, 43.114089, 0],
[-74.76786, 43.114496, 0],
[-74.766474, 43.114656, 0]]],
'type': 'Polygon'}}}],
'type': 'FeatureCollection'}
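For a file with around a million points, it may be cheaper to skip the wide intermediate frame and build the long one directly. Below is a minimal sketch, assuming every feature is a Polygon whose geometry sits under properties as in the sample; the helper name polygon_points is made up for illustration. GeoJSON positions are ordered longitude, latitude, optional altitude, so the third value is simply discarded.

```python
import pandas as pd

# Small sample shaped like the question's data (geometry nested under properties).
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"properties": {"MARKET": "Albany",
                        "geometry": {"type": "Polygon",
                                     "coordinates": [[[-74.264948, 42.419877, 0],
                                                      [-74.262041, 42.425856, 0]]]}}},
        {"properties": {"MARKET": "Albuquerque",
                        "geometry": {"type": "Polygon",
                                     "coordinates": [[[-74.769198, 43.114089, 0],
                                                      [-74.76786, 43.114496, 0]]]}}},
    ],
}

def polygon_points(collection):
    """Yield (market, lat, long) for every point of every polygon ring.

    Hypothetical helper; assumes Polygon geometries only.
    """
    for feature in collection["features"]:
        props = feature["properties"]
        for ring in props["geometry"]["coordinates"]:
            # Position order is longitude, latitude[, altitude]; drop the rest.
            for lon, lat, *_ in ring:
                yield props["MARKET"], lat, lon

long_df = pd.DataFrame(polygon_points(geojson), columns=["mkt", "lat", "long"])
print(long_df)
```

If the real file stores geometry next to properties rather than inside it, the lookup would become feature["geometry"]["coordinates"] instead.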