I have a bunch of geographical data as below. I would like to group the data into bins of 0.2 degrees in longitude AND 0.2 degrees in latitude.
While this is trivial to do for either latitude or longitude alone, what is the most appropriate way of doing it for both variables?
|User_ID |Latitude |Longitude|Datetime |u |v |
|---------|----------|---------|-------------------|-----|-----|
|222583401|41.4020375|2.1478710|2014-07-06 20:49:20|0.3 | 0.2 |
|287280509|41.3671346|2.0793115|2013-01-30 09:25:47|0.2 | 0.7 |
|329757763|41.5453577|2.1175164|2012-09-25 08:40:59|0.5 | 0.8 |
|189757330|41.5844998|2.5621569|2013-10-01 11:55:20|0.4 | 0.4 |
|624921653|41.5931846|2.3030671|2013-07-09 20:12:20|1.2 | 1.4 |
|414673119|41.5550136|2.0965829|2014-02-24 20:15:30|2.3 | 0.6 |
|414673119|41.5550136|2.0975829|2014-02-24 20:16:30|4.3 | 0.7 |
|414673119|41.5550136|2.0985829|2014-02-24 20:17:30|0.6 | 0.9 |
So far I have created two linear spaces:

```python
import numpy as np
import pandas as pd
lonbins = np.linspace(df.Longitude.min(), df.Longitude.max(), 10)
latbins = np.linspace(df.Latitude.min(), df.Latitude.max(), 10)
```
Then I can group by the longitude bins using:

```python
groups = df.groupby(pd.cut(df.Longitude, lonbins))
```
I could then iterate over the groups to create a second level. However, since my goal is to run statistical analysis on each group and possibly display the results on a map, this does not look very handy:
```python
bucket = {}
for name, group in groups:
    print(name)
    bucket[name] = group.groupby(pd.cut(group.Latitude, latbins))
```
For example, I would like to draw a heatmap showing the number of rows per lat/lon box, display the distribution of speed in each lat/lon box, and so on.
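For comparison, the `pd.cut` approach from the question does not need the nested loop: `groupby` accepts a list of keys, so both cuts can be passed at once. Two details of the original attempt are worth noting. `np.linspace(min, max, 10)` returns 10 edges, i.e. 9 equal-width bins rather than 0.2-degree bins, and with explicit edges `pd.cut` uses right-closed intervals, so the row sitting exactly at the minimum falls outside every bin unless `include_lowest=True` is passed. The sketch below assumes the sample rows from the table as `df` and builds 0.2-degree edges with `np.arange`; `lat_edges` and `lon_edges` are illustrative names, not part of the question.

```python
import numpy as np
import pandas as pd

# Sample rows reproduced from the table above
df = pd.DataFrame({
    "Latitude":  [41.4020375, 41.3671346, 41.5453577, 41.5844998,
                  41.5931846, 41.5550136, 41.5550136, 41.5550136],
    "Longitude": [2.1478710, 2.0793115, 2.1175164, 2.5621569,
                  2.3030671, 2.0965829, 2.0975829, 2.0985829],
    "u": [0.3, 0.2, 0.5, 0.4, 1.2, 2.3, 4.3, 0.6],
    "v": [0.2, 0.7, 0.8, 0.4, 1.4, 0.6, 0.7, 0.9],
})

step = 0.2
# Fixed-width edges: start at the multiple of `step` below the minimum
# and stop once the maximum is covered
lat_edges = np.arange(np.floor(df.Latitude.min() / step) * step,
                      df.Latitude.max() + step, step)
lon_edges = np.arange(np.floor(df.Longitude.min() / step) * step,
                      df.Longitude.max() + step, step)

# Both cuts in one list of keys: each group is one (lat bin, lon bin) box.
# observed=True keeps only boxes that actually contain rows.
groups = df.groupby(
    [pd.cut(df.Latitude, lat_edges, include_lowest=True),
     pd.cut(df.Longitude, lon_edges, include_lowest=True)],
    observed=True,
)
counts = groups.size()
print(counts)
```

The group keys here are `Interval` objects, which are convenient for labelling but less convenient than plain numbers when plotting on a map.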
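How about this? Snap each coordinate down to the lower edge of its 0.2-degree cell, then group by both bin columns at once: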
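```python
import numpy as np
import pandas as pd

step = 0.2
# Snap each coordinate down to the lower edge of its 0.2-degree cell
to_bin = lambda x: np.floor(x / step) * step
df["latBin"] = to_bin(df.Latitude)
df["lonBin"] = to_bin(df.Longitude)
# One groupby over both bin columns: each group is one lat/lon box
groups = df.groupby(["latBin", "lonBin"])
```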
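Building on this answer, here is a sketch of the two analyses mentioned in the question: a count grid for a heatmap and per-box speed statistics. It rests on a few assumptions not stated in the post: `u` and `v` are velocity components, so speed is taken as `np.hypot(u, v)`; the bin labels are rounded to 6 decimals to hide float noise such as `41.400000000000006`; and `to_bin` is rewritten as a named helper with a `step` parameter (hypothetical, not the answerer's exact code).

```python
import numpy as np
import pandas as pd

# Sample rows reproduced from the table in the question
df = pd.DataFrame({
    "Latitude":  [41.4020375, 41.3671346, 41.5453577, 41.5844998,
                  41.5931846, 41.5550136, 41.5550136, 41.5550136],
    "Longitude": [2.1478710, 2.0793115, 2.1175164, 2.5621569,
                  2.3030671, 2.0965829, 2.0975829, 2.0985829],
    "u": [0.3, 0.2, 0.5, 0.4, 1.2, 2.3, 4.3, 0.6],
    "v": [0.2, 0.7, 0.8, 0.4, 1.4, 0.6, 0.7, 0.9],
})


def to_bin(x, step=0.2):
    """Lower edge of the `step`-degree cell containing x (hypothetical helper).

    Rounding to 6 decimals turns values like 41.400000000000006 into 41.4,
    so bin labels compare cleanly.
    """
    return np.round(np.floor(x / step) * step, 6)


df["latBin"] = to_bin(df.Latitude)
df["lonBin"] = to_bin(df.Longitude)
# Assumption: u and v are velocity components, so speed is their magnitude
df["speed"] = np.hypot(df.u, df.v)

groups = df.groupby(["latBin", "lonBin"])

# Rows per lat/lon box as a grid (latitude rows x longitude columns);
# boxes with no rows are filled with 0
counts = groups.size().unstack("lonBin", fill_value=0)

# Summary of the speed distribution in each box (count, mean, std, quartiles)
speed_stats = groups["speed"].describe()

print(counts)
print(speed_stats)
```

Because `counts` is already a 2D grid indexed by bin edges, it can be handed to a heatmap plotting function as-is. One caveat of the floor approach: a coordinate lying exactly on an edge, such as 41.4, may land in the lower cell, because `41.4 / 0.2` can evaluate to slightly less than 207 in floating point.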