I have a function that outputs a grid of points as x and y numpy arrays for interpolation, but before I interpolate, I want to use Geopandas to perform an intersection with my research boundary (otherwise half of my interpolation points fall in the ocean).
I'm generating points like this:
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point
x = np.linspace(0,100,100)
y = np.linspace(0,100,100)
x, y = np.meshgrid(x, y)
x, y = x.flatten(), y.flatten()
f, ax = plt.subplots()
plt.scatter(x, y)
plt.axis('equal')
plt.show()
Is there an efficient way to convert these numpy arrays to shapely.Point([x, y])
so they can be placed in a geopandas geodataframe?
This is my current approach:
interp_points = []
index = 0
y_list = yi.tolist()
for x in xi.tolist():
interp_points.append(Point(x,y_list[index]))
index += 1
But it seems like converting to lists and then iterating is likely not a good approach for performance, and I have approximately 160,000 points.
NumPy uses much less memory to store dataThe NumPy arrays takes significantly less amount of memory as compared to python lists. It also provides a mechanism of specifying the data types of the contents, which allows further optimisation of the code.
Appending to numpy arrays is very inefficient. This is because the interpreter needs to find and assign memory for the entire array at every single step. Depending on the application, there are much better strategies. If you know the length in advance, it is best to pre-allocate the array using a function like np.
Notice that here we're using the Python NumPy, imported using the import numpy statement. By running the above code, Cython took just 0.001 seconds to complete. For Python, the code took 0.003 seconds. Cython is nearly 3x faster than Python in this case.
There is no built-in way to do this with shapely
, so you need to iterate through the values yourself. For doing that, this should be a rather efficient way:
In [4]: from geopandas import GeoSeries
In [5]: s = GeoSeries(map(Point, zip(x, y)))
In [6]: s.head()
Out[6]:
0 POINT (0 0)
1 POINT (1.01010101010101 0)
2 POINT (2.02020202020202 0)
3 POINT (3.03030303030303 0)
4 POINT (4.040404040404041 0)
dtype: object
In [6]: %timeit GeoSeries(map(Point, zip(x, y)))
114 ms ± 8.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
(or slight alternative GeoSeries(list(zip(x, y))).map(Point)
)
See here for some example doing this: http://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html
There is some (stalled) work to include this in geopandas directly: https://github.com/geopandas/geopandas/pull/75
I think this is a good way:
import numpy as np
from shapely import geometry
points_np_array = np.random.rand(50,2)
polygon_1 = geometry.Polygon(np.squeeze(points_np_array))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With