Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the most efficient way to convert numpy arrays to Shapely Points?

I have a function that outputs a grid of points as x and y numpy arrays for interpolation, but before I interpolate, I want to use Geopandas to perform an intersection with my research boundary (otherwise half of my interpolation points fall in the ocean).

I'm generating points like this:

import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point

x = np.linspace(0,100,100)
y = np.linspace(0,100,100)
x, y = np.meshgrid(x, y)
x, y = x.flatten(), y.flatten()


f, ax = plt.subplots()

plt.scatter(x, y)
plt.axis('equal')
plt.show()

Is there an efficient way to convert these numpy arrays to shapely.Point([x, y]) so they can be placed in a geopandas geodataframe?

This is my current approach:

interp_points = []
index = 0
y_list = yi.tolist()
for x in xi.tolist():
    interp_points.append(Point(x,y_list[index]))
    index += 1

But it seems like converting to lists and then iterating is likely not a good approach for performance, and I have approximately 160,000 points.

like image 790
Ryan Avatar asked Jun 21 '18 15:06

Ryan


People also ask

Is NumPy array more memory efficient than list?

NumPy uses much less memory to store dataThe NumPy arrays takes significantly less amount of memory as compared to python lists. It also provides a mechanism of specifying the data types of the contents, which allows further optimisation of the code.

Is appending to NumPy array efficient?

Appending to numpy arrays is very inefficient. This is because the interpreter needs to find and assign memory for the entire array at every single step. Depending on the application, there are much better strategies. If you know the length in advance, it is best to pre-allocate the array using a function like np.

Is Cython faster than NumPy?

Notice that here we're using the Python NumPy, imported using the import numpy statement. By running the above code, Cython took just 0.001 seconds to complete. For Python, the code took 0.003 seconds. Cython is nearly 3x faster than Python in this case.


2 Answers

There is no built-in way to do this with shapely, so you need to iterate through the values yourself. For doing that, this should be a rather efficient way:

In [4]: from geopandas import GeoSeries

In [5]: s = GeoSeries(map(Point, zip(x, y)))

In [6]: s.head()
Out[6]: 
0                    POINT (0 0)
1     POINT (1.01010101010101 0)
2     POINT (2.02020202020202 0)
3     POINT (3.03030303030303 0)
4    POINT (4.040404040404041 0)
dtype: object

In [6]: %timeit GeoSeries(map(Point, zip(x, y)))
114 ms ± 8.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

(or slight alternative GeoSeries(list(zip(x, y))).map(Point))

See here for some example doing this: http://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html

There is some (stalled) work to include this in geopandas directly: https://github.com/geopandas/geopandas/pull/75

like image 120
joris Avatar answered Oct 20 '22 09:10

joris


I think this is a good way:

import numpy as np        
from shapely import geometry

points_np_array = np.random.rand(50,2)
polygon_1 = geometry.Polygon(np.squeeze(points_np_array))
like image 38
Manute Avatar answered Oct 20 '22 08:10

Manute