I'm working on moving some spatial searching capabilities from Postgres with PostGIS to SQL Server and I'm seeing some pretty terrible performance, even with indexes.
My data is around a million points, and I want to find out which of those points are within given shapes, so the query looks something like this:
DECLARE @Shape GEOMETRY = ...
SELECT * FROM PointsTable WHERE Point.STWithin(@Shape) = 1
If I select a fairly small shape, I can sometimes get sub-second times, but if my shape is fairly large (which they sometimes are), I can get times over 5 minutes. If I run the same searches in Postgres, they're always under a second (in fact, almost all are under 200 ms).
I've tried several different grid sizes on my indexes (all high, all medium, all low) and different cells-per-object values (16, 64, 256), and no matter what I do the times stay fairly constant. I'd like to try more combinations, but I don't even know which direction to go. More cells per object? Fewer? Some strange combination of grid sizes?
I've looked at my query plans and they're always using the index, it's just not helping at all. I've even tried without the index, and it's not much worse.
Is there any advice anyone can give on this? Everything I can find suggests "we can't give you any advice on indexes, just try everything and maybe one will work", but with it taking 10 minutes to create an index, doing this blindly is a massive waste of time.
EDIT: I also posted this on a Microsoft forum. Here's some information they asked for on there:
The best working index I could get was this one:
CREATE SPATIAL INDEX MapTesting_Location_Medium_Medium_Medium_Medium_16_NDX
ON MapTesting (Location)
USING GEOMETRY_GRID
WITH (
    -- The extent of our data. Data is clustered in cities, but this is about
    -- as small as the index can be without missing thousands of points.
    BOUNDING_BOX = (
        XMIN = -12135832,
        YMIN = 4433884,
        XMAX = -11296439,
        YMAX = 5443645),
    GRIDS = (
        LEVEL_1 = MEDIUM,
        LEVEL_2 = MEDIUM,
        LEVEL_3 = MEDIUM,
        LEVEL_4 = MEDIUM),
    CELLS_PER_OBJECT = 256 -- This was set to 16 but it was much slower
)
I had some issues getting the index used, but this is different.
For these tests I ran a test search (the one listed in my original post) with a WITH(INDEX(...)) clause for each of my indexes (testing various settings for grid size and cells per object), and one without any hint. I also ran sp_help_spatial_geometry_index using each index and the same search shape. The index listed above ran fastest and also was listed as most efficient in sp_help_spatial_geometry_index.
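For anyone who wants to repeat this kind of comparison, here is a rough sketch of one test run. The polygon is a made-up rectangle inside the bounding box above, and SRID 0 is an assumption; both need to be swapped for a real search shape and the SRID the Location column actually uses.

```sql
-- Hypothetical search shape: a rectangle inside the index bounding box.
-- SRID 0 is an assumption; it must match the SRID of the Location column.
DECLARE @Shape GEOMETRY = geometry::STGeomFromText(
    'POLYGON((-11800000 4900000, -11700000 4900000, -11700000 5000000, -11800000 5000000, -11800000 4900000))',
    0);

-- Report logical reads and CPU/elapsed time for the query below.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Force one specific spatial index so each index can be timed in turn.
SELECT *
FROM MapTesting WITH (INDEX(MapTesting_Location_Medium_Medium_Medium_Medium_16_NDX))
WHERE Location.STWithin(@Shape) = 1;

-- Ask SQL Server how well that index filters for the same shape.
EXEC sp_help_spatial_geometry_index
    @tabname = 'MapTesting',
    @indexname = 'MapTesting_Location_Medium_Medium_Medium_Medium_16_NDX',
    @verboseoutput = 1,
    @query_sample = @Shape;
```

Repeating this with each index name in the hint and in @indexname gives timings and efficiency figures that can be compared side by side.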
When running the search I get these statistics:
(1 row(s) affected)
Table 'MapTesting'. Scan count 0, logical reads 361142, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'extended_index_592590491_384009'. Scan count 1827, logical reads 8041, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 6735 ms, elapsed time = 13499 ms.
I also tried using random points as data (since I can't give out our real data), but it turns out that this search is really fast with random data. This led us to believe that our problem is how the grid system works with our data.
Our data is addresses across the entire state, so there are a few very high density regions, but mostly sparse data. I think the problem is that no setting for the grid sizes works well for both. With grids set to HIGH, the index returns too many cells in low-density areas, and with grids set to LOW, the grids are useless in high density areas (at MEDIUM, it's not as bad, but still not good at either).
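One way to see how uneven the data is would be to bucket the points into a coarse grid over the index bounding box and count the points per cell. This is only a sketch: the 8 x 8 grid is an assumption, picked because it matches the cell count of a MEDIUM level, and the table and column names are taken from the index definition above.

```sql
-- Bounding box copied from the spatial index definition.
DECLARE @XMin FLOAT = -12135832, @YMin FLOAT = 4433884,
        @XMax FLOAT = -11296439, @YMax FLOAT = 5443645;

-- Assumed grid resolution: 8 x 8 cells, like one MEDIUM grid level.
DECLARE @N INT = 8;

-- Assign each point to a cell, then count points per cell.
SELECT CellX, CellY, COUNT(*) AS PointCount
FROM (
    SELECT
        FLOOR((Location.STX - @XMin) / ((@XMax - @XMin) / @N)) AS CellX,
        FLOOR((Location.STY - @YMin) / ((@YMax - @YMin) / @N)) AS CellY
    FROM MapTesting
) AS Cells
GROUP BY CellX, CellY
ORDER BY PointCount DESC;
```

If a handful of cells hold most of the rows while the rest are nearly empty, that would be consistent with a single fixed grid setting not fitting both the dense and the sparse areas.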
I am able to get the index used, it's just not helping. Every test was run with "show actual execution plan" turned on, and it always shows the index.
I've just spent the day on a similar problem. In particular, we were doing a point-in-polygon type of query, where there was a relatively small set of polygons, but each polygon was large and complex.
Solution turned out to be as follows, for the spatial index on the polygon table:
This made a huge difference. It was 10 times faster than a spatial index in the default configuration, and 60 times faster than no index at all.