Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Transaction level, nolock/readpast and concurrency

We have a system that is concurrently inserted a large amount of data from multiple stations while also exposing a data querying interface. The schema looks something like this (sorry about the poor formatting):

[SyncTable]
  SyncID
  StationID
  MeasuringTime


[DataTypeTable]
  TypeID
  TypeName


[DataTable]
  SyncID
  TypeID
  DataColumns...

Data insertion is done in a "Synchronization" and goes like this (we only insert data into the system, we never update)

INSERT INTO SyncTable(StationID, MeasuringTime) VALUES (X,Y); SELECT @@IDENTITY

INSERT INTO DataTable(SyncID, TypeID, DataColumns) VALUES 
  (SyncIDJustInserted, InMemoryCachedTypeID, Data)
  ... lots (500) similar inserts into DataTable ...

And queries goes like this ( for a given station, measuringtime and datatype)

SELECT SyncID FROM SyncTable WHERE StationID = @StationID 
                               AND MeasuringTime = @MeasuringTime 
SELECT DataColumns FROM DataTable WHERE SyncID = @SyncIDJustSelected
                                  AND DataTypeID = @TypeID

My question is how can we combine the transaction level on the inserts and NOLOCK/READPAST hints on the queries so that:

  1. We maximize the concurrency in our system while favoring the inserts (we need to store a lot of data, something as high as 2000+ records a second)
  2. Queries only return data from "commited" synchronization (we don't want a result set with a half inserted synchronization or a synchronization with some skipped entries due to lock-skipping)
  3. We don't care if the "newest" data is included in the query, we care more for consistency and responsiveness then for "live" and up-to-date data

This may be very conflicting goals and may require a high transaction isolation level but I am interested in all tricks and optimizations to achieve high responsiveness on both inserts and selects. I'll be happy to elaborate if more details are needed to flush out more tweaks and tricks.

UPDATE: Just adding a bit more information for future replies. We are running SQL Server 2005 (2008 within six months probably) on a SAN network with 5+ TB of storage initially. I'm not sure what kind of RAID the SAn is set up to and precisely how many disks we have available.

like image 479
soren.enemaerke Avatar asked Nov 05 '22 22:11

soren.enemaerke


2 Answers

If you are running SQL 2005 and above look into implementing snapshot isolation. You will not be able to get consistent results with nolock.

Solving this on SQL 2000 is much harder.

like image 155
Sam Saffron Avatar answered Nov 20 '22 14:11

Sam Saffron


This is a great scenario for SQL Server 2005/2008 Enterprise's Partitioning feature. You can create a partition for each StationID, and each StationID's data can go into its own filegroup (if you want, may not be necessary depending on your load.)

This buys you some advantages with concurrency:

  • If you partition by stationid, then users can run select queries for stationid's that aren't currently loading, and they won't run into any concurrency issues at all
  • If you partition by stationid, then multiple stations can insert data simultaneously without concurrency issues (as long as they're on different filegroups)
  • If you partition by syncid range, then you can put the older data on slower storage.
  • If you partition by syncid range, AND if your ranges are small enough (meaning not a range with thousands of syncids) then you can do loads at the same time your users are querying without running into concurrency issues

The scenario you're describing has a lot in common with data warehouse nightly loads. Microsoft did a technical reference project called Project Real that you might find interesting. They published it as a standard, and you can read through the design docs and the implementation code in order to see how they pulled off really fast loads:

http://www.microsoft.com/technet/prodtechnol/sql/2005/projreal.mspx

Partitioning is even better in SQL Server 2008, especially around concurrency. It's still not a silver bullet - it requires manual design and maintenance by a skilled DBA. It's not a set-it-and-forget-it feature, and it does require Enterprise Edition, which costs more than Standard Edition. I love it, though - I've used it several times and it's solved specific problems for me.

like image 36
Brent Ozar Avatar answered Nov 20 '22 12:11

Brent Ozar