Our underwriting company just sent us a data extract of leads. There are 9 million rows. Each row consists of a LeadID (guid), RawLeadXML (xml -- probably 3-4 KB max), and a LeadStatusID (int).
I first tried to add an autonumber integer and make it the primary key of this table. Well, it added the field, but it couldn't make it the primary key ("There is insufficient memory available in the buffer pool").
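Roughly what I ran, with made-up column/constraint names (the PRIMARY KEY step is the part that blows up; SORT_IN_TEMPDB at least pushes the index build's sort out of the main data files):

    -- add the autonumber column (IDENTITY is SQL Server's autonumber)
    ALTER TABLE dbo.Leads ADD RowID INT IDENTITY(1,1) NOT NULL;

    -- this is the step that fails with the buffer pool error
    ALTER TABLE dbo.Leads
        ADD CONSTRAINT PK_Leads PRIMARY KEY CLUSTERED (RowID)
        WITH (SORT_IN_TEMPDB = ON);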
What I need to do is take every record, one by one, get the XML, load it into an XmlDocument object in .Net, pull out the fields I want (firstname, lastname, etc.), and store that parsed information in another table.
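In theory the same shredding could be done entirely in T-SQL instead of round-tripping through .Net. A simplified sketch, assuming a hypothetical ParsedLeads target table and flat <Lead><FirstName>... elements (my real schema differs):

    INSERT INTO dbo.ParsedLeads (LeadID, FirstName, LastName)
    SELECT LeadID,
           CAST(RawLeadXML AS xml).value('(/Lead/FirstName)[1]', 'nvarchar(100)'),
           CAST(RawLeadXML AS xml).value('(/Lead/LastName)[1]',  'nvarchar(100)')
    FROM dbo.Leads;
    -- on 9 million rows this would still need batching, but it avoids
    -- pulling every document across the wire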
I can't even get this statement to run: select * from Leads where id between 1 and 1000
If I just select 1000 records at a time (select top 1000 * from Leads), that works, but how would I then select the next 1000 records without some sort of reference point?
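(The standard trick I eventually found is keyset paging: remember the last key you saw and seek past it. It needs an index on LeadID to be fast -- the very thing I couldn't build locally.)

    DECLARE @LastLeadID UNIQUEIDENTIFIER = '00000000-0000-0000-0000-000000000000';

    SELECT TOP (1000) LeadID, RawLeadXML, LeadStatusID
    FROM dbo.Leads
    WHERE LeadID > @LastLeadID
    ORDER BY LeadID;
    -- feed the largest LeadID from each batch back in as @LastLeadID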
My machine has 4 processors (2.53 GHz) and 12 GB of RAM. It's not a server, but it's a beefy machine. I don't know what to try next, honestly.
EDIT: I left out that the original file was actually an MDF (and the associated LDF) file, so I just attached them in SQL Server.
EDIT 2: I messed up and said that the RawLeadXML column was XML -- it isn't, it's just nvarchar(max). I honestly didn't know there WAS an xml datatype.
EDIT 3: I can't even issue a delete statement on this table: "delete from leads where leadid = '100a7927-5311-4f12-8fe3-95c079d32dd4' " blows up with:
Msg 802, Level 17, State 20, Line 2
There is insufficient memory available in the buffer pool.
I have no idea what to do next. How in the hell is this even a problem? There are thousands of databases in the world with more records than I've got.
EDIT 4: In case anyone cares, none of the below solutions worked. I'm sure this is a limitation of my machine, and definitely not a condemnation of the fine answers I received below. Currently, I am transferring the zipped DB (2.6 GB) to our servers in Rackspace, and then I will attempt to add an index on that hardware, hopefully without taking down our production servers. Once the index is added, I'm hopeful that I can zip up the db and bring it back down to my local machine, and then be able to actually do something with it.
EDIT 5: My machine was literally incapable of handling a table of this size. It has 12 GB of RAM, 64-bit Windows 7 Professional, a quad-core 2.53 GHz processor, an SSD drive, etc. It's pretty beefy for a development machine. And it couldn't handle this.
So, I moved the DB to our server in Rackspace in London. 48 GB of memory in that one, and it was able to add the index I needed. Even after that, my machine here was unable to do anything all that useful with it, so I've written a .Net program that is running in London to pull the records 1000 at a time, parse them into another table, and then mark the original records as processed.
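The guts of that program boil down to a claim-and-mark batch; in T-SQL terms it amounts to something like this (status values are made up -- mine differ):

    -- claim the next 1000 unprocessed leads and flag them in one atomic statement
    UPDATE TOP (1000) dbo.Leads
    SET LeadStatusID = 2          -- hypothetical 'processed' status
    OUTPUT inserted.LeadID, inserted.RawLeadXML
    WHERE LeadStatusID = 1;       -- hypothetical 'unprocessed' status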
Once I do this, I'm going to have to leave the DB in London, because I doubt I could write any meaningful reports against this monstrosity locally. Which will make development fun.
Summary: I believe that there is no good way to process a dataset this large without server-class hardware with at least 48 GB of RAM (in my case).
Use the SQL Server BCP utility to export big tables. In one test, a table of 100 million rows and about 7.5 GB in size was exported by running BCP with its default settings.
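A default-settings run is a one-liner (server name and paths hypothetical; -T is a trusted connection, -n keeps SQL Server's native format):

    bcp MyDb.dbo.Leads out C:\export\leads.dat -S MYSERVER -T -n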
A batched UPDATE keeps each transaction small. The original snippet was truncated mid-statement, so the WHERE clause below is a reconstructed guard, and the table/column names are placeholders:

    DECLARE @Rows INT, @BatchSize INT;  -- keep below 5000 so the UPDATE takes row locks, not a table lock
    SET @BatchSize = 2000;
    SET @Rows = @BatchSize;             -- initialize just to enter the loop

    BEGIN TRY
        WHILE (@Rows = @BatchSize)
        BEGIN
            UPDATE TOP (@BatchSize) tab
            SET tab.Value = 'abc1'
            FROM TableName tab
            WHERE tab.Value <> 'abc1';  -- assumed filter; only touch rows not yet updated

            SET @Rows = @@ROWCOUNT;     -- a short batch means we're done
        END
    END TRY
    BEGIN CATCH
        DECLARE @Err NVARCHAR(2048) = ERROR_MESSAGE();
        RAISERROR(@Err, 16, 1);
    END CATCH
Another possibility is to move the data into a column-oriented database like HBase or Cassandra, modeled as one wide row per entity with a column per attribute. Lastly, if you are going to do it in the relational database, as @JosMac pointed out, create partitions and avoid indexes as much as you can.
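A minimal sketch of the partitioning route, assuming you first manage to add an INT surrogate key (here called RowID) to range over:

    CREATE PARTITION FUNCTION pfLeads (INT)
        AS RANGE RIGHT FOR VALUES (3000000, 6000000);  -- three slices of ~3M rows

    CREATE PARTITION SCHEME psLeads
        AS PARTITION pfLeads ALL TO ([PRIMARY]);

    -- building the clustered index on the scheme spreads the table across partitions
    CREATE CLUSTERED INDEX IX_Leads_RowID ON dbo.Leads (RowID) ON psLeads (RowID);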
Error 802 does not mean out of memory in the classical sense of insufficient memory to do an allocation (that would trigger error 701). Error 802 actually indicates that the buffer pool cannot grow, which can occur for a couple of reasons:

- the 'max server memory' setting caps the buffer pool and that cap has already been reached, or
- the operating system itself is under memory pressure and will not grant SQL Server any more memory.
If you still haven't found the problem, read How to use the DBCC MEMORYSTATUS command to monitor memory usage on SQL Server 2005 (the article applies equally well to SQL Server 2008 and to 2008 R2) and follow the guidelines there to understand who/what is consuming your memory.
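Two concrete checks, both plain T-SQL: what the buffer pool is capped at, and what the memory clerks report (the 8192 MB below is only an example cap for a 12 GB box):

    -- see / set the buffer pool cap
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 8192;  -- example value
    RECONFIGURE;

    -- dump the internal memory clerks described in the article above
    DBCC MEMORYSTATUS;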