Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server Date Range Problem When Using Data From Web Server Logs

I have been importing my raw IIS log files into a SQL Server table using the Log Parser tool for several months now. This is done to enable the writing of SSRS reports based on this log data.

One of the reports that I am working on is supposed to determine the number of Visits from each unique IP address. A Visit is defined as an IP address hitting a page on the site and then making 4 more requests within an hour of each other. All of the 5 requests are within one Visit to the site. Later on that night the same IP address hits the site, except that now it's 3 hours later, so we count this new activity from the same IP as a new Visit. Here is a sample of the data:

IPAddress,   RequestDateTime,     UriStem
10.1.1.100,  2010-10-15 13:30:30, /
10.1.1.100,  2010-10-15 13:30:31, /style.css
10.1.1.100,  2010-10-15 13:30:31, /script.js
10.1.1.100,  2010-10-15 13:30:32, /funny.gif
10.1.1.100,  2010-10-15 13:30:33, /picture.jpg
10.1.1.101,  2010-10-15 13:40:50, /page2.html
10.1.1.101,  2010-10-15 13:40:51, /style.css
10.1.1.102,  2010-10-15 14:10:20, /page4.html
10.1.1.102,  2010-10-15 14:10:21, /style.css
10.1.1.100,  2010-10-15 16:55:10, /
10.1.1.100,  2010-10-15 16:55:11, /style.css
10.1.1.100,  2010-10-15 16:55:11, /script.js
10.1.1.100,  2010-10-15 16:55:12, /funny.gif
10.1.1.100,  2010-10-15 16:55:13, /picture.jpg

By looking at the data above I can easily discern that the 10.1.1.100 IP address has visited the site twice and had 5 hits on each visit. However, I am at a loss as to how to express that in SQL code. Is there an easy way to group and count these date ranges by IP address?

I understand that this information can be captured by using tools such as AWStats, but I do not have the luxury of being able to install Perl on the systems we use.

like image 739
λ Jonas Gorauskas Avatar asked Nov 18 '10 23:11

λ Jonas Gorauskas


1 Answers

Give the code below a trial run. The code groups and numbers the visits from each IP address. Then it looks to see how many "uristem" hits compared with the "threshold" value. I tested the code on a table named "Foo" and you need to check your table and column names prior to running the test.

DECLARE @threshold INT;  
SET @threshold = 4;  --this number should not include the initial visit
DECLARE @lookbackdays int; 
SET @lookbackdays = 300; 

;WITH postCTE as  
(  
SELECT   
    ipaddress,  
    uristem,  
    requestdatetime,  
    RowNumber = ROW_NUMBER() OVER (ORDER BY ipaddress,requestdatetime ASC)  
FROM  
    Foo  --put your table name here
WHERE  
    requestdatetime > GETDATE() - @lookbackdays 
)  
--select * from postCTE

SELECT   
    p1.ipaddress AS [ipaddress],   
    p2.RowNumber - p1.RowNumber +1 AS [Requests], 
    p1.requestdatetime AS [DateStart]
FROM  
    postCTE p1  
INNER JOIN  
    postCTE p2   
    ON p1.ipaddress = p2.ipaddress   
    AND p1.Rownumber = p2.RowNumber - (@threshold )  
WHERE  
    DATEDIFF(minute,p1.requestdatetime,p2.requestdatetime) <= 60 

The output of my test on SQL 2008 is

ipaddress   Requests    DateStart
10.1.1.100  5   2010-10-15 13:30:30.000
10.1.1.100  5   2010-10-15 16:55:10.000
like image 195
RC_Cleland Avatar answered Sep 21 '22 05:09

RC_Cleland