I have been importing my raw IIS log files into a SQL Server table using the Log Parser tool for several months now. This is done to enable the writing of SSRS reports based on this log data.
One of the reports that I am working on is supposed to determine the number of Visits from each unique IP address. A Visit is defined as an IP address hitting a page on the site and then making 4 more requests within an hour of each other. All of the 5 requests are within one Visit to the site. Later on that night the same IP address hits the site, except that now it's 3 hours later, so we count this new activity from the same IP as a new Visit. Here is a sample of the data:
IPAddress, RequestDateTime, UriStem
10.1.1.100, 2010-10-15 13:30:30, /
10.1.1.100, 2010-10-15 13:30:31, /style.css
10.1.1.100, 2010-10-15 13:30:31, /script.js
10.1.1.100, 2010-10-15 13:30:32, /funny.gif
10.1.1.100, 2010-10-15 13:30:33, /picture.jpg
10.1.1.101, 2010-10-15 13:40:50, /page2.html
10.1.1.101, 2010-10-15 13:40:51, /style.css
10.1.1.102, 2010-10-15 14:10:20, /page4.html
10.1.1.102, 2010-10-15 14:10:21, /style.css
10.1.1.100, 2010-10-15 16:55:10, /
10.1.1.100, 2010-10-15 16:55:11, /style.css
10.1.1.100, 2010-10-15 16:55:11, /script.js
10.1.1.100, 2010-10-15 16:55:12, /funny.gif
10.1.1.100, 2010-10-15 16:55:13, /picture.jpg
By looking at the data above I can easily discern that the 10.1.1.100
IP
address has visited the site twice and had 5 hits on each visit. However, I am
at a loss as to how to express that in SQL code. Is there an easy way to group
and count these date ranges by IP address?
I understand that this information can be captured by using tools such as AWStats, but I do not have the luxury of being able to install Perl on the systems we use.
Give the code below a trial run. The code groups and numbers the visits from each IP address. Then it looks to see how many "uristem" hits compared with the "threshold" value. I tested the code on a table named "Foo" and you need to check your table and column names prior to running the test.
DECLARE @threshold INT;
SET @threshold = 4; --this number should not include the initial visit
DECLARE @lookbackdays int;
SET @lookbackdays = 300;
;WITH postCTE as
(
SELECT
ipaddress,
uristem,
requestdatetime,
RowNumber = ROW_NUMBER() OVER (ORDER BY ipaddress,requestdatetime ASC)
FROM
Foo --put your table name here
WHERE
requestdatetime > GETDATE() - @lookbackdays
)
--select * from postCTE
SELECT
p1.ipaddress AS [ipaddress],
p2.RowNumber - p1.RowNumber +1 AS [Requests],
p1.requestdatetime AS [DateStart]
FROM
postCTE p1
INNER JOIN
postCTE p2
ON p1.ipaddress = p2.ipaddress
AND p1.Rownumber = p2.RowNumber - (@threshold )
WHERE
DATEDIFF(minute,p1.requestdatetime,p2.requestdatetime) <= 60
The output of my test on SQL 2008 is
ipaddress Requests DateStart
10.1.1.100 5 2010-10-15 13:30:30.000
10.1.1.100 5 2010-10-15 16:55:10.000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With