Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Help identifying forum spammers via a SQL query?

I would like to have a simple query that I can run against a database to return abnoralities in the time threshold that users post to our forum. If I have the following database structure:

ThreadId | UserId | PostAuthor | PostDate |
1          1000     Spammer      2010-11-14 02:52:50.093
2          1000     Spammer      2010-11-14 02:53:06.893
3          1000     Spammer      2010-11-14 02:53:22.130
4          1000     Spammer      2010-11-14 02:53:37.073
5          2000     RealUser     2010-11-14 02:53:52.383
6          1000     Spammer      2010-11-14 02:54:07.430 

I would like to set a threshold to say for instance, that if 3 posts, from the same user, fall within a time period of 1 minute, the poster may be spamming the forums. In turn, I would like to return user 'Spammer' in a query, with the number of posts made within the allotted time.

In the above example, Spammer posted 4 messages within a time period of 1 minute, so the result of the query may look like the following:

UserId | PostAuthor | PostCount | DateStart               | DateEnd
1000     Spammer      4           2010-11-14 02:52:50.093   2010-11-14 02:53:37.073

Any suggestions in the format of the returned data our welcome. The format doesn't matter me as much as correctly identifying forum abusers accurately.

like image 998
George Johnston Avatar asked Nov 15 '10 17:11

George Johnston


2 Answers

Doesn't have everything you wanted in the output, but it's a start:

(Reword: give me all posts for which 2 or more other posts exist after it, but within one minute)

Select 
  Spammer = PostAuthor,
  NumberOfPosts = (Select Count(*) 
                   From Posts As AllPosts 
                   Where AllPosts.UserID = Posts.UserID)
From Posts
Where 2 <= (Select Count(*)
            From Posts As OtherPosts
            Where OtherPosts.UserID = Posts.UserID
              And OtherPosts.PostDate > Posts.PostDate
              And OtherPosts.PostDate < DateAdd(Minute, 1, Posts.PostDate))
like image 123
Stu Avatar answered Nov 14 '22 13:11

Stu


Self join solution:

Select T1.UserId, T1.PostAuthor, T1.PostDate, Max(T2.PostDate), Count(*)
from
  Posts T1 INNER JOIN Posts T2 
  ON T1.UserId = T2.UserId and 
     T2.PostDate between T1.PostDate and dateadd(minute, 1, T1.PostDate)
group by T1.UserId, T1.PostAuthor, T1.PostDate
having count(*) >= 3
like image 21
Axn Avatar answered Nov 14 '22 11:11

Axn