
Check for winning tickets in lottery using SQL

I have a SQL efficiency question. This is concerning the Norwegian national lottery. They draw seven numbers and three bonus balls.

I have a database with all the drawings and a lot of tickets. The question is what is the most efficient table structure and way of getting all the winning tickets in a draw.

These are my two main tables:

LotteryDraw
   DrawId (int, PK)
   DrawDate (datetime)
   MainNumbers (varchar)
   BonusNumbers (varchar)
   Main1 (smallint)
   Main2 (smallint)
   Main3 (smallint)
   Main4 (smallint)
   Main5 (smallint)
   Main6 (smallint)
   Main7 (smallint)
   Bonus1 (smallint)
   Bonus2 (smallint)
   Bonus3 (smallint)

I store each of the main and bonus numbers both in separate columns and as a comma-separated string in sorted order.

Similarly, I've got:

LotteryTicket
   TicketId (int, PK)
   UserId (int, FK)
   ValidTill (datetime)
   MainNumbers (varchar)
   Main1 (smallint)
   Main2 (smallint)
   Main3 (smallint)
   Main4 (smallint)
   Main5 (smallint)
   Main6 (smallint)
   Main7 (smallint)

You get prizes for 4+1, 5, 6, 6+1 and 7 correct numbers (correct main numbers + bonus numbers). Does anyone have any great ideas on how to write efficient SQL that will return all LotteryTickets with a prize for a given draw date? ValidTill is the last draw date for which a ticket is valid.
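To make the prize rule concrete, here is a minimal sketch of it as a function. It assumes that a single matching bonus ball is enough for the "+1" tiers and that 5 main numbers plus a bonus pays the same as plain 5; the function name and the tier labels are made up for illustration.

```python
def prize_tier(main_matches, bonus_matches):
    """Return the prize tier label for a ticket, or None if it wins nothing.

    Assumption: any bonus_matches >= 1 counts as the "+1" in 4+1 and 6+1.
    """
    has_bonus = bonus_matches >= 1
    if main_matches == 7:
        return "7"
    if main_matches == 6:
        return "6+1" if has_bonus else "6"
    if main_matches == 5:
        return "5"
    if main_matches == 4 and has_bonus:
        return "4+1"
    return None  # fewer than 4 main numbers, or exactly 4 without a bonus ball
```

So whatever the SQL looks like, it only has to produce two counts per ticket: matched main numbers and matched bonus numbers.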

My current attempt is using Linq2Sql in C# and has the speed of a hippo on ice so I really need some SQL expertise.

Server is Microsoft SQL Server 2008 R2 if that matters.

Update: After tweaking the answer from Marc B, I ended up with the following query. I needed to normalize the database a bit by adding a new table LotteryTicketNumber (ticketid, number).

SELECT LotteryTicket.TicketId, COUNT(LotteryTicketNumber.Number) AS MainBalls, (
    -- Any one bonus number found on the same ticket; NULL if there is none.
    SELECT TOP 1 ltn.Number
    FROM LotteryTicketNumber ltn
    WHERE ltn.Number IN (2,4,6)
    AND ltn.TicketId = LotteryTicket.TicketId
) AS BonusBall
FROM LotteryTicket
-- The WHERE filter below discards unmatched rows, so this behaves as an inner join.
INNER JOIN LotteryTicketNumber ON LotteryTicket.TicketId = LotteryTicketNumber.TicketId
WHERE LotteryTicketNumber.Number IN (13,14,16,23,26,27,30)
GROUP BY LotteryTicket.TicketId
HAVING COUNT(LotteryTicketNumber.Number) >= 4

The above query returns all tickets with at least 4 correct main numbers. In addition, the BonusBall field is non-NULL if the same ticket has one or more bonus balls. This is sufficient for me.
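For anyone who wants to try the idea without a SQL Server instance, here is a rough sketch that runs the same kind of query against an in-memory SQLite database from Python. SQLite is only a stand-in: it has no TOP, so the sketch uses LIMIT 1 instead. The three tickets and the helper names are invented for the demo.

```python
import sqlite3

MAIN = (13, 14, 16, 23, 26, 27, 30)  # main numbers used in the query above
BONUS = (2, 4, 6)                    # bonus numbers used in the query above


def make_demo_db():
    """Build an in-memory database with three made-up tickets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE LotteryTicket (TicketId INTEGER PRIMARY KEY);
        CREATE TABLE LotteryTicketNumber (
            TicketId INTEGER REFERENCES LotteryTicket(TicketId),
            Number   INTEGER,
            PRIMARY KEY (TicketId, Number)
        );
    """)
    tickets = {
        1: (2, 13, 14, 16, 23, 31, 32),   # 4 main numbers + bonus ball 2
        2: (1, 3, 5, 13, 14, 16, 33),     # only 3 main numbers: no prize
        3: (13, 14, 16, 23, 26, 27, 30),  # all 7 main numbers, no bonus
    }
    for ticket_id, numbers in tickets.items():
        conn.execute("INSERT INTO LotteryTicket VALUES (?)", (ticket_id,))
        conn.executemany("INSERT INTO LotteryTicketNumber VALUES (?, ?)",
                         [(ticket_id, n) for n in numbers])
    return conn


def winning_tickets(conn):
    """Return (TicketId, MainBalls, BonusBall) for tickets with >= 4 main hits."""
    main_marks = ",".join("?" * len(MAIN))
    bonus_marks = ",".join("?" * len(BONUS))
    sql = f"""
        SELECT t.TicketId,
               COUNT(n.Number) AS MainBalls,
               (SELECT b.Number
                  FROM LotteryTicketNumber b
                 WHERE b.TicketId = t.TicketId
                   AND b.Number IN ({bonus_marks})
                 LIMIT 1) AS BonusBall
        FROM LotteryTicket t
        JOIN LotteryTicketNumber n ON n.TicketId = t.TicketId
        WHERE n.Number IN ({main_marks})
        GROUP BY t.TicketId
        HAVING COUNT(n.Number) >= 4
        ORDER BY t.TicketId
    """
    # Placeholders bind in textual order: bonus list first, then main list.
    return conn.execute(sql, BONUS + MAIN).fetchall()
```

With the demo data, winning_tickets(make_demo_db()) gives one row for ticket 1 (4 main numbers, bonus ball 2) and one for ticket 3 (7 main numbers, BonusBall NULL); ticket 2 is filtered out by the HAVING clause.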

Thanks for the help

Paaland asked Mar 28 '11 18:03



2 Answers

If you're willing to normalize the data by splitting the list of numbers into a sub-table, then you could trivially determine winners with something like:

SELECT LotteryTicket.TicketID, GROUP_CONCAT(LotteryTicketNumbers.number), COUNT(LotteryTicketNumbers.number) AS cnt
FROM LotteryTicket
LEFT JOIN LotteryTicketNumbers ON (LotteryTicketNumbers.TicketID = LotteryTicket.TicketID
    AND LotteryTicketNumbers.number IN (winning, numbers, here))
GROUP BY LotteryTicket.TicketID
HAVING cnt >= 3;

where the '3' represents the minimum number of matched numbers required to win any prize. This won't handle "bonus" numbers, if there are any, though you could repeat the same query and flag any tickets where a bonus number is present with a derived field.

Note that this isn't tested, just going off the top of my head, so probably has some syntax errors.
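One possible way to get the bonus flag as a derived field without running the query twice is conditional aggregation, i.e. two SUM(CASE ...) columns over the same grouped rows. That is a swapped-in technique, not what the query above does. A minimal sketch, run here against SQLite from Python; the column names mainHits and bonusHits and the function name are made up:

```python
import sqlite3


def count_matches(conn, main, bonus, min_main=3):
    """Return (ticketID, mainHits, bonusHits) for tickets with enough main hits."""
    main_marks = ",".join("?" * len(main))
    bonus_marks = ",".join("?" * len(bonus))
    sql = f"""
        SELECT ticketID, mainHits, bonusHits
        FROM (
            SELECT ticketID,
                   SUM(CASE WHEN number IN ({main_marks}) THEN 1 ELSE 0 END) AS mainHits,
                   SUM(CASE WHEN number IN ({bonus_marks}) THEN 1 ELSE 0 END) AS bonusHits
            FROM LotteryTicketNumbers
            GROUP BY ticketID
        ) AS hits
        WHERE mainHits >= ?
        ORDER BY ticketID
    """
    # Bind order follows the placeholders: main list, bonus list, threshold.
    params = tuple(main) + tuple(bonus) + (min_main,)
    return conn.execute(sql, params).fetchall()
```

The trade-off is that every number row of every ticket is scanned, whereas filtering with IN in the join or WHERE clause lets an index on the number column cut the work down first.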


comment followup:

GROUP_CONCAT is a MySQL-specific SQL extension. You can rip that out, since it would seem you're on SQL Server.

The 'LottoTicketNumbers' table is what you'd use to normalize your tables. Instead of a single monolithic "ticket" record, you split it into two tables:

LottoTicket:  ticketID, drawDate
LottoTicketNumbers: ticketID, drawNumber

So let's say you had a ticket for the Apr 1/2011 draw, with numbers 1,12,23,44,55, you'd end up with something like:

LottoTicket: ticketID = 1, drawDate = Apr 1/2011
LottoTicketNumbers: (1,1), (1,12), (1,23), (1,44), (1,55)
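In code, the split amounts to something like the following sketch, which turns one wide ticket into one LottoTicket row plus one LottoTicketNumbers row per number. The function name is made up, and the date is written as an ISO string purely for the example.

```python
def normalize_ticket(ticket_id, draw_date, numbers):
    """Split a ticket into (LottoTicket row, list of LottoTicketNumbers rows)."""
    ticket_row = (ticket_id, draw_date)
    # One (ticketID, drawNumber) pair per number on the ticket.
    number_rows = [(ticket_id, n) for n in numbers]
    return ticket_row, number_rows


ticket_row, number_rows = normalize_ticket(1, "2011-04-01", [1, 12, 23, 44, 55])
```

Here number_rows comes out as the five pairs listed above, ready for a bulk insert.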

Structuring your tables like this makes the query work, using some basic set theory and the power of a relational database. The original table structure makes it nearly impossible to do the comparisons necessary to figure out all the possible permutations of winning numbers. You'd end up with some hideous construct like

select ...
where number1 in (winning, numbers, here) or number2 in (winning, numbers, here) or number3 in (winning, numbers, here) etc....

and it still wouldn't tell you exactly which prize you'd won (matched 3, matched 5 + bonus, etc.).

Example query results:

Let's say the draw numbers are 10,20,30,40,50, and you've got a ticket with 10,20,30,42,53. You've matched 3 of the 5 draw numbers, and win $10. Using the normalized table structure above, you'd have tables like:

LottoTicket: id #203, drawDate: Apr 1/2011
LottoTicketNumbers: (203, 10), (203, 20), (203, 30), (203, 42), (203, 53)

And the query would be

SELECT LottoTicket.TicketID, COUNT(LottoTicketNumbers.number) AS cnt
FROM LottoTicket
LEFT JOIN LottoTicketNumbers ON (LottoTicketNumbers.ticketID = LottoTicket.TicketID
    AND LottoTicketNumbers.number IN (10,20,30,40,50))
GROUP BY LottoTicket.TicketID
HAVING COUNT(LottoTicketNumbers.number) >= 3

You'd get (ungrouped) results of

203, 10
203, 20
203, 30

and with the grouping/aggregate functions:

203, 3   // ticket #203 matched 3 numbers.
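If you want to reproduce this example end to end, the following sketch loads ticket #203 into an in-memory SQLite database from Python and runs both the ungrouped and the grouped form. SQLite is just a convenient stand-in here, and the join explicitly ties the numbers to their ticket via ticketID. The helper names are made up.

```python
import sqlite3

DRAW = (10, 20, 30, 40, 50)  # draw numbers from the example above


def build_db():
    """Create the two normalized tables and insert ticket #203."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE LottoTicket (ticketID INTEGER PRIMARY KEY, drawDate TEXT);
        CREATE TABLE LottoTicketNumbers (
            ticketID INTEGER REFERENCES LottoTicket(ticketID),
            number   INTEGER,
            PRIMARY KEY (ticketID, number)
        );
    """)
    conn.execute("INSERT INTO LottoTicket VALUES (203, '2011-04-01')")
    conn.executemany("INSERT INTO LottoTicketNumbers VALUES (203, ?)",
                     [(n,) for n in (10, 20, 30, 42, 53)])
    return conn


def matched_rows(conn):
    """Ungrouped: one row per ticket number that appears in the draw."""
    marks = ",".join("?" * len(DRAW))
    sql = f"""
        SELECT t.ticketID, n.number
        FROM LottoTicket t
        JOIN LottoTicketNumbers n
          ON n.ticketID = t.ticketID AND n.number IN ({marks})
        ORDER BY t.ticketID, n.number
    """
    return conn.execute(sql, DRAW).fetchall()


def match_counts(conn, minimum=3):
    """Grouped: number of matches per ticket, keeping tickets with >= minimum."""
    marks = ",".join("?" * len(DRAW))
    sql = f"""
        SELECT t.ticketID, COUNT(n.number) AS cnt
        FROM LottoTicket t
        LEFT JOIN LottoTicketNumbers n
          ON n.ticketID = t.ticketID AND n.number IN ({marks})
        GROUP BY t.ticketID
        HAVING COUNT(n.number) >= ?
        ORDER BY t.ticketID
    """
    return conn.execute(sql, DRAW + (minimum,)).fetchall()
```

Calling matched_rows(build_db()) returns the three matched pairs for ticket 203, and match_counts(build_db()) collapses them into the single row (203, 3).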
Marc B answered Oct 20 '22 07:10



I am not a database expert, but I think I came up with a somewhat elegant solution that does not require restructuring the data into another table. If you use a pivot table, you can get SQL to return the proper counts for each number.

First, the pivot table (don't name it pivot, because that causes a MS SQL Server error in the query). It is simply a table with one column of type int, which is also the primary key. It holds one row for each integer from 1 to 100. You only really need as many numbers as your highest lottery number. More is OK.

PVT Structure: i(int,primary key)

PVT Data: (1) (2) (3) .... (100)
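Filling the table is a one-off job. A minimal sketch from Python, using SQLite as a stand-in for SQL Server; the function name and the highest parameter are made up:

```python
import sqlite3


def create_pvt(conn, highest=100):
    """Create pvt(i) and fill it with the integers 1..highest."""
    conn.execute("CREATE TABLE pvt (i INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO pvt (i) VALUES (?)",
                     [(n,) for n in range(1, highest + 1)])


conn = sqlite3.connect(":memory:")
create_pvt(conn, highest=53)  # 53 is enough for the Florida example below
```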

I am doing this example for Florida Lottery 6 numbers, no powerball, 53 numbers.

You have a LotteryTicket table, something like

LotteryTicket: ID, Number, N1, N2, N3, N4, N5, N6

SampleData:

(1), (1-2-3-4-5-6), (1), (2), (3), (4), (5), (6)

(2), (1-2-3-15-18-52), (1), (2), (3), (15), (18), (52)

Query/Stored procedure: [pass in a winning lottery number like: 1-2-3-20-30-33 or leave default params (this example)]

CREATE PROCEDURE MatchFloridaLottery
    (
        @p1 int = 1,
        @p2 int = 2,
        @p3 int = 3,
        @p4 int = 4,
        @p5 int = 5,
        @p6 int = 6,
        @minmatches int = 2
    )
AS
-- Each ticket column that holds a winning number joins to exactly one pvt row,
-- so COUNT(p.i) per ticket is the number of matched numbers.
SELECT t.id, COUNT(p.i) AS numbermatch
FROM LotteryTicket t, pvt p
WHERE
(t.n1 IN (@p1,@p2,@p3,@p4,@p5,@p6) AND t.n1 = p.i)
OR
(t.n2 IN (@p1,@p2,@p3,@p4,@p5,@p6) AND t.n2 = p.i)
OR
(t.n3 IN (@p1,@p2,@p3,@p4,@p5,@p6) AND t.n3 = p.i)
OR
(t.n4 IN (@p1,@p2,@p3,@p4,@p5,@p6) AND t.n4 = p.i)
OR
(t.n5 IN (@p1,@p2,@p3,@p4,@p5,@p6) AND t.n5 = p.i)
OR
(t.n6 IN (@p1,@p2,@p3,@p4,@p5,@p6) AND t.n6 = p.i)
GROUP BY t.id
HAVING COUNT(p.i) >= @minmatches

For my example in LotteryTickets I get:

ID     NumberMatch (count of numbers that matched)

1           6

2           3

The pivot table allows the query to return one row for each ticket column that matches a winning number. You then group those rows by id and count them (column i), which gives the total number of matches against the winning number. Yes, the query is not real pretty, but it works and avoids having to do all the work of a separate table and rows. Modify as needed for different games.
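To check the counts on the sample data, here is a rough sketch that runs the same pivot-table query against SQLite from Python instead of as a SQL Server stored procedure. It treats minmatches as inclusive (>=), which is an assumption based on the parameter name, and it builds the six OR branches in a loop. The helper names are made up.

```python
import sqlite3

IN_LIST = "(:p1, :p2, :p3, :p4, :p5, :p6)"
# One branch per ticket column: the column holds a winning number AND
# equals the current pvt row, so each match yields exactly one joined row.
BRANCHES = " OR ".join(
    f"(t.n{k} IN {IN_LIST} AND t.n{k} = p.i)" for k in range(1, 7)
)
MATCH_SQL = f"""
    SELECT t.id, COUNT(p.i) AS numbermatch
    FROM LotteryTicket t, pvt p
    WHERE {BRANCHES}
    GROUP BY t.id
    HAVING COUNT(p.i) >= :minmatches
    ORDER BY t.id
"""


def build_db():
    """Create pvt (1..53) and the two sample tickets from the answer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pvt (i INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO pvt VALUES (?)", [(n,) for n in range(1, 54)])
    conn.execute("""
        CREATE TABLE LotteryTicket (
            id INTEGER PRIMARY KEY, Number TEXT,
            n1 INTEGER, n2 INTEGER, n3 INTEGER,
            n4 INTEGER, n5 INTEGER, n6 INTEGER
        )
    """)
    conn.executemany(
        "INSERT INTO LotteryTicket VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [(1, "1-2-3-4-5-6", 1, 2, 3, 4, 5, 6),
         (2, "1-2-3-15-18-52", 1, 2, 3, 15, 18, 52)],
    )
    return conn


def match_lottery(conn, winning=(1, 2, 3, 4, 5, 6), minmatches=2):
    """Return (id, numbermatch) for tickets with at least minmatches hits."""
    params = {f"p{k}": n for k, n in enumerate(winning, start=1)}
    params["minmatches"] = minmatches
    return conn.execute(MATCH_SQL, params).fetchall()
```

With the default winning numbers, match_lottery(build_db()) returns (1, 6) and (2, 3), which matches the table above. Note that the counts are only right if the numbers on a ticket are distinct, which is the case for a valid lottery ticket.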

Frank B answered Oct 20 '22 07:10
