I need to read out users from an old database to use for statistics in a new system, but I do not have the original user table. However there is a statistics table with the sum of each year, where I also can find all needed user information. In addition this also only give me the users that were active, which is what I need.
The table has the following relevant columns: (the statistics columns are not relevant here)
I want UserID to be distinct, so it is the only column that I can have in GROUP BY. I will run MAX on Year, to get the values from the most recent year. FirstName, LastName and Email need to be the same as the row where MAX(Year). In other words people may have changed both names and emails during the years, and I only want the last one, as it's the only one that is relevant.
My best suggestion for a SQL query goes like this:
SELECT UserID, Firstname, LastName, Email, MAX(Year) AS Year
FROM myTable
GROUP BY UserID
ORDER BY LastName, FirstName
The only problem is that SQL Server 2008 will not let me do anyhing like that, because all columns either has to be with a function like MAX or a part of GROUP BY. The columns FirstName, LastName and Email can't be under GROUP BY because that will make too many records. It seems somehow to work to put MAX on all of them, but then I have no way of knowing which column the MAX function is actually working on. I do not know for sure that it will be a problem, but I don't have time to look through 100 000 rows to see if there actually is a problem.
So in short I want the whole row of five columns where MAX works on only one column and GROUP BY on another. Does anybody have a good solution, or is it actually safe to use MAX on all non-grouping rows?
Max() function with Group by In this page we are discussing, how the GROUP BY clause along with the SQL MAX() can be used to find the maximum value of a column over each group.
The MySQL Solution If you're working with MySQL, you can combine MAX() with the GREATEST() function to get the biggest value from two or more fields. Here's the syntax for GREATEST: GREATEST(value1,value2,...) Given two or more arguments, it returns the largest (maximum-valued) argument.
To do that, you can use the ROW_NUMBER() function. In OVER() , you specify the groups into which the rows should be divided ( PARTITION BY ) and the order in which the numbers should be assigned to the rows ( ORDER BY ). You assign the row numbers within each group (i.e., year).
The GROUP BY clause is used along with some aggregate functions to group columns with the same values in different rows. The group by multiple columns technique retrieves grouped column values from one or more database tables by considering more than one column as grouping criteria.
A few answers...
Correlated sub-query...
SELECT
*
FROM
myTable
WHERE
Year = (SELECT MAX(Year) FROM myTable AS lookup WHERE lookup.UserID = myTable.UserID)
Join on derived aggregate...
SELECT
*
FROM
myTable
INNER JOIN
(SELECT UserID, MAX(Year) AS Year FROM myTable GROUP BY UserID) AS lookup
ON lookup.UserID = myTable.UserID
AND lookup.Year = myTable.Year
Ordered CTE using ROW_NUMBER()...
WITH
sequenced_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Year DESC) AS sequence_id,
*
FROM
myTable
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
Dy you have only one year record per user? If yes, then you can use old'n'good join:
SELECT m.UserID, m.Firstname, m.LastName, m.Email, m.Year
FROM myTable m
INNER JOIN (
SELECT UserID, MAX(Year) as Year
FROM myTable
GROUP BY UserID
) x ON x.UserID=m.UserID and x.Year=m.Year
ORDER BY m.LastName, m.FirstName
Sure you can use constructs from newer SQL versions, I just have got used to older (=more generic) possibilities :).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With