I have a table containing columns date_trans, time_trans, price. After select query, I want to add a new column "Count" which will be calculated as the consecutive equal values of price column and the previous rows having consecutive equal prices will be removed from the final result. See the expected output:
date_trans time_trans price **Count**
2011-02-22 09:39:59 58.02 1
2011-02-22 09:40:03 58.1 *ROW WILL BE REMOVED
2011-02-22 09:40:07 58.1 *ROW WILL BE REMOVED
2011-02-22 09:40:08 58.1 3
2011-02-22 09:40:10 58.15 1
2011-02-22 09:40:10 58.1 *ROW WILL BE REMOVED
2011-02-22 09:40:14 58.1 2
2011-02-22 09:40:24 58.15 1
2011-02-22 09:40:24 58.18 *ROW WILL BE REMOVED
2011-02-22 09:40:24 58.18 *ROW WILL BE REMOVED
2011-02-22 09:40:24 58.18 3
2011-02-22 09:40:24 58.15 1
Please suggest a sql query or LINQ expression to select from the table
Currently, I can do it be a select query and looping through all the selected rows but when selecting millions of rows it takes hours.
My current code:
string query = @"SELECT date_trans, time_trans, price
FROM tbl_data
WHERE date_trans BETWEEN '2011-02-22' AND '2011-10-21'
AND time_trans BETWEEN '09:30:00' AND '16:00:00'";
DataTable dt = oUtil.GetDataTable(query);
DataColumn col = new DataColumn("Count", typeof(int));
dt.Columns.Add(col);
int priceCount = 1;
for (int count = 0; count < dt.Rows.Count; count++)
{
double price = Convert.ToDouble(dt.Rows[count]["price"]);
double priceNext = (count == dt.Rows.Count - 1) ? 0 : Convert.ToDouble(dt.Rows[count + 1]["price"]);
if (price == priceNext)
{
priceCount++;
dt.Rows.RemoveAt(count);
count--;
}
else
{
dt.Rows[count]["Count"] = priceCount;
priceCount = 1;
}
}
That's an interesting one. I think what you need would be something like this:
SELECT MAX(date_trans), MAX(time_trans), MAX(price), COUNT(*)
FROM
(SELECT *, ROW_NUMBER() OVER(PARTITION BY price ORDER BY date_trans, time_trans) - ROW_NUMBER() OVER(ORDER BY date_trans, time_trans) AS grp
FROM transactions) grps
GROUP BY grp
Found the solution here: http://www.sqlmag.com/article/sql-server/solution-to-the-t-sql-puzzle-grouping-consecutive-rows-with-a-common-element
UPDATE
The grouping column needs to also include "price", otherwise groups might not be unique. One more thing is that the date and time column should be combined into a datetime column so the max datetime value is correct in groups that start near the end of one day and end on the beginning of the next. Here's the corrected query.
SELECT MAX(CAST(date_trans AS DATETIME) + CAST(time_trans AS DATETIME)) , MAX(price), COUNT(*)
FROM
(SELECT *,
CAST(ROW_NUMBER() OVER(PARTITION BY price ORDER BY date_trans, time_trans) - ROW_NUMBER() OVER(ORDER BY date_trans, time_trans) AS NVARCHAR(255)) + '-' + CAST(price AS NVARCHAR(255)) AS grp
FROM transactions
ORDER BY date_trans, time_trans) grps
GROUP BY grp
The query might be more optimal with the 'grp' column as a byte array or bigint instead of a nvarchar. Also you mentioned a 'volume' column that you probably want to sum within the group.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With