Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimize a query or suggest LINQ equivalent

I have a table containing columns date_trans, time_trans, price. After select query, I want to add a new column "Count" which will be calculated as the consecutive equal values of price column and the previous rows having consecutive equal prices will be removed from the final result. See the expected output:

date_trans  time_trans  price   **Count**    
2011-02-22  09:39:59    58.02   1
2011-02-22  09:40:03    58.1    *ROW WILL BE REMOVED
2011-02-22  09:40:07    58.1    *ROW WILL BE REMOVED
2011-02-22  09:40:08    58.1    3
2011-02-22  09:40:10    58.15   1
2011-02-22  09:40:10    58.1    *ROW WILL BE REMOVED
2011-02-22  09:40:14    58.1    2
2011-02-22  09:40:24    58.15   1
2011-02-22  09:40:24    58.18   *ROW WILL BE REMOVED
2011-02-22  09:40:24    58.18   *ROW WILL BE REMOVED
2011-02-22  09:40:24    58.18   3
2011-02-22  09:40:24    58.15   1

Please suggest a sql query or LINQ expression to select from the table

Currently, I can do it be a select query and looping through all the selected rows but when selecting millions of rows it takes hours.

My current code:

    string query = @"SELECT date_trans, time_trans,  price
                            FROM tbl_data 
                         WHERE date_trans BETWEEN '2011-02-22' AND '2011-10-21'
                        AND time_trans BETWEEN '09:30:00' AND '16:00:00'";

            DataTable dt = oUtil.GetDataTable(query);

            DataColumn col = new DataColumn("Count", typeof(int));
            dt.Columns.Add(col);

            int priceCount = 1;
            for (int count = 0; count < dt.Rows.Count; count++)
            {
                double price = Convert.ToDouble(dt.Rows[count]["price"]);
                double priceNext = (count == dt.Rows.Count - 1) ? 0 : Convert.ToDouble(dt.Rows[count + 1]["price"]);
                if (price == priceNext)
                {
                    priceCount++;
                    dt.Rows.RemoveAt(count);
                    count--;
                }
                else
                {
                    dt.Rows[count]["Count"] = priceCount;
                    priceCount = 1;
                }
            }
like image 747
Mainuddin Avatar asked Jul 15 '12 14:07

Mainuddin


1 Answers

That's an interesting one. I think what you need would be something like this:

SELECT MAX(date_trans), MAX(time_trans), MAX(price), COUNT(*)
FROM
    (SELECT *, ROW_NUMBER() OVER(PARTITION BY price ORDER BY date_trans, time_trans) - ROW_NUMBER() OVER(ORDER BY date_trans, time_trans) AS grp
    FROM transactions) grps
GROUP BY grp

Found the solution here: http://www.sqlmag.com/article/sql-server/solution-to-the-t-sql-puzzle-grouping-consecutive-rows-with-a-common-element

UPDATE

The grouping column needs to also include "price", otherwise groups might not be unique. One more thing is that the date and time column should be combined into a datetime column so the max datetime value is correct in groups that start near the end of one day and end on the beginning of the next. Here's the corrected query.

SELECT MAX(CAST(date_trans AS DATETIME) + CAST(time_trans AS DATETIME)) , MAX(price), COUNT(*)
FROM
    (SELECT *, 
        CAST(ROW_NUMBER() OVER(PARTITION BY price ORDER BY date_trans, time_trans) - ROW_NUMBER() OVER(ORDER BY date_trans, time_trans) AS NVARCHAR(255)) + '-' + CAST(price AS NVARCHAR(255)) AS grp
    FROM transactions
    ORDER BY date_trans, time_trans) grps
GROUP BY grp

The query might be more optimal with the 'grp' column as a byte array or bigint instead of a nvarchar. Also you mentioned a 'volume' column that you probably want to sum within the group.

like image 95
Paweł Sokołowski Avatar answered Oct 04 '22 08:10

Paweł Sokołowski