I am somewhat confused about how the group by
command works in mysql.
Suppose I have a table:
mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress | date | httpMethod |
+----------+-----------------+---------------------+-------------------------------------------------+
| 1 | 64.68.88.22 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0 |
| 2 | 64.68.88.166 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0 |
| 3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1 |
| 4 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/fcs_style.css HTTP/1.1 |
| 5 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/main_page.css HTTP/1.1 |
| 6 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/bigportaltopbanner.gif HTTP/1.1 |
| 7 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/right_1.jpg HTTP/1.1 |
| 8 | 64.68.88.165 | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 |
| 9 | 64.68.88.165 | 2003-07-09 00:02:44 | GET /news/sports/basketball.shtml HTTP/1.0 |
| 10 | 64.68.88.34 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0 |
| 11 | 129.173.159.98 | 2003-07-09 00:03:46 | GET / HTTP/1.1 |
| 12 | 129.173.159.98 | 2003-07-09 00:03:46 | GET /include/fcs_style.css HTTP/1.1 |
| 13 | 129.173.159.98 | 2003-07-09 00:03:46 | GET /include/main_page.css HTTP/1.1 |
| 14 | 129.173.159.98 | 2003-07-09 00:03:48 | GET /images/bigportaltopbanner.gif HTTP/1.1 |
| 15 | 129.173.159.98 | 2003-07-09 00:03:48 | GET /images/left_1g.jpg HTTP/1.1 |
| 16 | 129.173.159.98 | 2003-07-09 00:03:48 | GET /images/webcam.gif HTTP/1.1 |
+----------+-----------------+---------------------+-------------------------------------------------+
When I am execute this statement how does it choose which recordID
to include since there are a range of recordID
s that would be correct? Does it just choose the first one that matches?
mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress | date | httpMethod |
+----------+-----------------+---------------------+-------------------------------------------------+
| 11 | 129.173.159.98 | 2003-07-09 00:03:46 | GET / HTTP/1.1 |
| 3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1 |
| 8 | 64.68.88.165 | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 |
| 2 | 64.68.88.166 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0 |
| 1 | 64.68.88.22 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0 |
| 10 | 64.68.88.34 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0 |
+----------+-----------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)
For this table the max(date)
and min(date)
values seem logical to me but I am confused about how the recordID
and httpMethod
where chosen.
Is it safe use two aggregate functions in one command?
mysql> select recordID, IPAddress, min(date), max(date), httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
| recordID | IPAddress | min(date) | max(date) | httpMethod |
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
| 11 | 129.173.159.98 | 2003-07-09 00:03:46 | 2003-07-09 00:03:48 | GET / HTTP/1.1 |
| 3 | 129.173.177.214 | 2003-07-09 00:01:23 | 2003-07-09 00:01:23 | GET / HTTP/1.1 |
| 8 | 64.68.88.165 | 2003-07-09 00:02:43 | 2003-07-09 00:02:44 | GET /studentservices/responsible.shtml HTTP/1.0 |
| 2 | 64.68.88.166 | 2003-07-09 00:00:55 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0 |
| 1 | 64.68.88.22 | 2003-07-09 00:00:21 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0 |
| 10 | 64.68.88.34 | 2003-07-09 00:02:46 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0 |
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)
You can use the GROUP BY clause without applying an aggregate function. The following query gets data from the payment table and groups the result by customer id. In this case, the GROUP BY works like the DISTINCT clause that removes duplicate rows from the result set.
If you don't specify GROUP BY , aggregate functions operate over all the records selected. In that case, it doesn't make sense to also select a specific column like EmployeeID .
Expressions that are not encapsulated within an aggregate function and must be included in the GROUP BY Clause at the end of the SQL statement. This is an aggregate function such as the SUM, COUNT, MIN, MAX, or AVG functions. This is the column or expression that the aggregate_function will be used on.
If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them.
Usually use of GROUP BY while listing a field in the select expression without an aggregate function is invalid SQL and should throw an error.
MySQL, however, allows this and simply chooses one value randomly. Try to avoid it, because it is confusing.
To disallow this, you can say at runtime:
SET sql_mode := CONCAT('ONLY_FULL_GROUP_BY,',@@sql_mode);
or use the configuration value and/or command line option sql-mode
.
Yes, listing two aggregate functions is completely valid.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With