MYSQL-TOOLS

Comparison of graphical tools for MySQL

Home Aggregate Functions and Grouping Aggregate Functions and Grouping

Aggregate Functions and Grouping

Print PDF
User Rating: / 2
PoorBest 

Grouping Data using GROUP BY in MySQL

One key feature supported by the SELECT is to find aggregate values, grouping data elements as needed.

<select statement>::=
SELECT ....
[GROUP BY <group by definition>]
[HAVING <expression> [{<operator> <expression>}...]]
]

<group by definition>::=
<column name> [ASC | DESC]
[{, <column name> [ASC | DESC]}...]
[WITH ROLLUP]

The films in sakila are assigned to various categories, ratings and are in different languages.

If you wish to determine how many films there are in each category, use GROUP BY category_id. Each row in the result set is made of the multiple entries with the same value, also called grouping.

Aggregate Functions in MySQL

Aggregate functions are used to calculate results using field values from multiple records. There are five common aggregate functions.

Common Aggregate Functions MySQL
Aggregate FunctionDescription
COUNT() Returns the number of rows containing non-NULL values in the specified field.
SUM() Returns the sum of the non-NULL values in the specified field.
AVG() Returns the average of the non-NULL values in the specified field.
MAX() Returns the maximum of the non-NULL values in the specified field.
MIN() Returns the minimum of the non-NULL values in the specified field.
SELECT customer_id, MAX(rental_date) 'LastRentalOn'
FROM rental
GROUP BY customer_id;

SELECT customer_id, MIN(payment_date) 'OldestPaymentOn'
FROM payment
GROUP BY customer_id;

SELECT customer_id, SUM(amount) 'PaidAmount'
FROM payment
GROUP BY customer_id;

SELECT l.name, COUNT(f.film_id) 'FilmCount'
FROM language l LEFT JOIN film_detail f USING (language_id)
GROUP BY f.language_id;

SELECT category_name, ROUND(AVG(length)/60,2) 'Average Length In Hours'
FROM film_detail
GROUP BY category_id;
Code Explanation

Show various aggregation functions usage.

Using Conditions in Grouping

Note: Conditions can be placed within an aggregate function, for example, to enable the consideration of only those values that satisfy a certain condition when you are using COUNT or SUM.

SELECT MONTH(rental_date) rental_month, SUM(IF(rating = 'PG',1,0)) PG_count,
 COUNT(rental_id) rental_count
FROM rental_detail
GROUP BY MONTH(rental_date)
ORDER BY 3 DESC, 2;
Code Explanation

Show count of rented films by rental month, also show count of 'PG' rated films.

You can also use CASE statement in your grouping.

SELECT r.category_name,
(CASE
WHEN return_date IS NULL THEN 'Still Out'
WHEN DATEDIFF(r.return_date, r.rental_date) <= r.rental_duration
THEN 'On Time'
ELSE 'Late'
END) AS status,
COUNT(r.rental_id) As rental_count
FROM rental_detail r
GROUP BY r.category_name,
(CASE
WHEN return_date IS NULL THEN 'Still Out'
WHEN DATEDIFF(r.return_date, r.rental_date) <= r.rental_duration
THEN 'On Time'
ELSE 'Late'
END)
ORDER BY 1, 2;
Code Explanation

Show count of rented films by film-category and timeliness of returns, whether late or on-time.

GROUP_CONCAT: Aggregate Functions

The aggregate function GROUP_CONCAT groups together character strings, as shown by the following example where actors in a film are arrayed for a film in alphabetical order.

SELECT f.title,
GROUP_CONCAT(CONCAT(a.first_name, _utf8' ', a.last_name) ORDER BY a.last_name,
a.first_name SEPARATOR ', ') AS actors
FROM film f JOIN film_actor fa USING(film_id)
JOIN actor a ON fa.actor_id = a.actor_id
WHERE title LIKE 'SU%'
GROUP BY f.film_id;
Code Explanation

Join film, film_actor and actor tables.

Concatenate actors for each film into one string, grouping by film.

Order actors by last_name, first_name.

Multiple Column GROUPING

GROUP BY can be used for multiple columns. The following query averages rental and length for rating and category:

SELECT category_name, rating,
ROUND(AVG(rental_rate),2) 'Average Rental',
ROUND(AVG(length),0) 'Average Length'
FROM film_detail
GROUP BY category_name, rating
ORDER BY category_name, rating;
Code Explanation

Show average length and rental rate by rating and category.

GROUP BY WITH ROLLUP

The key word WITH ROLLUP can be appended to GROUP BY column, where if GROUP BY groups only a single column, then an additional sum row is added, with the group name NULL:

SELECT category_name,
ROUND(AVG(rental_rate),2) 'Average Rental',
ROUND(AVG(length),0) 'Average Length'
FROM film_detail
GROUP BY category_name WITH ROLLUP;

SELECT category_name,
ROUND(AVG(rental_rate),2) 'Average Rental',
ROUND(AVG(length),0) 'Average Length'
FROM film_detail
WHERE category_name IS NOT NULL
GROUP BY category_name WITH ROLLUP;

SELECT category_name,
ROUND(AVG(rental_rate),2) 'Average Rental',
ROUND(AVG(length),0) 'Average Length'
FROM film_detail
WHERE category_name IN ('Children','Foreign')
GROUP BY category_name WITH ROLLUP;
Code Explanation

Show average length and rental rate by category with a Rollup.

Note there may be a NULL category, which can be filtered out via WHERE clause.

ORDER BY cannot be used with ROLLUP.

The last row is the rolled-up, summary row for all categories.

WHERE clause limits the result set and also affects the ROLLUP.

Filtering Aggregates using HAVING Clause

The HAVING clause is used to filter grouped data. For example, the following code specifies that we only want information on languages that have more than a certain number of films.

Code Sample:

SELECT l.name, COUNT(f.film_id)
FROM language l JOIN film_detail f USING (language_id)
GROUP BY f.language_id
HAVING COUNT(f.film_id) > 5
ORDER BY 2 DESC;
Code Explanation

Count films by language.

Limit result set to languages with more than 5 films.

The HAVING clause works similar to the WHERE clause in that it consists of one or more conditions that define which rows are included in a result set. You cannot use aggregate functions or column aliases in expressions in your WHERE clause for which we use HAVING clause.

In general, the HAVING clause is normally best suited to use in conjunction with the GROUP BY clause.

A HAVING clause is constructed exactly like a WHERE clause, in terms of defining conditions and connecting multiple conditions with operators. For example, the following SELECT statement includes a HAVING clause that contains one condition:

Code Sample:

SELECT category_name,
ROUND(AVG(rental_rate),2) 'Average Rental',
ROUND(AVG(length),0) 'Average Length'
FROM film_detail
GROUP BY category_name
HAVING AVG(length) > 120
ORDER BY category_name;
Code Explanation

Show average length and rental rate by category.

Limit result set to those having average length more than 2 hours.

You can use both WHERE and HAVING clauses in the same SELECT.

The following query shows use of Alias in HAVING clause.

Code Sample:

SELECT 
c.customer_id, c.last_name, c.first_name, cat.name,
COUNT(r.rental_id) AS total_rentals,
SUM(p.amount) AS total_sales
FROM payment AS p
INNER JOIN rental AS r ON p.rental_id = r.rental_id
INNER JOIN inventory AS i ON r.inventory_id = i.inventory_id
INNER JOIN film AS f ON i.film_id = f.film_id
INNER JOIN film_category AS fc ON f.film_id = fc.film_id
INNER JOIN category AS cat ON fc.category_id = cat.category_id
INNER JOIN customer AS c ON r.customer_id = c.customer_id
WHERE c.customer_id BETWEEN 100 AND 200
GROUP BY c.customer_id, cat.category_id
HAVING total_rentals > 3
ORDER BY total_sales DESC;
Code Explanation

Group payments and rental count by customer and category.

Limit result set to those having rental count more than 3, using an alias for the count.

Limit result set by customer ID between 100 and 200.

We are using both WHERE and HAVING clauses in the same SELECT.

Miscellaneous Grouping Concepts and General Constraints

Finding Top-N or Bottom-N Entities

It is desirable sometimes to find best or worst-case records, such as top paying customers and films that are not getting rented. The same is achieved using creative mix of Grouping, Ordering and Limits.

The following query finds the Top 5 paying customers.

Code Sample:

SELECT c.last_name, c.first_name, COUNT(1) AS 'RentalCount'
FROM rental r JOIN customer c USING (customer_id)
WHERE r.rental_date BETWEEN '20050601' AND '20050930'
GROUP BY r.customer_id
ORDER BY RentalCount DESC LIMIT 5;
Code Explanation

Group rental count by customer.

Limit result set to 5 top renting customers by rental count.

The third quarter of 2005 is used to filter data.

Find 5 least-rented films.

Code Sample:

SELECT f.title, f.rating, f.category_name, COUNT(1) AS 'RentalCount'
FROM rental r JOIN inventory i USING (inventory_id)
JOIN film_detail f USING (film_id)
GROUP BY f.film_id
ORDER BY RentalCount LIMIT 5;

Code Explanation

Group rental count by films.

Find lowest renting films.

Find oldest-rented 10 films.

Code Sample:

SELECT f.title, f.rating, f.category_name, MAX(rental_date) AS 'LastRentedOn'
FROM rental r JOIN inventory i USING (inventory_id)
JOIN film_detail f USING (film_id)
GROUP BY f.film_id
ORDER BY LastRentedOn LIMIT 10;

Code Explanation

Read max rental date by films.

Find 10 films whose rental date is the oldest, or first in ordered list.

Order of Clauses

As seen in the syntax, the clauses are expected to follow a certain order to be syntactically correct.

  1. SELECT
  2. FROM
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. ORDER BY

Code Sample:

SELECT city, COUNT(customer_id) AS NumCustomers
FROM customer c JOIN address USING (address_id)
JOIN city USING (city_id)
WHERE city BETWEEN 'B' AND 'G'
GROUP BY city
HAVING COUNT(customer_id) > 1
ORDER BY NumCustomers;
Code Explanation

Group customer count by city.

Filter cities by name.

List cities with more than 1 living customer.

Grouping Rules

  • Normally, every non-aggregate column that appears in the SELECT clause must also appear in the GROUP BY clause. MySQL relaxes this rule by allowing one to SELECT columns that are not in GROUP BY and vice versa, as will be seen in many examples. There are arguments on both sides, but this is an advantageous feature of MySQL, which can be made stricter using SQL Modes discussed in another lesson.
  • Normally, you may not use aliases in the HAVING clause, but again we see MySQL's liberty at play here. It is very common to use aliases in the HAVING clause in MySQL queries.
  • You may use aliases or actual fields in the ORDER BY clause, in addition to column positions.
  • Normally, you may only use calculated fields in the HAVING clause, but MySQL's allows use of aliases which relaxes this control.
  • As we have seen, using ROLLUP is allowed with only one grouping column, and no ORDER BY can be used.

Aggregate Functions and Grouping Conclusion in MySQL

This lesson gave you the information necessary to perform the following tasks:

  • Use GROUP BY clauses to your SELECT statements to generate summary data
  • Use HAVING and other clauses to your SELECT statements to filter the results returned by summarized data
To continue to learn MySQL go to the top of this page and click on the next lesson in this MySQL Tutorial's Table of Contents.
 
mysql tools

How create data source in Visual Studio with data from My SQL tables with FK. more...

'mysql.proc' doesn't exist fixed problem

_

Table 'mysql.proc' doesn't exist. Fix error 1146 Table doesn't exist here...