Skip to content

Aggregation

Count, sum, and group data with COUNT, SUM, AVG, GROUP BY, and HAVING.

Sometimes you do not want to see every row. You want summaries: how many students are in each course? What is the average score? Aggregation functions turn many rows into a single summary value.

Definition

Aggregate functions compute a single value from a set of rows. COUNT() counts rows, SUM() adds values, AVG() computes the mean, and MAX()/MIN() find extremes. GROUP BY groups rows before aggregation.

Required step: If you do not have the school database and students table yet, run this setup in the playground. The FOREIGN KEY line links enrollments to students; it is formally explained in Stage 8.

CREATE DATABASE IF NOT EXISTS school;
USE school;

CREATE TABLE IF NOT EXISTS students (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);

CREATE TABLE IF NOT EXISTS enrollments (
  id INT AUTO_INCREMENT PRIMARY KEY,
  student_id INT NOT NULL,
  course_name VARCHAR(100) NOT NULL,
  score INT,
  FOREIGN KEY (student_id) REFERENCES students(id)
);

INSERT INTO students (name) VALUES ('Alice'), ('Bob'), ('Charlie');

INSERT INTO enrollments (student_id, course_name, score) VALUES
  (1, 'Math', 90), (1, 'Science', 85),
  (2, 'Math', 78), (2, 'Science', NULL),
  (3, 'Math', 92);

Troubleshooting: If you get "table already exists" errors, this setup uses IF NOT EXISTS so it is safe to re-run. If you get "table doesn't exist" errors on other queries, re-run the setup block above.

Aggregate Functions

MySQL provides functions that operate on a set of rows and return a single value.

COUNT: how many rows

SELECT COUNT(*) AS total_enrollments
FROM enrollments;

COUNT(*) counts every row regardless of column values or NULLs. The result is 5.

COUNT with a column: skips NULLs

SELECT COUNT(score) AS scored_enrollments
FROM enrollments;

COUNT(column) counts only non-NULL values. Bob's Science score is NULL, so the result is 4, not 5.

SUM, AVG, MIN, MAX

SELECT
  SUM(score) AS total,
  AVG(score) AS average,
  MIN(score) AS lowest,
  MAX(score) AS highest
FROM enrollments;

SUM adds all values. AVG calculates the average (ignoring NULLs). MIN and MAX return the extreme values. All ignore NULL values.

GROUP BY

GROUP BY splits rows into groups before applying aggregate functions. Each group becomes one row in the result.

Students per course

SELECT course_name, COUNT(*) AS student_count
FROM enrollments
GROUP BY course_name;

GROUP BY course_name creates one group per course. COUNT(*) counts rows in each group. Math has 3 enrollments, Science has 2.

Average score per course

SELECT course_name, AVG(score) AS avg_score
FROM enrollments
GROUP BY course_name;

AVG(score) is calculated separately for each group. NULL scores are ignored in the average.

Important:GROUP BY Rule

Every non-aggregated column in SELECT must appear in GROUP BY. For example, this query will error:

-- WRONG: student_id is not in GROUP BY
SELECT student_id, course_name, COUNT(*)
FROM enrollments
GROUP BY course_name;

This is correct:

SELECT student_id, course_name, COUNT(*)
FROM enrollments
GROUP BY student_id, course_name;

MySQL's default mode (ONLY_FULL_GROUP_BY) enforces this rule. It prevents ambiguous results where the database would have to guess which value to return for the non-aggregated column.

HAVING

HAVING filters groups after aggregation. WHERE filters rows before grouping.

Courses with more than 2 enrollments

SELECT course_name, COUNT(*) AS student_count
FROM enrollments
GROUP BY course_name
HAVING COUNT(*) > 2;

HAVING filters after GROUP BY. Only courses with more than 2 enrollments appear. Math has 3, so it matches. Science has 2, so it is excluded.

WHERE vs HAVING

SELECT course_name, AVG(score) AS avg_score
FROM enrollments
WHERE score IS NOT NULL
GROUP BY course_name
HAVING AVG(score) > 80;

WHERE filters rows before grouping (removes NULL scores). HAVING filters groups after aggregation (only courses with avg > 80). Both can be used together.

GROUP BY WITH ROLLUP

WITH ROLLUP adds extra rows to a GROUP BY result: subtotal rows for each group and a grand total row at the end.

Subtotals and grand total

SELECT course_name, COUNT(*) AS student_count
FROM enrollments
GROUP BY course_name WITH ROLLUP;

The result includes one row per course plus a final row with course_name = NULL and the total count of all enrollments (5). The NULL in the summary row means 'all groups combined'.

ROLLUP with SUM

SELECT course_name, SUM(score) AS total_score
FROM enrollments
WHERE score IS NOT NULL
GROUP BY course_name WITH ROLLUP;

You get per-course subtotals plus a grand total row. The grand total row has NULL for course_name and the sum of all scores across all courses.

What is the difference between COUNT(name) and COUNT(*)?

When does HAVING filter rows?

Key Takeaways

  • COUNT(*) counts all rows; COUNT(column) counts non-NULL values.
  • SUM, AVG, MIN, MAX all ignore NULL values.
  • GROUP BY splits rows into groups before aggregation.
  • HAVING filters groups after aggregation; WHERE filters rows before.
  • Every non-aggregated column in SELECT must appear in GROUP BY.

Ready to test your knowledge?

Take a Quiz