SELECT Queries - Interview Prep Notes

The SELECT statement is the most frequently used SQL command. It retrieves data from one or more tables and supports filtering, sorting, grouping, and nesting.

Sample data

All examples below use this employees table:

CREATE TABLE employees (
  id         SERIAL PRIMARY KEY,
  first_name VARCHAR(50),
  last_name  VARCHAR(50),
  email      VARCHAR(100),
  department VARCHAR(50),
  salary     NUMERIC(10, 2),
  age        INT,
  hire_date  DATE
);

INSERT INTO employees (first_name, last_name, email, department, salary, age, hire_date)
VALUES
  ('John',  'Doe',     'john@example.com',  'Engineering', 95000,  28, '2021-03-15'),
  ('Jane',  'Smith',   'jane@example.com',  'Engineering', 105000, 32, '2019-07-01'),
  ('Bob',   'Wilson',  'bob@example.com',   'Marketing',   72000,  45, '2018-01-20'),
  ('Alice', 'Johnson', 'alice@example.com', 'Design',      78000,  29, '2022-06-10'),
  ('Tom',   'Lee',     'tom@example.com',   'Engineering', 115000, 40, '2017-11-05'),
  ('Eve',   'Brown',   'eve@example.com',   'Marketing',   68000,  35, '2020-09-12'),
  ('Mike',  'Davis',   'mike@example.com',  'Design',      82000,  31, '2021-01-08'),
  ('Sara',  'Taylor',  'sara@example.com',  'Engineering', 98000,  27, '2023-02-14');

Basic SELECT

-- Select all columns
SELECT * FROM employees;

-- Select specific columns
SELECT first_name, last_name, salary FROM employees;

◆ Avoid SELECT * in production

SELECT * fetches every column, which wastes bandwidth and breaks if the schema changes. Always list the columns you need explicitly, especially in application code.

Column aliases (AS)

SELECT
  first_name AS "First Name",
  last_name AS "Last Name",
  salary AS annual_salary
FROM employees;

The AS keyword is optional — salary annual_salary works too — but using it makes queries more readable.

DISTINCT

Remove duplicate values from the result set.

-- Unique departments
SELECT DISTINCT department FROM employees;

-- DISTINCT on multiple columns — unique combinations
SELECT DISTINCT department, age FROM employees;

DISTINCT ON (PostgreSQL-specific)

Return one row per distinct value of the specified column(s).

-- Highest-paid employee per department
SELECT DISTINCT ON (department)
  department, first_name, salary
FROM employees
ORDER BY department, salary DESC;

WHERE — filtering rows

SELECT * FROM employees WHERE department = 'Engineering';
SELECT * FROM employees WHERE salary > 80000;
SELECT * FROM employees WHERE age != 28;

Comparison operators

Operator	Meaning
`=`	Equal to
`!=` or `<>`	Not equal to
`>`, `<`	Greater / less than
`>=`, `<=`	Greater or equal / less or equal
`BETWEEN a AND b`	Inclusive range
`IN (...)`	Matches any value in list
`LIKE` / `ILIKE`	Pattern matching
`IS NULL` / `IS NOT NULL`	Null checks

AND, OR, NOT

-- Multiple conditions
SELECT * FROM employees
WHERE department = 'Engineering' AND salary > 100000;

SELECT * FROM employees
WHERE department = 'Engineering' OR department = 'Design';

-- NOT
SELECT * FROM employees
WHERE department NOT IN ('Marketing', 'Sales');

BETWEEN

-- Salary between 70000 and 90000 (inclusive)
SELECT first_name, salary
FROM employees
WHERE salary BETWEEN 70000 AND 90000;

-- Date range
SELECT first_name, hire_date
FROM employees
WHERE hire_date BETWEEN '2020-01-01' AND '2022-12-31';

IN

SELECT * FROM employees
WHERE department IN ('Engineering', 'Design');

LIKE & ILIKE

-- Names starting with 'J'
SELECT * FROM employees WHERE first_name LIKE 'J%';

-- Names ending with 'son'
SELECT * FROM employees WHERE last_name LIKE '%son';

-- Exactly 3-letter first names
SELECT * FROM employees WHERE first_name LIKE '___';

-- Case-insensitive (PostgreSQL-specific)
SELECT * FROM employees WHERE first_name ILIKE 'john';

Pattern	Meaning
`%`	Any sequence of characters (including none)
`_`	Exactly one character

IS NULL / IS NOT NULL

SELECT * FROM employees WHERE department IS NULL;
SELECT * FROM employees WHERE department IS NOT NULL;

◆ NULL is not a value

NULL means unknown or missing. You cannot compare it with = — WHERE department = NULL always returns no rows. Always use IS NULL or IS NOT NULL.

ORDER BY — sorting results

-- Ascending (default)
SELECT * FROM employees ORDER BY salary;
SELECT * FROM employees ORDER BY salary ASC;

-- Descending
SELECT * FROM employees ORDER BY salary DESC;

-- Multiple sort columns
SELECT * FROM employees
ORDER BY department ASC, salary DESC;

-- Sort by column position (not recommended)
SELECT first_name, salary FROM employees ORDER BY 2 DESC;

-- NULLs first or last
SELECT * FROM employees ORDER BY department NULLS LAST;

LIMIT & OFFSET — pagination

-- Top 3 highest-paid
SELECT first_name, salary
FROM employees
ORDER BY salary DESC
LIMIT 3;

-- Skip first 3, get next 3 (page 2)
SELECT first_name, salary
FROM employees
ORDER BY salary DESC
LIMIT 3 OFFSET 3;

💡 Interview Tip

OFFSET-based pagination gets slower on large tables because the database still scans and discards the skipped rows. For production apps, prefer cursor-based pagination using WHERE id > last_seen_id ORDER BY id LIMIT n.

Aggregate functions

Function	Description
`COUNT(*)`	Total number of rows
`COUNT(col)`	Non-NULL values in column
`SUM(col)`	Sum of values
`AVG(col)`	Average of values
`MIN(col)`	Minimum value
`MAX(col)`	Maximum value

SELECT
  COUNT(*) AS total_employees,
  AVG(salary) AS avg_salary,
  MAX(salary) AS max_salary,
  MIN(salary) AS min_salary,
  SUM(salary) AS total_payroll
FROM employees;

 total_employees |  avg_salary   | max_salary | min_salary | total_payroll
-----------------+---------------+------------+------------+---------------
               8 | 89125.000000  |  115000.00 |   68000.00 |     713000.00

GROUP BY

Group rows that share a value and apply aggregate functions to each group.

-- Employee count and average salary per department
SELECT
  department,
  COUNT(*) AS employee_count,
  ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department;

 department  | employee_count | avg_salary
-------------+----------------+-----------
 Design      |              2 |  80000.00
 Engineering |              4 | 103250.00
 Marketing   |              2 |  70000.00

Rules

Every column in the SELECT that is not inside an aggregate function must appear in GROUP BY.
You cannot use column aliases defined in SELECT inside GROUP BY (PostgreSQL does allow this, but it’s non-standard).

HAVING — filtering groups

HAVING filters groups after aggregation, whereas WHERE filters rows before aggregation.

-- Departments with average salary above 75000
SELECT
  department,
  ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 75000;

 department  | avg_salary
-------------+-----------
 Design      |  80000.00
 Engineering | 103250.00

WHERE vs HAVING

-- WHERE filters rows BEFORE grouping
-- HAVING filters groups AFTER grouping
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE age > 25             -- filter individual rows first
GROUP BY department
HAVING AVG(salary) > 75000 -- then filter the groups
ORDER BY avg_salary DESC;

◆ SQL Execution Order

SQL doesn’t run top to bottom. The logical execution order is: FROM → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT

Subqueries

A query nested inside another query.

Subquery in WHERE

-- Employees earning above average
SELECT first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Subquery with IN

-- Employees in departments that have more than 2 people
SELECT first_name, department
FROM employees
WHERE department IN (
  SELECT department
  FROM employees
  GROUP BY department
  HAVING COUNT(*) > 2
);

Subquery in FROM (derived table)

-- Department stats as a derived table
SELECT sub.department, sub.avg_salary
FROM (
  SELECT department, ROUND(AVG(salary), 2) AS avg_salary
  FROM employees
  GROUP BY department
) AS sub
WHERE sub.avg_salary > 75000;

Correlated subquery

References a column from the outer query — runs once per outer row.

-- Employees who earn more than the average of their own department
SELECT first_name, department, salary
FROM employees e1
WHERE salary > (
  SELECT AVG(salary)
  FROM employees e2
  WHERE e2.department = e1.department
);

EXISTS

-- Departments that have at least one employee earning > 100k
SELECT DISTINCT department
FROM employees e
WHERE EXISTS (
  SELECT 1 FROM employees
  WHERE department = e.department AND salary > 100000
);

💡 Interview Tip

EXISTS is often faster than IN for correlated subqueries because it stops scanning as soon as it finds the first match (short-circuit evaluation). Use EXISTS when checking for existence and IN when comparing against a small, known set of values.

CTEs — Common Table Expressions

A CTE (using WITH) creates a named temporary result set that makes complex queries readable.

WITH dept_stats AS (
  SELECT
    department,
    COUNT(*) AS emp_count,
    ROUND(AVG(salary), 2) AS avg_salary,
    MAX(salary) AS max_salary
  FROM employees
  GROUP BY department
)
SELECT *
FROM dept_stats
WHERE emp_count > 1
ORDER BY avg_salary DESC;

Multiple CTEs

WITH
  high_earners AS (
    SELECT * FROM employees WHERE salary > 90000
  ),
  dept_summary AS (
    SELECT department, COUNT(*) AS high_earner_count
    FROM high_earners
    GROUP BY department
  )
SELECT * FROM dept_summary ORDER BY high_earner_count DESC;

Recursive CTE

Useful for hierarchical/tree data (org charts, categories, graphs).

-- Assume employees table has a manager_id column
WITH RECURSIVE org_chart AS (
  -- Base case: top-level managers (no manager)
  SELECT id, first_name, last_name, manager_id, 1 AS level
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive case: employees under each manager
  SELECT e.id, e.first_name, e.last_name, e.manager_id, oc.level + 1
  FROM employees e
  JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY level, last_name;

◆ CTE vs Subquery

CTEs improve readability by naming intermediate results. In PostgreSQL 12+, CTEs can be inlined by the optimizer (treated like subqueries) unless you add MATERIALIZED to force them to compute once. Before PG12, CTEs were always materialized (acted as optimization barriers).

Common interview questions

What is the difference between WHERE and HAVING?

WHERE filters individual rows before any grouping happens. HAVING filters groups after GROUP BY and aggregation. You cannot use aggregate functions in WHERE, but you can in HAVING.

What is the logical execution order of a SELECT statement?

FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT/OFFSET. This is why you can’t use a column alias defined in SELECT inside WHERE — SELECT runs after WHERE.

When would you use a CTE over a subquery?

Use a CTE when you need to reference the same derived data multiple times, when the query is deeply nested and hard to read, or when you need recursion. For simple, one-off subqueries, an inline subquery is fine and sometimes performs better.

What is the difference between COUNT(*) and COUNT(column)?

COUNT(*) counts all rows including those with NULLs. COUNT(column) counts only rows where column is not NULL. For example, if 10 rows exist but 3 have NULL in the email column, COUNT(*) returns 10 and COUNT(email) returns 7.

​Sample data

​Basic SELECT

​Column aliases (AS)

​DISTINCT

​DISTINCT ON (PostgreSQL-specific)

​WHERE — filtering rows

​Comparison operators

​AND, OR, NOT

​BETWEEN

​IN

​LIKE & ILIKE

​IS NULL / IS NOT NULL

​ORDER BY — sorting results

​LIMIT & OFFSET — pagination

​Aggregate functions

​GROUP BY

​Rules

​HAVING — filtering groups

​WHERE vs HAVING

​Subqueries

​Subquery in WHERE

​Subquery with IN

​Subquery in FROM (derived table)

​Correlated subquery

​EXISTS

​CTEs — Common Table Expressions

​Multiple CTEs

​Recursive CTE

​Common interview questions