SELECT statement is the most frequently used SQL command. It retrieves data from one or more tables and supports filtering, sorting, grouping, and nesting.
Sample data
All examples below use thisemployees table:
Basic SELECT
◆ Avoid SELECT * in production
SELECT * fetches every column, which wastes bandwidth and breaks if the schema changes. Always list the columns you need explicitly, especially in application code.Column aliases (AS)
AS keyword is optional — salary annual_salary works too — but using it makes queries more readable.
DISTINCT
Remove duplicate values from the result set.DISTINCT ON (PostgreSQL-specific)
Return one row per distinct value of the specified column(s).WHERE — filtering rows
Comparison operators
| Operator | Meaning |
|---|---|
= | Equal to |
!= or <> | Not equal to |
>, < | Greater / less than |
>=, <= | Greater or equal / less or equal |
BETWEEN a AND b | Inclusive range |
IN (...) | Matches any value in list |
LIKE / ILIKE | Pattern matching |
IS NULL / IS NOT NULL | Null checks |
AND, OR, NOT
BETWEEN
IN
LIKE & ILIKE
| Pattern | Meaning |
|---|---|
% | Any sequence of characters (including none) |
_ | Exactly one character |
IS NULL / IS NOT NULL
◆ NULL is not a value
NULL means unknown or missing. You cannot compare it with = — WHERE department = NULL always returns no rows. Always use IS NULL or IS NOT NULL.ORDER BY — sorting results
LIMIT & OFFSET — pagination
💡 Interview Tip
OFFSET-based pagination gets slower on large tables because the database still scans and discards the skipped rows. For production apps, prefer cursor-based pagination using WHERE id > last_seen_id ORDER BY id LIMIT n.Aggregate functions
| Function | Description |
|---|---|
COUNT(*) | Total number of rows |
COUNT(col) | Non-NULL values in column |
SUM(col) | Sum of values |
AVG(col) | Average of values |
MIN(col) | Minimum value |
MAX(col) | Maximum value |
GROUP BY
Group rows that share a value and apply aggregate functions to each group.Rules
- Every column in the
SELECTthat is not inside an aggregate function must appear inGROUP BY. - You cannot use column aliases defined in
SELECTinsideGROUP BY(PostgreSQL does allow this, but it’s non-standard).
HAVING — filtering groups
HAVING filters groups after aggregation, whereas WHERE filters rows before aggregation.
WHERE vs HAVING
◆ SQL Execution Order
SQL doesn’t run top to bottom. The logical execution order is:
FROM → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMITSubqueries
A query nested inside another query.Subquery in WHERE
Subquery with IN
Subquery in FROM (derived table)
Correlated subquery
References a column from the outer query — runs once per outer row.EXISTS
💡 Interview Tip
EXISTS is often faster than IN for correlated subqueries because it stops scanning as soon as it finds the first match (short-circuit evaluation). Use EXISTS when checking for existence and IN when comparing against a small, known set of values.CTEs — Common Table Expressions
A CTE (usingWITH) creates a named temporary result set that makes complex queries readable.
Multiple CTEs
Recursive CTE
Useful for hierarchical/tree data (org charts, categories, graphs).◆ CTE vs Subquery
CTEs improve readability by naming intermediate results. In PostgreSQL 12+, CTEs can be inlined by the optimizer (treated like subqueries) unless you add
MATERIALIZED to force them to compute once. Before PG12, CTEs were always materialized (acted as optimization barriers).Common interview questions
What is the difference between WHERE and HAVING?
What is the difference between WHERE and HAVING?
WHERE filters individual rows before any grouping happens. HAVING filters groups after GROUP BY and aggregation. You cannot use aggregate functions in WHERE, but you can in HAVING.What is the logical execution order of a SELECT statement?
What is the logical execution order of a SELECT statement?
FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT/OFFSET. This is why you can’t use a column alias defined in SELECT inside WHERE — SELECT runs after WHERE.When would you use a CTE over a subquery?
When would you use a CTE over a subquery?
Use a CTE when you need to reference the same derived data multiple times, when the query is deeply nested and hard to read, or when you need recursion. For simple, one-off subqueries, an inline subquery is fine and sometimes performs better.
What is the difference between COUNT(*) and COUNT(column)?
What is the difference between COUNT(*) and COUNT(column)?
COUNT(*) counts all rows including those with NULLs. COUNT(column) counts only rows where column is not NULL. For example, if 10 rows exist but 3 have NULL in the email column, COUNT(*) returns 10 and COUNT(email) returns 7.