String concatenation, pattern matching, formatting, and text manipulation in PostgreSQL.
PostgreSQL has a rich set of string operators and functions for text manipulation. These are essential for data cleaning, formatting, and search queries.
-- CONCAT — joins arguments, ignores NULLsSELECT CONCAT(first_name, ' ', last_name) AS full_nameFROM users;-- CONCAT_WS — concat with separator (first arg is the separator)SELECT CONCAT_WS(', ', city, 'USA') AS locationFROM users;-- "New York, USA"-- CONCAT_WS also skips NULLsSELECT CONCAT_WS(' | ', first_name, bio) FROM users;-- Alice's NULL bio is skipped: "Alice"
SELECT UPPER(first_name) AS upper_name, LOWER(email) AS normalized_email, INITCAP(first_name || ' ' || last_name) AS proper_nameFROM users;
upper_name | normalized_email | proper_name------------+---------------------------+-------------- JOHN | john.doe@gmail.com | John Doe JANE | jane.smith@yahoo.com | Jane Smith BOB | bob_wilson@outlook.com | Bob Wilson ALICE | alice.ob@company.io | Alice O'Brien TOM | tom.lee@hotmail.com | Tom Lee
💡 Interview Tip
Always normalize data with LOWER() or UPPER() before comparisons to avoid case-sensitivity bugs: WHERE LOWER(email) = LOWER('John@Example.com'). Better yet, use CITEXT (case-insensitive text type) or ILIKE for pattern matching.
SELECT email, SUBSTRING(email FROM 1 FOR POSITION('@' IN email) - 1) AS username, SPLIT_PART(email, '@', 1) AS username_v2, SPLIT_PART(email, '@', 2) AS domainFROM users;
SELECT city, TRIM(city) AS trimmed, -- remove leading/trailing spaces LTRIM(city) AS left_trimmed, -- remove leading spaces only RTRIM(city) AS right_trimmed, -- remove trailing spaces only TRIM(BOTH ' ' FROM city) AS explicit -- same as TRIM(city)FROM users;
city | trimmed | left_trimmed | right_trimmed----------------+---------------+-----------------+--------------- New York | New York | New York | New York san francisco | san francisco | san francisco | san francisco seattle | seattle | seattle | seattle
-- Trim specific characters (not just spaces)SELECT TRIM(LEADING '0' FROM '000123'); -- '123'SELECT TRIM(BOTH '.' FROM '...hello...'); -- 'hello'
REPLACE swaps substrings — it looks for multi-character matches. TRANSLATE swaps individual characters — each character in the second argument is mapped to the corresponding character in the third argument. Use REPLACE for word substitution, TRANSLATE for character-level transformations.
-- LIKE is case-sensitiveSELECT * FROM users WHERE email LIKE '%gmail%';-- ILIKE is case-insensitive (PostgreSQL-specific)SELECT * FROM users WHERE email ILIKE '%gmail%';-- Names starting with 'j' (case-insensitive)SELECT * FROM users WHERE first_name ILIKE 'j%';
-- Case-sensitive regex matchSELECT * FROM users WHERE email ~ '^[a-z]+\.[a-z]+@';-- Case-insensitive regex matchSELECT * FROM users WHERE email ~* '^[a-z]+\.[a-z]+@';-- NOT matchSELECT * FROM users WHERE email !~ 'hotmail';
-- Split bio into words (returns rows)SELECT REGEXP_SPLIT_TO_TABLE('Full-stack developer with 5 years', '\s+');-- Full-stack-- developer-- with-- 5-- years-- Split into arraySELECT REGEXP_SPLIT_TO_ARRAY('Full-stack developer', '\s+');-- {Full-stack,developer}
-- ASCII value of a characterSELECT ASCII('A'); -- 65-- Character from ASCII codeSELECT CHR(65); -- 'A'-- Convert to hexSELECT TO_HEX(255); -- 'ff'-- MD5 hashSELECT MD5('password');-- '5f4dcc3b5aa765d61d8327deb882cf99'-- Format numbersSELECT TO_CHAR(1234567.89, '9,999,999.99');-- ' 1,234,567.89'-- Quote string for use in SQLSELECT QUOTE_LITERAL('O''Brien');-- '''O''Brien'''
SELECT LOWER(email), COUNT(*)FROM usersGROUP BY LOWER(email)HAVING COUNT(*) > 1;
Normalizing with LOWER() catches duplicates like John@Email.com and john@email.com.
How do you extract a domain from an email address?
Multiple approaches:
-- Using SPLIT_PARTSELECT SPLIT_PART(email, '@', 2) AS domain FROM users;-- Using SUBSTRING + POSITIONSELECT SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain FROM users;-- Using REGEXP_MATCHSELECT (REGEXP_MATCH(email, '@(.+)$'))[1] AS domain FROM users;
SPLIT_PART is the simplest and most readable.
What is the difference between LIKE and ~ (regex)?
LIKE uses simple wildcards (% for any chars, _ for one char) and is easier to read. ~ uses full POSIX regular expressions (character classes, quantifiers, groups) and is more powerful. Use LIKE for simple patterns and ~ when you need complex matching logic. ILIKE and ~* are their case-insensitive variants.
How do you reverse a string and check for palindromes?