String Operators & Functions - Interview Prep Notes

PostgreSQL has a rich set of string operators and functions for text manipulation. These are essential for data cleaning, formatting, and search queries.

Sample data

CREATE TABLE users (
  id         SERIAL PRIMARY KEY,
  first_name VARCHAR(50),
  last_name  VARCHAR(50),
  email      VARCHAR(100),
  city       VARCHAR(50),
  bio        TEXT
);

INSERT INTO users (first_name, last_name, email, city, bio)
VALUES
  ('John',   'Doe',     'JOHN.DOE@gmail.com',     '  New York  ', 'Full-stack developer with 5 years exp.'),
  ('jane',   'SMITH',   'jane.smith@Yahoo.COM',    'san francisco', 'Data scientist and ML engineer.'),
  ('Bob',    'Wilson',  'bob_wilson@outlook.com',  'Chicago',      'DevOps engineer. AWS certified.'),
  ('Alice',  'O''Brien', 'alice.ob@company.io',    'Los Angeles',  NULL),
  ('tom',    'lee',     'TOM.LEE@HOTMAIL.COM',     ' seattle ',    'Product manager at startup.');

String operators

Concatenation (`||`)

The || operator joins strings together.

SELECT first_name || ' ' || last_name AS full_name
FROM users;

   full_name
--------------
 John Doe
 jane SMITH
 Bob Wilson
 Alice O'Brien
 tom lee

◆ NULL in concatenation

If any operand is NULL, the || operator returns NULL. Use COALESCE to handle this: first_name || ' ' || COALESCE(last_name, '').

CONCAT and CONCAT_WS

Unlike ||, CONCAT treats NULL as an empty string.

-- CONCAT — joins arguments, ignores NULLs
SELECT CONCAT(first_name, ' ', last_name) AS full_name
FROM users;

-- CONCAT_WS — concat with separator (first arg is the separator)
SELECT CONCAT_WS(', ', city, 'USA') AS location
FROM users;
-- "New York, USA"

-- CONCAT_WS also skips NULLs
SELECT CONCAT_WS(' | ', first_name, bio) FROM users;
-- Alice's NULL bio is skipped: "Alice"

Case conversion

Function	Description	Example	Result
`UPPER(str)`	Convert to uppercase	`UPPER('hello')`	`HELLO`
`LOWER(str)`	Convert to lowercase	`LOWER('HELLO')`	`hello`
`INITCAP(str)`	Capitalize first letter of each word	`INITCAP('hello world')`	`Hello World`

SELECT
  UPPER(first_name) AS upper_name,
  LOWER(email) AS normalized_email,
  INITCAP(first_name || ' ' || last_name) AS proper_name
FROM users;

 upper_name | normalized_email          | proper_name
------------+---------------------------+--------------
 JOHN       | john.doe@gmail.com        | John Doe
 JANE       | jane.smith@yahoo.com      | Jane Smith
 BOB        | bob_wilson@outlook.com    | Bob Wilson
 ALICE      | alice.ob@company.io       | Alice O'Brien
 TOM        | tom.lee@hotmail.com       | Tom Lee

💡 Interview Tip

Always normalize data with LOWER() or UPPER() before comparisons to avoid case-sensitivity bugs: WHERE LOWER(email) = LOWER('John@Example.com'). Better yet, use CITEXT (case-insensitive text type) or ILIKE for pattern matching.

Length and position

Function	Description	Example	Result
`LENGTH(str)`	Number of characters	`LENGTH('Hello')`	`5`
`CHAR_LENGTH(str)`	Same as LENGTH	`CHAR_LENGTH('Hello')`	`5`
`OCTET_LENGTH(str)`	Number of bytes	`OCTET_LENGTH('cafe')`	`4`
`POSITION(sub IN str)`	Position of first occurrence (1-based)	`POSITION('lo' IN 'Hello')`	`4`
`STRPOS(str, sub)`	Same as POSITION (PG syntax)	`STRPOS('Hello', 'lo')`	`4`

SELECT
  email,
  LENGTH(email) AS email_length,
  POSITION('@' IN email) AS at_position
FROM users;

         email           | email_length | at_position
-------------------------+--------------+------------
 JOHN.DOE@gmail.com      |           18 |           9
 jane.smith@Yahoo.COM    |           20 |          11
 bob_wilson@outlook.com  |           22 |          11
 alice.ob@company.io     |           19 |           9
 TOM.LEE@HOTMAIL.COM     |           19 |           8

Substring extraction

Function	Description
`SUBSTRING(str FROM start FOR length)`	Extract substring (1-based index)
`SUBSTR(str, start, length)`	Shorthand (same result)
`LEFT(str, n)`	First `n` characters
`RIGHT(str, n)`	Last `n` characters
`SPLIT_PART(str, delimiter, field)`	Split string and get nth part

SELECT
  email,
  SUBSTRING(email FROM 1 FOR POSITION('@' IN email) - 1) AS username,
  SPLIT_PART(email, '@', 1) AS username_v2,
  SPLIT_PART(email, '@', 2) AS domain
FROM users;

         email           | username   | username_v2 | domain
-------------------------+------------+-------------+-----------
 JOHN.DOE@gmail.com      | JOHN.DOE   | JOHN.DOE    | gmail.com
 jane.smith@Yahoo.COM    | jane.smith | jane.smith  | Yahoo.COM
 bob_wilson@outlook.com  | bob_wilson | bob_wilson  | outlook.com

SELECT
  LEFT('PostgreSQL', 8) AS left_result,    -- 'PostgreS'
  RIGHT('PostgreSQL', 3) AS right_result;  -- 'SQL'

Substring with regex

-- Extract first number from a string
SELECT SUBSTRING('Order #12345 confirmed' FROM '#([0-9]+)');
-- Result: '12345'

Trimming and padding

TRIM, LTRIM, RTRIM

SELECT
  city,
  TRIM(city) AS trimmed,               -- remove leading/trailing spaces
  LTRIM(city) AS left_trimmed,          -- remove leading spaces only
  RTRIM(city) AS right_trimmed,         -- remove trailing spaces only
  TRIM(BOTH ' ' FROM city) AS explicit  -- same as TRIM(city)
FROM users;

      city      | trimmed       | left_trimmed    | right_trimmed
----------------+---------------+-----------------+---------------
   New York     | New York      | New York        |    New York
 san francisco  | san francisco | san francisco   |  san francisco
  seattle       | seattle       | seattle         |  seattle

-- Trim specific characters (not just spaces)
SELECT TRIM(LEADING '0' FROM '000123');   -- '123'
SELECT TRIM(BOTH '.' FROM '...hello...');  -- 'hello'

LPAD, RPAD

Pad a string to a given length.

SELECT
  LPAD('42', 5, '0') AS left_padded,    -- '00042'
  RPAD('hi', 10, '.') AS right_padded;  -- 'hi........'

-- Useful for formatting order numbers
SELECT LPAD(id::TEXT, 6, '0') AS order_number
FROM orders;
-- '000001', '000042', '001234'

Replace and translate

REPLACE

Replace all occurrences of a substring.

SELECT REPLACE('Hello World', 'World', 'PostgreSQL');
-- 'Hello PostgreSQL'

-- Remove all spaces
SELECT REPLACE('H e l l o', ' ', '');
-- 'Hello'

-- Normalize data
SELECT REPLACE(LOWER(email), ' ', '') AS clean_email
FROM users;

TRANSLATE

Replace characters one-to-one (character mapping).

-- Replace a→@, e→3, o→0
SELECT TRANSLATE('hello world', 'aeo', '@30');
-- 'h3ll0 w0rld'

-- Remove all vowels
SELECT TRANSLATE('PostgreSQL', 'aeiouAEIOU', '');
-- 'PstgrSQL'

-- Clean phone numbers (remove dashes, parentheses, spaces)
SELECT TRANSLATE('(555) 123-4567', '()- ', '');
-- '5551234567'

◆ REPLACE vs TRANSLATE

REPLACE swaps substrings — it looks for multi-character matches. TRANSLATE swaps individual characters — each character in the second argument is mapped to the corresponding character in the third argument. Use REPLACE for word substitution, TRANSLATE for character-level transformations.

Pattern matching

LIKE / ILIKE

-- LIKE is case-sensitive
SELECT * FROM users WHERE email LIKE '%gmail%';

-- ILIKE is case-insensitive (PostgreSQL-specific)
SELECT * FROM users WHERE email ILIKE '%gmail%';

-- Names starting with 'j' (case-insensitive)
SELECT * FROM users WHERE first_name ILIKE 'j%';

Pattern	Matches
`'J%'`	Starts with J
`'%son'`	Ends with son
`'%an%'`	Contains an
`'_o%'`	Second character is o
`'___'`	Exactly 3 characters

SIMILAR TO (SQL-standard regex)

-- Email from gmail or yahoo
SELECT * FROM users
WHERE email SIMILAR TO '%(gmail|yahoo)%';

~ (POSIX regex)

-- Case-sensitive regex match
SELECT * FROM users WHERE email ~ '^[a-z]+\.[a-z]+@';

-- Case-insensitive regex match
SELECT * FROM users WHERE email ~* '^[a-z]+\.[a-z]+@';

-- NOT match
SELECT * FROM users WHERE email !~ 'hotmail';

Operator	Description
`~`	Case-sensitive regex match
`~*`	Case-insensitive regex match
`!~`	Case-sensitive NOT match
`!~*`	Case-insensitive NOT match

Regex functions

REGEXP_MATCH / REGEXP_MATCHES

-- Extract domain from email (returns array)
SELECT
  email,
  (REGEXP_MATCH(email, '@(.+)$'))[1] AS domain
FROM users;

         email           | domain
-------------------------+-----------
 JOHN.DOE@gmail.com      | gmail.com
 jane.smith@Yahoo.COM    | Yahoo.COM
 bob_wilson@outlook.com  | outlook.com

REGEXP_REPLACE

-- Remove all non-alphanumeric characters
SELECT REGEXP_REPLACE('Hello, World! 123', '[^a-zA-Z0-9]', '', 'g');
-- 'HelloWorld123'

-- Mask email addresses
SELECT REGEXP_REPLACE(email, '(.).*@', '\1***@') FROM users;
-- 'J***@gmail.com', 'j***@Yahoo.COM'

REGEXP_SPLIT_TO_TABLE / REGEXP_SPLIT_TO_ARRAY

-- Split bio into words (returns rows)
SELECT REGEXP_SPLIT_TO_TABLE('Full-stack developer with 5 years', '\s+');
-- Full-stack
-- developer
-- with
-- 5
-- years

-- Split into array
SELECT REGEXP_SPLIT_TO_ARRAY('Full-stack developer', '\s+');
-- {Full-stack,developer}

Repeat and reverse

SELECT REPEAT('ab', 3);   -- 'ababab'
SELECT REPEAT('-', 20);   -- '--------------------'

SELECT REVERSE('Hello');   -- 'olleH'

String aggregation

STRING_AGG

Concatenate values from multiple rows into one string.

-- List all employees per department
SELECT
  department,
  STRING_AGG(first_name, ', ' ORDER BY first_name) AS employees
FROM employees
GROUP BY department;

 department  | employees
-------------+-------------------------------
 Design      | Alice, Mike
 Engineering | Jane, John, Sara, Tom
 Marketing   | Bob, Eve

-- With DISTINCT to remove duplicates
SELECT STRING_AGG(DISTINCT category, ' | ' ORDER BY category) FROM products;
-- 'Electronics | Furniture'

Encoding and formatting

-- ASCII value of a character
SELECT ASCII('A');   -- 65

-- Character from ASCII code
SELECT CHR(65);      -- 'A'

-- Convert to hex
SELECT TO_HEX(255); -- 'ff'

-- MD5 hash
SELECT MD5('password');
-- '5f4dcc3b5aa765d61d8327deb882cf99'

-- Format numbers
SELECT TO_CHAR(1234567.89, '9,999,999.99');
-- ' 1,234,567.89'

-- Quote string for use in SQL
SELECT QUOTE_LITERAL('O''Brien');
-- '''O''Brien'''

Quick reference table

Function	Purpose	Example	Result
`\|\|`	Concatenate	`'Hi' \|\| ' ' \|\| 'there'`	`Hi there`
`CONCAT(a, b)`	Concat (NULL-safe)	`CONCAT('Hi', NULL, '!')`	`Hi!`
`CONCAT_WS(sep, ...)`	Concat with separator	`CONCAT_WS('-', 'a', 'b')`	`a-b`
`UPPER(s)`	Uppercase	`UPPER('hello')`	`HELLO`
`LOWER(s)`	Lowercase	`LOWER('HELLO')`	`hello`
`INITCAP(s)`	Title case	`INITCAP('hello world')`	`Hello World`
`LENGTH(s)`	String length	`LENGTH('hello')`	`5`
`TRIM(s)`	Remove whitespace	`TRIM(' hi ')`	`hi`
`LPAD(s, n, c)`	Left pad	`LPAD('5', 3, '0')`	`005`
`REPLACE(s, from, to)`	Replace substring	`REPLACE('abc', 'b', 'x')`	`axc`
`SPLIT_PART(s, d, n)`	Split and get part	`SPLIT_PART('a-b-c', '-', 2)`	`b`
`SUBSTRING(s FROM n FOR l)`	Extract part	`SUBSTRING('hello' FROM 2 FOR 3)`	`ell`
`LEFT(s, n)`	First n chars	`LEFT('hello', 3)`	`hel`
`RIGHT(s, n)`	Last n chars	`RIGHT('hello', 3)`	`llo`
`STRING_AGG(s, d)`	Aggregate concat	—	—

Common interview questions

How do you find duplicate email addresses?

SELECT LOWER(email), COUNT(*)
FROM users
GROUP BY LOWER(email)
HAVING COUNT(*) > 1;

Normalizing with LOWER() catches duplicates like John@Email.com and john@email.com.

How do you extract a domain from an email address?

Multiple approaches:

-- Using SPLIT_PART
SELECT SPLIT_PART(email, '@', 2) AS domain FROM users;

-- Using SUBSTRING + POSITION
SELECT SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain FROM users;

-- Using REGEXP_MATCH
SELECT (REGEXP_MATCH(email, '@(.+)$'))[1] AS domain FROM users;

SPLIT_PART is the simplest and most readable.

What is the difference between LIKE and ~ (regex)?

LIKE uses simple wildcards (% for any chars, _ for one char) and is easier to read. ~ uses full POSIX regular expressions (character classes, quantifiers, groups) and is more powerful. Use LIKE for simple patterns and ~ when you need complex matching logic. ILIKE and ~* are their case-insensitive variants.

How do you reverse a string and check for palindromes?

SELECT word
FROM words
WHERE LOWER(word) = LOWER(REVERSE(word));

This finds palindromes like ‘madam’, ‘racecar’, and ‘level’.

​Sample data

​String operators

​Concatenation (||)

​CONCAT and CONCAT_WS

​Case conversion

​Length and position

​Substring extraction

​Substring with regex

​Trimming and padding

​TRIM, LTRIM, RTRIM

​LPAD, RPAD

​Replace and translate

​REPLACE

​TRANSLATE

​Pattern matching

​LIKE / ILIKE

​SIMILAR TO (SQL-standard regex)

​~ (POSIX regex)

​Regex functions

​REGEXP_MATCH / REGEXP_MATCHES

​REGEXP_REPLACE

​REGEXP_SPLIT_TO_TABLE / REGEXP_SPLIT_TO_ARRAY

​Repeat and reverse

​String aggregation

​STRING_AGG

​Encoding and formatting

​Quick reference table

​Common interview questions