You're staring at a dataset with customer names like "JOHN SMITH" and "mary johnson," order dates stored as "2024-01-15 14:30:22," and product codes like "ABC-123-DEF." Your manager needs a report with properly formatted names, dates grouped by month, and clean product identifiers. Without string and date functions, you'd be stuck copying data to Excel or writing complex application code. But SQL gives you the power to transform this messy data right in your queries.
String functions and date functions are the workhorses of data cleaning and transformation in SQL. They let you slice, dice, format, and manipulate text and temporal data with surgical precision. Whether you're standardizing customer names, extracting year-over-year trends, or parsing complex identifiers, these functions turn raw data into analysis-ready information.
What you'll learn:
This lesson assumes you understand basic SQL SELECT statements and WHERE clauses. You should be comfortable writing simple queries that retrieve data from tables. We'll build on that foundation to show you how functions transform data within your queries.
String functions operate on text data (also called character or varchar data) to clean, format, and extract information. Think of them as specialized tools in a text processing workshop—each designed for specific transformations.
Let's start with a realistic scenario. Imagine you're working with a customer database where data entry has been inconsistent:
-- Sample customer data with common real-world problems
SELECT
customer_id,
first_name,
last_name,
email,
phone
FROM customers
WHERE customer_id IN (1, 2, 3, 4, 5);
This might return:
customer_id | first_name | last_name | email | phone
1 | SARAH | JOHNSON | sarah.johnson@email.com | 555-123-4567
2 | john | smith | JOHN.SMITH@EMAIL.COM | 5551234568
3 | Maria | Rodriguez | maria.r@company.com | 555 123 4569
4 | David | Chen | david.chen@test.com | NULL
5 | JENNIFER | lee | jennifer@domain.co.uk | 555-123-4570
Notice the problems: inconsistent capitalization, extra spaces, varying phone number formats, and mixed-case emails. String functions solve these issues systematically.
The most fundamental string functions control text capitalization. These functions take a string input and return the same text with modified case.
-- Standardize names to proper case
SELECT
customer_id,
UPPER(first_name) as first_name_upper,
LOWER(first_name) as first_name_lower,
first_name as original_first_name
FROM customers
WHERE customer_id IN (1, 2, 3);
Results:
customer_id | first_name_upper | first_name_lower | original_first_name
1 | SARAH | sarah | SARAH
2 | JOHN | john | john
3 | MARIA | maria | Maria
For proper business formatting, you often want title case (first letter capitalized, rest lowercase). Some database systems provide INITCAP or PROPER functions:
-- PostgreSQL and Oracle have INITCAP
SELECT
customer_id,
INITCAP(LOWER(first_name)) as proper_first_name,
INITCAP(LOWER(last_name)) as proper_last_name
FROM customers;
Database Differences: MySQL doesn't have INITCAP built-in, but you can achieve similar results by combining UPPER, LOWER, and SUBSTRING functions. SQL Server uses a different approach with string manipulation functions.
Whitespace—spaces, tabs, and line breaks—often sneaks into data during import or manual entry. The TRIM functions remove unwanted whitespace:
-- Remove extra spaces from names
SELECT
customer_id,
CONCAT('[', first_name, ']') as before_trim,
CONCAT('[', TRIM(first_name), ']') as after_trim,
LENGTH(first_name) as length_before,
LENGTH(TRIM(first_name)) as length_after
FROM customers
WHERE customer_id = 3;
Results:
customer_id | before_trim | after_trim | length_before | length_after
3 | [ Maria] | [Maria] | 8 | 5
TRIM has variations for specific needs:
LTRIM() removes spaces from the left (beginning)RTRIM() removes spaces from the right (end) TRIM() removes spaces from both sides-- Clean up all customer names
UPDATE customers
SET first_name = TRIM(first_name),
last_name = TRIM(last_name);
SUBSTRING extracts a portion of text from a larger string. This is crucial when you need to parse codes, extract initials, or grab specific parts of formatted data.
The basic syntax is: SUBSTRING(string, start_position, length)
-- Extract area codes from phone numbers
SELECT
customer_id,
phone,
SUBSTRING(phone, 1, 3) as area_code,
SUBSTRING(phone, 5, 3) as exchange,
SUBSTRING(phone, 9, 4) as number
FROM customers
WHERE phone LIKE '___-___-____';
For customer 1 with phone "555-123-4567":
customer_id | phone | area_code | exchange | number
1 | 555-123-4567 | 555 | 123 | 4567
LEFT and RIGHT functions provide shortcuts for extracting from the beginning or end:
-- Get customer initials
SELECT
customer_id,
LEFT(TRIM(first_name), 1) + LEFT(TRIM(last_name), 1) as initials,
RIGHT(email, 3) as domain_extension
FROM customers;
CONCAT joins multiple strings together. This is essential for creating full names, formatted addresses, or custom identifiers.
-- Create properly formatted full names
SELECT
customer_id,
CONCAT(
INITCAP(TRIM(first_name)),
' ',
INITCAP(TRIM(last_name))
) as full_name,
CONCAT(
UPPER(LEFT(TRIM(first_name), 1)),
UPPER(LEFT(TRIM(last_name), 1))
) as initials
FROM customers;
Results:
customer_id | full_name | initials
1 | Sarah Johnson | SJ
2 | John Smith | JS
3 | Maria Rodriguez| MR
Different databases handle concatenation differently:
CONCAT(string1, string2, ...)CONCAT() or the + operatorCONCAT() or the || operator-- Alternative concatenation methods
SELECT
-- Standard CONCAT (most compatible)
CONCAT(first_name, ' ', last_name) as method1,
-- SQL Server
first_name + ' ' + last_name as method2,
-- PostgreSQL/Oracle
first_name || ' ' || last_name as method3
FROM customers;
These functions help you locate text within strings and perform substitutions—critical for data cleaning and standardization.
-- Find the @ symbol position in emails to validate format
SELECT
customer_id,
email,
CHARINDEX('@', email) as at_position,
CASE
WHEN CHARINDEX('@', email) > 0 THEN 'Valid format'
ELSE 'Missing @'
END as email_status
FROM customers;
REPLACE substitutes all occurrences of one substring with another:
-- Standardize phone number formats
SELECT
customer_id,
phone as original_phone,
REPLACE(REPLACE(REPLACE(phone, '-', ''), ' ', ''), '(', '') as digits_only,
CONCAT(
SUBSTRING(REPLACE(REPLACE(phone, '-', ''), ' ', ''), 1, 3),
'-',
SUBSTRING(REPLACE(REPLACE(phone, '-', ''), ' ', ''), 4, 3),
'-',
SUBSTRING(REPLACE(REPLACE(phone, '-', ''), ' ', ''), 7, 4)
) as standardized_phone
FROM customers
WHERE phone IS NOT NULL;
This transforms various phone formats into a standard "###-###-####" format.
Date and time data requires special handling because it represents moments in time rather than simple text or numbers. Date functions let you extract components (year, month, day), perform calculations (adding days, finding differences), and format dates for display.
Let's work with a realistic orders table:
-- Sample order data
SELECT
order_id,
customer_id,
order_date,
ship_date,
total_amount
FROM orders
WHERE order_id IN (1001, 1002, 1003, 1004, 1005);
Sample results:
order_id | customer_id | order_date | ship_date | total_amount
1001 | 1 | 2024-01-15 14:30:22 | 2024-01-17 09:15:00 | 150.75
1002 | 2 | 2024-02-03 10:45:15 | 2024-02-05 16:20:30 | 89.50
1003 | 3 | 2024-02-15 16:22:08 | NULL | 245.00
1004 | 1 | 2024-03-01 11:30:45 | 2024-03-03 14:25:12 | 67.25
1005 | 4 | 2024-03-10 09:15:30 | 2024-03-12 10:30:45 | 178.90
These functions extract specific parts of a date, which is essential for grouping, filtering, and analysis:
-- Extract date components for analysis
SELECT
order_id,
order_date,
YEAR(order_date) as order_year,
MONTH(order_date) as order_month,
DAY(order_date) as order_day,
DATENAME(MONTH, order_date) as month_name,
DATENAME(WEEKDAY, order_date) as day_of_week
FROM orders
WHERE order_id IN (1001, 1002, 1003);
Results:
order_id | order_date | order_year | order_month | order_day | month_name | day_of_week
1001 | 2024-01-15 14:30:22 | 2024 | 1 | 15 | January | Monday
1002 | 2024-02-03 10:45:15 | 2024 | 2 | 3 | February | Saturday
1003 | 2024-02-15 16:22:08 | 2024 | 2 | 15 | February | Thursday
This data structure enables powerful grouping for business analysis:
-- Monthly sales summary
SELECT
YEAR(order_date) as year,
MONTH(order_date) as month,
DATENAME(MONTH, order_date) as month_name,
COUNT(*) as total_orders,
SUM(total_amount) as monthly_revenue,
AVG(total_amount) as average_order_value
FROM orders
WHERE YEAR(order_date) = 2024
GROUP BY YEAR(order_date), MONTH(order_date), DATENAME(MONTH, order_date)
ORDER BY year, month;
Every database system provides functions to get the current date and time. These are crucial for calculating ages, time differences, and creating time-sensitive reports:
-- Different ways to get current date/time (syntax varies by database)
SELECT
GETDATE() as current_datetime, -- SQL Server
CURRENT_DATE as current_date_only, -- Standard SQL
CURRENT_TIME as current_time_only, -- Standard SQL
NOW() as current_timestamp; -- MySQL, PostgreSQL
Use current date functions to calculate business metrics:
-- Find orders placed in the last 30 days
SELECT
order_id,
customer_id,
order_date,
DATEDIFF(DAY, order_date, GETDATE()) as days_since_order
FROM orders
WHERE order_date >= DATEADD(DAY, -30, GETDATE())
ORDER BY order_date DESC;
Date arithmetic functions perform calculations with dates—finding differences between dates or adding/subtracting time periods.
DATEDIFF calculates the difference between two dates:
-- Calculate shipping time for completed orders
SELECT
order_id,
order_date,
ship_date,
DATEDIFF(DAY, order_date, ship_date) as shipping_days,
CASE
WHEN DATEDIFF(DAY, order_date, ship_date) <= 1 THEN 'Next Day'
WHEN DATEDIFF(DAY, order_date, ship_date) <= 3 THEN 'Standard'
ELSE 'Delayed'
END as shipping_category
FROM orders
WHERE ship_date IS NOT NULL;
Results:
order_id | order_date | ship_date | shipping_days | shipping_category
1001 | 2024-01-15 14:30:22 | 2024-01-17 09:15:00 | 2 | Standard
1002 | 2024-02-03 10:45:15 | 2024-02-05 16:20:30 | 2 | Standard
1004 | 2024-03-01 11:30:45 | 2024-03-03 14:25:12 | 2 | Standard
1005 | 2024-03-10 09:15:30 | 2024-03-12 10:30:45 | 2 | Standard
DATEADD adds or subtracts time from a date:
-- Find orders with expected ship dates (business rule: ship within 3 days)
SELECT
order_id,
order_date,
ship_date,
DATEADD(DAY, 3, order_date) as expected_ship_date,
CASE
WHEN ship_date IS NULL THEN 'Not Shipped'
WHEN ship_date <= DATEADD(DAY, 3, order_date) THEN 'On Time'
ELSE 'Late'
END as shipping_status
FROM orders;
Often you need to display dates in specific formats for reports or user interfaces. Different databases provide various formatting functions:
-- Format dates for different purposes (SQL Server example)
SELECT
order_id,
order_date,
FORMAT(order_date, 'yyyy-MM-dd') as iso_format,
FORMAT(order_date, 'MMMM dd, yyyy') as long_format,
FORMAT(order_date, 'MMM yyyy') as month_year,
FORMAT(order_date, 'yyyy-Q') as quarter_format
FROM orders
WHERE order_id IN (1001, 1002, 1003);
Results:
order_id | order_date | iso_format | long_format | month_year | quarter_format
1001 | 2024-01-15 14:30:22 | 2024-01-15 | January 15, 2024 | Jan 2024 | 2024-1
1002 | 2024-02-03 10:45:15 | 2024-02-03 | February 03, 2024| Feb 2024 | 2024-1
1003 | 2024-02-15 16:22:08 | 2024-02-15 | February 15, 2024| Feb 2024 | 2024-1
Different databases use different formatting functions:
FORMAT() functionTO_CHAR() functionDATE_FORMAT() functionTO_CHAR() functionThe real power emerges when you combine string and date functions to solve complex business problems. Let's build a customer summary report that demonstrates this integration:
-- Comprehensive customer summary with cleaned data
SELECT
c.customer_id,
-- Clean and format customer name
CONCAT(
INITCAP(TRIM(c.first_name)),
' ',
INITCAP(TRIM(c.last_name))
) as full_name,
-- Create customer code from initials and ID
CONCAT(
UPPER(LEFT(TRIM(c.first_name), 1)),
UPPER(LEFT(TRIM(c.last_name), 1)),
'-',
RIGHT('0000' + CAST(c.customer_id as VARCHAR), 4)
) as customer_code,
-- Clean email to lowercase
LOWER(TRIM(c.email)) as clean_email,
-- Extract domain from email
RIGHT(c.email, LEN(c.email) - CHARINDEX('@', c.email)) as email_domain,
-- Order statistics with date analysis
COUNT(o.order_id) as total_orders,
MIN(o.order_date) as first_order_date,
MAX(o.order_date) as last_order_date,
-- Calculate customer tenure
DATEDIFF(DAY, MIN(o.order_date), GETDATE()) as days_as_customer,
-- Format last order date for display
FORMAT(MAX(o.order_date), 'MMMM dd, yyyy') as last_order_formatted,
-- Calculate months since last order
DATEDIFF(MONTH, MAX(o.order_date), GETDATE()) as months_since_last_order,
-- Customer status based on recent activity
CASE
WHEN MAX(o.order_date) >= DATEADD(MONTH, -3, GETDATE()) THEN 'Active'
WHEN MAX(o.order_date) >= DATEADD(MONTH, -12, GETDATE()) THEN 'At Risk'
ELSE 'Inactive'
END as customer_status
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.email
ORDER BY total_orders DESC;
This query demonstrates professional-level data transformation:
Now let's apply these concepts to a realistic scenario. You're analyzing an e-commerce database with messy product data and need to create a clean inventory report.
Exercise Data Setup:
-- Assume you have this product table
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100),
category VARCHAR(50),
sku VARCHAR(20),
created_date DATETIME,
last_updated DATETIME,
price DECIMAL(10,2)
);
INSERT INTO products VALUES
(1, ' WIRELESS bluetooth HEADPHONES ', 'electronics', 'WBH-001-BLK', '2023-06-15 10:30:00', '2024-02-20 14:45:00', 79.99),
(2, 'organic cotton T-SHIRT', 'CLOTHING', 'OCT-002-WHT', '2023-08-22 09:15:00', '2024-01-10 11:20:00', 24.95),
(3, 'STAINLESS steel water bottle', 'Kitchen & dining', 'SSW-003-BLU', '2023-05-10 16:45:00', '2024-03-05 09:30:00', 18.50);
Your Task: Write a query that creates a clean product catalog with the following requirements:
Try writing the query yourself before looking at the solution.
Solution:
SELECT
product_id,
-- Clean product name to title case
REPLACE(
REPLACE(
REPLACE(INITCAP(TRIM(product_name)), ' And ', ' and '),
' Of ', ' of '
),
' The ', ' the '
) as clean_product_name,
-- Clean category
INITCAP(TRIM(category)) as clean_category,
-- Extract color from SKU (after last dash)
RIGHT(sku, 3) as color_code,
-- Calculate catalog age
DATEDIFF(DAY, created_date, GETDATE()) as days_in_catalog,
-- Determine freshness
CASE
WHEN DATEDIFF(DAY, last_updated, GETDATE()) <= 60 THEN 'Fresh'
ELSE 'Stale'
END as data_freshness,
-- Create product code from first letters
CONCAT(
UPPER(LEFT(TRIM(SUBSTRING(product_name, 1, CHARINDEX(' ', TRIM(product_name)) - 1)), 1)),
UPPER(LEFT(TRIM(SUBSTRING(product_name, CHARINDEX(' ', TRIM(product_name)) + 1, LEN(product_name))), 1))
) as product_code,
-- Format dates for display
FORMAT(created_date, 'MMM dd, yyyy') as created_display,
FORMAT(last_updated, 'MMM dd, yyyy') as updated_display,
-- Price with currency
CONCAT('$', FORMAT(price, 'N2')) as formatted_price
FROM products
ORDER BY days_in_catalog DESC;
Mistake 1: Ignoring NULL values in string functions String functions often return unexpected results with NULL values:
-- Problem: CONCAT with NULL returns NULL
SELECT CONCAT(first_name, ' ', NULL, ' ', last_name) FROM customers;
-- Result: NULL (entire result is NULL)
-- Solution: Use COALESCE or ISNULL
SELECT CONCAT(first_name, ' ', COALESCE(middle_name, ''), ' ', last_name) FROM customers;
Mistake 2: Assuming consistent date formats Always validate your date operations:
-- Problem: Assuming all dates are valid
SELECT DATEDIFF(DAY, order_date, ship_date) FROM orders;
-- Fails when ship_date is NULL
-- Solution: Handle NULLs explicitly
SELECT
CASE
WHEN ship_date IS NOT NULL
THEN DATEDIFF(DAY, order_date, ship_date)
ELSE NULL
END as shipping_days
FROM orders;
Mistake 3: Database-specific function syntax Different databases use different function names:
-- SQL Server
SELECT LEN(product_name) FROM products;
-- PostgreSQL/MySQL
SELECT LENGTH(product_name) FROM products;
-- Always check your database documentation
Debugging Tip: Use the CONCAT function with brackets to visualize hidden characters:
CONCAT('[', field_name, ']')reveals leading/trailing spaces and unusual characters.
Mistake 4: Timezone confusion with date functions Be aware that date functions may use server timezone:
-- Current date might not match your local timezone
SELECT GETDATE() as server_time, GETUTCDATE() as utc_time;
-- Always document which timezone your dates represent
You've now mastered the essential string and date functions that form the backbone of SQL data transformation. These functions enable you to:
The real power lies in combining these functions strategically. You can transform messy, inconsistent data into clean, analysis-ready information directly within your SQL queries—eliminating the need for external data processing tools.
Immediate next steps:
What's coming next: In your continued SQL journey, you'll learn about window functions for advanced analytics, subqueries for complex data relationships, and performance optimization techniques. These string and date manipulation skills will be essential building blocks for those advanced topics.
The functions you've learned today will appear in virtually every real-world SQL query you write. Master them, and you'll transform from someone who struggles with messy data into a professional who can clean, analyze, and present data with confidence.
Learning Path: SQL Fundamentals