Regular Expressions: A Practical Guide with Examples
Regular expressions (regex) are one of the most powerful tools in a developer's toolkit, yet they're often misunderstood or avoided entirely. Whether you're validating user input, parsing log files, or transforming text data, regex provides a concise and efficient way to match patterns in strings.
This comprehensive guide will take you from regex basics to advanced techniques, with practical examples you can use immediately in your projects. By the end, you'll understand not just how regex works, but when and why to use it.
Table of Contents
- What Is Regex and Why Should You Care?
- Regex Basics: Building Blocks
- Quantifiers: Controlling Repetition
- Character Classes and Shortcuts
- Groups and Captures
- Anchors and Boundaries
- Common Patterns for Real-World Use
- Flags: Modifying Regex Behavior
- Advanced Techniques
- Performance Tips and Best Practices
- Testing and Debugging Regex
- Frequently Asked Questions
What Is Regex and Why Should You Care?
A regular expression is a sequence of characters that defines a search pattern. Think of it as a mini-language specifically designed for pattern matching in text. Instead of searching for exact strings, regex lets you describe patterns like "any email address" or "all phone numbers in this format."
Regex is supported across virtually every programming language and many command-line tools. Once you learn the syntax, you can apply it everywhere—from JavaScript and Python to grep and sed.
Common use cases include:
- Validating user input (emails, phone numbers, passwords)
- Extracting data from text (parsing logs, scraping content)
- Search and replace operations (refactoring code, cleaning data)
- URL routing and pattern matching
- Syntax highlighting and lexical analysis
Pro tip: While regex is powerful, it's not always the right tool. For complex parsing tasks like HTML or JSON, use dedicated parsers instead. Regex works best for well-defined, relatively simple patterns.
Regex Basics: Building Blocks
Every regex pattern is built from fundamental building blocks. Understanding these core elements is essential before moving to more complex patterns.
Literal Characters
The simplest regex is just literal text. The pattern cat matches the exact string "cat" in your text. Most alphanumeric characters match themselves directly.
Metacharacters
Certain characters have special meanings in regex. These are called metacharacters and include: . ^ $ * + ? { } [ ] \ | ( )
To match these characters literally, you need to escape them with a backslash. For example, \. matches a literal period.
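When the text to match comes from a variable or from user input, escaping by hand is easy to get wrong. The sketch below shows one common approach; `escapeRegExp` is a hypothetical helper name chosen for this example, not a built-in function.

```javascript
// Hypothetical helper: put a backslash in front of every regex metacharacter.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = "$9.99";

// Unescaped: "$" is treated as an end anchor and "." as a wildcard.
const unsafe = new RegExp(price);
// Escaped: every character is matched literally.
const safe = new RegExp(escapeRegExp(price));

console.log(unsafe.test("Total: $9.99")); // false
console.log(safe.test("Total: $9.99"));   // true
```

The unescaped version fails because `$` can only match at the end of the input, so nothing can follow it.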
The Dot Wildcard
The dot . is the most basic wildcard—it matches any single character except newline. The pattern a.c matches "abc", "a1c", "a-c", but not "ac" (no character between) or "a\nc" (newline).
| Pattern | Matches | Example |
|---|---|---|
| . | Any character (except newline) | a.c matches abc, a1c, a-c |
| \d | Any digit [0-9] | \d{3} matches 123, 456 |
| \w | Word character [a-zA-Z0-9_] | \w+ matches hello, user_1 |
| \s | Whitespace (space, tab, newline) | \s+ matches spaces, tabs |
| \D | Non-digit | \D+ matches abc, xyz |
| \W | Non-word character | \W+ matches !@#, spaces |
| \S | Non-whitespace | \S+ matches any visible text |
Notice that uppercase versions (\D, \W, \S) are the inverse of their lowercase counterparts. This is a common pattern in regex syntax.
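To see the shorthand classes side by side, here is a small sketch that runs three of them against one sample string. The sample text and variable names are made up for illustration.

```javascript
const sample = "Order 66 shipped to user_1 on 2026-03-29";

// \d+ collects runs of digits, including the "1" inside "user_1".
const digitRuns = sample.match(/\d+/g);
// \w+ collects runs of letters, digits, and underscores.
const wordRuns = sample.match(/\w+/g);
// \s+ works as a separator, so hyphenated text stays in one piece.
const tokens = sample.split(/\s+/);

console.log(digitRuns); // ["66", "1", "2026", "03", "29"]
console.log(wordRuns);  // ["Order", "66", "shipped", "to", "user_1", "on", "2026", "03", "29"]
console.log(tokens);    // ["Order", "66", "shipped", "to", "user_1", "on", "2026-03-29"]
```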
Quantifiers: Controlling Repetition
Quantifiers specify how many times a pattern should repeat. They're placed after the element you want to repeat and are fundamental to creating flexible patterns.
Basic Quantifiers
| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more times | ab*c matches ac, abc, abbc, abbbc |
| + | 1 or more times | ab+c matches abc, abbc (not ac) |
| ? | 0 or 1 time (optional) | colou?r matches color, colour |
| {n} | Exactly n times | \d{4} matches 2026, 1999 |
| {n,} | n or more times | \d{3,} matches 123, 1234, 12345 |
| {n,m} | Between n and m times | \d{2,4} matches 12, 123, 1234 |
Greedy vs. Lazy Matching
By default, quantifiers are greedy—they match as much text as possible. This can lead to unexpected results when matching patterns like HTML tags.
// Greedy matching
const text = "<div>Hello</div><div>World</div>";
const greedy = /<.*>/;
// Matches: "<div>Hello</div><div>World</div>" (entire string!)
// Lazy matching
const lazy = /<.*?>/;
// Matches: "<div>" (stops at first closing bracket)
Adding ? after a quantifier makes it lazy (non-greedy). Lazy quantifiers match as little text as possible while still satisfying the pattern.
Quick tip: When matching content between delimiters (quotes, brackets, tags), lazy quantifiers are usually what you want. Use .*? instead of .* to avoid matching too much.
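The greedy and lazy behavior is easy to confirm by running the patterns. This sketch also includes a negated character class as a third option, which is an addition not covered in the text above; it often gives the same result as the lazy form for single-character delimiters.

```javascript
const html = "<div>Hello</div><div>World</div>";

// Greedy: .* runs to the end, then backs up to the last ">".
const greedyMatch = html.match(/<.*>/)[0];
// Lazy: .*? stops at the first ">" it can reach.
const lazyMatch = html.match(/<.*?>/)[0];
// With the g flag, the lazy pattern finds each tag in turn.
const lazyTags = html.match(/<.*?>/g);
// Alternative: "anything except >" needs no lazy quantifier.
const negatedTags = html.match(/<[^>]*>/g);

console.log(greedyMatch); // "<div>Hello</div><div>World</div>"
console.log(lazyMatch);   // "<div>"
console.log(lazyTags);    // ["<div>", "</div>", "<div>", "</div>"]
console.log(negatedTags); // ["<div>", "</div>", "<div>", "</div>"]
```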
Character Classes and Shortcuts
Character classes let you match any character from a specific set. They're defined using square brackets and are incredibly useful for creating flexible patterns.
Basic Character Classes
// Match any vowel
/[aeiou]/
// Match any digit
/[0-9]/
// Match any lowercase letter
/[a-z]/
// Match any letter, upper or lower case
/[a-zA-Z]/
// Match alphanumeric characters
/[a-zA-Z0-9]/
Negated Character Classes
Use ^ at the start of a character class to negate it—matching any character except those listed.
// Match any non-digit
/[^0-9]/
// Match any non-vowel
/[^aeiou]/
// Match any character except space or tab
/[^ \t]/
Special Characters in Classes
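Most metacharacters lose their special meaning inside character classes. You can include ., *, +, and ? without escaping them. A few characters still need care: \ must always be escaped, ] must be escaped in JavaScript, ^ is special only as the first character, and - is special only between two characters.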
// Match a period or comma
/[.,]/
// Match a hyphen (escape it, or place it at the start or end)
/[-a-z]/
/[a-z-]/
// Match a closing bracket (must escape)
/[\]]/
Groups and Captures
Groups serve two main purposes: they let you apply quantifiers to multiple characters, and they capture matched text for later use.
Capturing Groups
Parentheses create capturing groups that remember the matched text. This is essential for extracting data from strings.
// Extract date components
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = "2026-03-29".match(datePattern);
// match[0]: "2026-03-29" (full match)
// match[1]: "2026" (first group)
// match[2]: "03" (second group)
// match[3]: "29" (third group)
Named Capturing Groups
Named groups make your regex more readable and your code more maintainable. Instead of referring to groups by number, you give them descriptive names.
// Named groups syntax: (?<name>pattern)
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2026-03-29".match(datePattern);
// Access by name
console.log(match.groups.year); // "2026"
console.log(match.groups.month); // "03"
console.log(match.groups.day); // "29"
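Named groups can also be used in replacement strings through the `$<name>` syntax, and they pair well with destructuring. A brief sketch (the variable names are chosen for this example):

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Pull the named groups straight into variables.
const { year, month, day } = "2026-03-29".match(isoDate).groups;

// Reorder the parts by name instead of by position.
const european = "2026-03-29".replace(isoDate, "$<day>/$<month>/$<year>");

console.log(year, month, day); // 2026 03 29
console.log(european);         // "29/03/2026"
```

Because the replacement refers to names, adding another group to the pattern later does not shift the references.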
Non-Capturing Groups
Sometimes you need grouping for quantifiers or alternation but don't want to capture the text. Non-capturing groups use (?:...) syntax.
// Group the protocol alternatives without capturing them
/(?:https?|ftp):\/\/([a-z.]+)/
// This creates only one capture group (the domain)
// The protocol group (?:https?|ftp) doesn't capture
Non-capturing groups avoid the small cost of storing captured text and keep your group numbering clean. Use them when you don't need the captured text.
Backreferences
Backreferences let you match the same text that was captured by a previous group. This is useful for finding repeated words or matching paired delimiters.
// Find repeated words
/\b(\w+)\s+\1\b/
// Matches "the the" or "hello hello"
// Match paired quotes
/(['"])(.*?)\1/
// Matches "hello" or 'world' but not "mixed'
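Here is the repeated-word pattern applied to a sentence, first to find the duplicates and then to remove them. The sample sentence is invented for illustration.

```javascript
const draft = "This is is a test test of the the parser.";
const repeatedWord = /\b(\w+)\s+\1\b/g;

// Find each doubled word.
const repeated = draft.match(repeatedWord);
// Replace each pair with a single copy ($1 is the captured word).
const cleaned = draft.replace(repeatedWord, "$1");

console.log(repeated); // ["is is", "test test", "the the"]
console.log(cleaned);  // "This is a test of the parser."
```

The leading \b matters here: without it, the "is" at the end of "This" could pair with the following "is".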
Alternation
The pipe | operator creates alternation—matching one pattern or another. It's like a logical OR.
// Match cat, dog, or bird
/cat|dog|bird/
// Match common file extensions
/\.(jpg|jpeg|png|gif|webp)$/i
// Match Mr., Mrs., Ms., or Dr.
/(?:Mr|Mrs|Ms|Dr)\./
Anchors and Boundaries
Anchors don't match characters—they match positions in the string. They're essential for ensuring patterns appear in specific locations.
String Anchors
- ^ matches the start of the string (or line in multiline mode)
- $ matches the end of the string (or line in multiline mode)
// Must start with "Hello"
/^Hello/
// Must end with "world"
/world$/
// Entire string must be exactly 5 digits
/^\d{5}$/
Word Boundaries
The \b anchor matches word boundaries—positions between word and non-word characters. This is crucial for matching whole words.
// Match "cat" as a whole word
/\bcat\b/
// Matches: "cat", "the cat sat"
// Doesn't match: "category", "scat"
// Match words starting with "pre"
/\bpre\w+/
// Matches: "preview", "prepare", "prefix"
The inverse \B matches non-word boundaries—positions where both sides are word characters or both are non-word characters.
Pro tip: Always use word boundaries when searching for whole words. Without them, searching for "cat" will also match "category" and "concatenate". The pattern \bcat\b ensures you only match the complete word.
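A quick way to see the difference is to count matches with and without the boundaries. The sentence below is a made-up sample.

```javascript
const sentence = "The cat sat near the category list and concatenated logs.";

// Plain "cat" also hits "category" and "concatenated".
const loose = sentence.match(/cat/g);
// With boundaries, only the standalone word matches.
const whole = sentence.match(/\bcat\b/g);

console.log(loose.length); // 3
console.log(whole.length); // 1
```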
Common Patterns for Real-World Use
Here are battle-tested regex patterns for common validation and extraction tasks. These patterns balance simplicity with practical accuracy.
| Pattern Type | Regex | Notes |
|---|---|---|
| Email (simple) | ^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$ | Good for basic validation |
| Email (stricter) | ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ | Restricts the allowed characters; widely used, but not fully RFC 5322 compliant |
| URL | ^https?:\/\/[\w.-]+(?:\.[a-zA-Z]{2,})(?:\/\S*)?$ | Basic HTTP/HTTPS URLs |
| IPv4 Address | ^(?:\d{1,3}\.){3}\d{1,3}$ | Format only, doesn't validate ranges |
| IPv4 (strict) | ^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$ | Validates 0-255 range |
| Date (YYYY-MM-DD) | ^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$ | ISO 8601 format; does not check days per month (accepts 2026-02-31) |
| Time (24-hour) | ^(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?$ | HH:MM or HH:MM:SS |
| Phone (US) | ^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$ | Flexible formatting |
| Hex Color | ^#(?:[0-9a-fA-F]{3}){1,2}$ | 3 or 6 digit hex codes |
| Credit Card | ^\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$ | Format only, use Luhn for validation |
| Username | ^[a-zA-Z0-9_-]{3,16}$ | 3-16 chars, alphanumeric plus _ and - |
| Slug | ^[a-z0-9]+(?:-[a-z0-9]+)*$ | URL-friendly lowercase with hyphens |
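The credit card row checks format only and points to the Luhn checksum for real validation. As a rough sketch of how the two could be combined, the code below runs the format regex first and then a standard Luhn check. The function names `passesLuhn` and `isPlausibleCardNumber` are hypothetical, and passing both checks says nothing about whether a card actually exists.

```javascript
const cardFormat = /^\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$/;

// Luhn checksum: walking from the right, double every second digit,
// subtract 9 from any result above 9, and require the total to end in 0.
function passesLuhn(digits) {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = Number(digits[i]);
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Hypothetical wrapper: format check first, checksum second.
function isPlausibleCardNumber(input) {
  if (!cardFormat.test(input)) return false;
  return passesLuhn(input.replace(/[-\s]/g, ""));
}

console.log(isPlausibleCardNumber("4111 1111 1111 1111")); // true (a well-known test number)
console.log(isPlausibleCardNumber("1234-5678-9012-3456")); // false (right format, bad checksum)
console.log(isPlausibleCardNumber("4111-1111"));           // false (wrong format)
```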
Password Validation
Password validation requires checking multiple conditions simultaneously. Lookahead assertions make this possible without complex logic.
// Strong password: 8+ chars, uppercase, lowercase, digit, special char
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/
// Breaking it down:
// (?=.*[a-z]) - at least one lowercase letter
// (?=.*[A-Z]) - at least one uppercase letter
// (?=.*\d) - at least one digit
// (?=.*[@$!%*?&]) - at least one special character
// [A-Za-z\d@$!%*?&]{8,} - 8 or more allowed characters
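Running the pattern against a few candidates shows one consequence of the final character class: any character outside the allowed set, such as # or a space, causes a rejection even when all four lookaheads pass. The sample passwords are illustrative only.

```javascript
const strongPassword =
  /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

console.log(strongPassword.test("Passw0rd!"));  // true
console.log(strongPassword.test("password"));   // false: no uppercase, digit, or special character
console.log(strongPassword.test("Pw0!"));       // false: shorter than 8 characters
console.log(strongPassword.test("Passw0rd#"));  // false: "#" is not in the allowed set
console.log(strongPassword.test("Pass w0rd!")); // false: spaces are not in the allowed set
```

If you want to accept any special character, one option is to widen or drop the final character class and rely on the lookaheads plus a length check.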
Extracting Data from Text
Regex excels at extracting structured data from unstructured text. Here are some practical examples:
// Extract all email addresses from text
const emails = text.match(/[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}/g);
// Extract hashtags from social media text
const hashtags = text.match(/#[\w]+/g);
// Extract URLs from text
const urls = text.match(/https?:\/\/[^\s]+/g);
// Extract prices from text
const prices = text.match(/\$\d+(?:\.\d{2})?/g);
// Extract dates in various formats
const dates = text.match(/\d{1,2}[-\/]\d{1,2}[-\/]\d{2,4}/g);
Test and refine these patterns using our Regex Tester tool, which provides real-time feedback and match highlighting.
Flags: Modifying Regex Behavior
Flags (also called modifiers) change how the regex engine interprets your pattern. In JavaScript and other languages with regex literals, they follow the closing delimiter; elsewhere they are usually passed as a separate argument or option.
| Flag | Name | Effect |
|---|---|---|
| g | Global | Find all matches, not just the first |
| i | Case-insensitive | /hello/i matches Hello, HELLO, HeLLo |
| m | Multiline | ^ and $ match line boundaries, not just string boundaries |
| s | Dotall | . matches newline characters |
| u | Unicode | Enable full Unicode support |
| y | Sticky | Match must start at the current position |
Global Flag
Without the g flag, methods like match() return only the first match. With g, you get all matches.
const text = "cat dog cat bird cat";
// Without global flag
text.match(/cat/); // ["cat"]
// With global flag
text.match(/cat/g); // ["cat", "cat", "cat"]
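Two details of the g flag are worth seeing in code. First, `matchAll()` returns full match objects (with positions) rather than plain strings. Second, a global regex keeps state in `lastIndex`, so reusing one instance with `test()` can give surprising results. The variable names here are chosen for the example.

```javascript
const pets = "cat dog cat bird cat";

// matchAll requires the g flag and yields one match object per hit.
const positions = [...pets.matchAll(/cat/g)].map((m) => m.index);
console.log(positions); // [0, 8, 17]

// A global regex remembers where the last match ended.
const shared = /cat/g;
const first = shared.test("cat");  // true; lastIndex is now 3
const second = shared.test("cat"); // false; the search started at index 3
console.log(first, second, shared.lastIndex); // true false 0
```

For simple yes/no checks, leaving off the g flag avoids the `lastIndex` behavior entirely.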
Multiline Flag
The m flag changes how ^ and $ work. Instead of matching only the start and end of the entire string, they match the start and end of each line.
const text = "line 1\nline 2\nline 3";
// Without multiline flag
text.match(/^\w+/); // ["line"] (only first line)
// With multiline flag
text.match(/^\w+/gm); // ["line", "line", "line"] (all lines)
Unicode Flag
The u flag enables proper Unicode handling, including emoji and characters outside the Basic Multilingual Plane.
// Without unicode flag - the emoji counts as two UTF-16 code units
/^.$/.test("😀"); // false
// With unicode flag - the emoji counts as one character
/^.$/u.test("😀"); // true
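The u flag also enables Unicode property escapes such as \p{L} (any letter), which help when \w is too narrow. The following sketch compares the two on accented text; the strings use escape sequences so the example does not depend on file encoding.

```javascript
// "naïve café", written with escapes for the accented letters.
const phrase = "na\u00EFve caf\u00E9";

// \w covers only ASCII letters, digits, and underscore.
const asciiWords = phrase.match(/\w+/g);
// \p{L} covers letters from any script, and requires the u flag.
const unicodeWords = phrase.match(/\p{L}+/gu);

console.log(asciiWords);   // ["na", "ve", "caf"]
console.log(unicodeWords); // ["naïve", "café"]

// The same code-unit issue shows up in string length.
const emoji = "\u{1F600}";
console.log(emoji.length);      // 2 (UTF-16 code units)
console.log([...emoji].length); // 1 (code points)
```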
Advanced Techniques
Lookahead and Lookbehind Assertions
Assertions check if a pattern exists without including it in the match. They're zero-width—they don't consume characters.
// Positive lookahead: (?=...)
// Match numbers followed by " dollars"
/\d+(?= dollars)/
// In "100 dollars", matches "100" but not " dollars"
// Negative lookahead: (?!...)
// Match numbers NOT followed by " dollars"
/\d+(?! dollars)/
// Caution: in "100 dollars" this still matches "10", because the engine backs off one digit
// Positive lookbehind: (?<=...)
// Match numbers preceded by "$"
/(?<=\$)\d+/
// In "$100", matches "100" but not "$"
// Negative lookbehind: (?<!...)
// Match numbers NOT preceded by "$"
/(?<!\$)\d+/
// Caution: in "$100" this still matches "00", because the match can start after the "1"
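Two short, runnable uses of lookarounds follow. The first inserts thousands separators using only zero-width assertions. The second shows one way to keep a negative lookbehind from matching in the middle of a number, by also excluding a preceding digit. The sample strings are invented for illustration.

```javascript
// Lookahead: insert a comma wherever a multiple of three digits remains.
const grouped = "1234567".replace(/\B(?=(\d{3})+(?!\d))/g, ",");
console.log(grouped); // "1,234,567"

const invoice = "Cost: $100 and 200 units";

// Positive lookbehind: digits that directly follow "$".
const dollarAmounts = invoice.match(/(?<=\$)\d+/g);
console.log(dollarAmounts); // ["100"]

// Negative lookbehind: also exclude a preceding digit so the match
// cannot begin partway through "100".
const plainNumbers = invoice.match(/(?<![$\d])\d+/g);
console.log(plainNumbers); // ["200"]
```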
Conditional Patterns
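Some regex flavors (PCRE, .NET, and Python, for example) support conditional patterns of the form (?(1)yes|no), which match different things based on whether a previous group matched. JavaScript does not support conditionals. For the common case of paired delimiters, a backreference does the job: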
// Match quoted strings with matching quotes
/(["'])(.*?)\1/
// The \1 backreference ensures closing quote matches opening quote
Atomic Groups
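Atomic groups (?>...) prevent backtracking into the group, which can improve performance for certain patterns. Once the group matches, the engine won't reconsider alternative matches inside it. They are available in flavors such as PCRE, Java, .NET, and Ruby, but not in JavaScript, where (?>...) is a syntax error.
// Without atomic group (\d+ can give digits back to \w+)
/\d+\w+/
// With atomic group (\d+ keeps every digit it matched; not valid in JavaScript)
/(?>\d+)\w+/
// Note: the two are not equivalent. On "123" the first matches and the second does not.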
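JavaScript has no (?>...) syntax, but a commonly used workaround emulates it: capture inside a lookahead, then consume the capture with a backreference. Lookaheads in JavaScript are not re-entered once they succeed, so the captured text cannot shrink. This is a sketch of that trick, not a built-in feature.

```javascript
// Ordinary pattern: \d+ can give back digits so that \w+ finds something.
const backtracking = /\d+\w+/;

// Emulated atomic group: (?=(\d+)) captures the full digit run without
// consuming it, and \1 then consumes exactly that run.
const emulatedAtomic = /(?=(\d+))\1\w+/;

console.log(backtracking.test("123"));      // true  ("12" + "3")
console.log(emulatedAtomic.test("123"));    // false (nothing left for \w+)
console.log(emulatedAtomic.test("123abc")); // true  ("123" + "abc")
```

The emulation adds a capturing group, so any later group numbers shift by one.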
Quick tip: Lookahead and lookbehind are powerful but can be confusing. Remember: lookahead checks what comes after the current position, lookbehind checks what comes before. Neither includes the checked text in the match.
Performance Tips and Best Practices
Poorly written regex can cause serious performance problems, including catastrophic backtracking that can hang your application. Follow these guidelines to write efficient patterns.
Avoid Catastrophic Backtracking
Nested quantifiers can cause exponential time complexity. The pattern (a+)+ is dangerous because the engine tries many different ways to match the same string.
// BAD: Nested quantifiers
/(a+)+b/
// On "aaaaaaaaaaaaaaaaaaaaaaaac" this can take forever
// GOOD: Use possessive quantifiers or atomic groups (PCRE/Java syntax; not supported in JavaScript)
/(a++)+b/
/(?>a+)+b/
// BETTER: Simplify the pattern
/a+b/
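Before swapping a nested pattern for a simpler one, it helps to confirm that both accept the same inputs. The sketch below checks that on a handful of samples and includes a rough timing helper; `timeTest` is a hypothetical name, and the timings depend on the engine and machine, so no specific numbers are claimed. The near-miss input is kept short on purpose, because each extra "a" roughly doubles the work for the nested pattern in a backtracking engine.

```javascript
const nested = /(a+)+b/;
const simplified = /a+b/;

// Both patterns should agree on every sample.
const samples = ["ab", "aaab", "b", "aaa", "xaab", ""];
const agree = samples.every((s) => nested.test(s) === simplified.test(s));
console.log(agree); // true

// Hypothetical helper: run one test and report the elapsed milliseconds.
function timeTest(pattern, input) {
  const start = Date.now();
  const matched = pattern.test(input);
  return { matched, ms: Date.now() - start };
}

// A near miss: many "a" characters with no "b" to finish the match.
const nearMiss = "a".repeat(18) + "c";
const nestedResult = timeTest(nested, nearMiss);
const simplifiedResult = timeTest(simplified, nearMiss);
console.log(nestedResult.matched, simplifiedResult.matched); // false false
```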
Be Specific
More specific patterns usually run faster because they give the engine fewer ways to backtrack. Avoid overly broad patterns like .* when you can be more precise.
// Less efficient
/<.*>/
// More efficient (matches only valid tag characters)
/<[a-zA-Z][a-zA-Z0-9]*>/
Anchor Your Patterns
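When validating entire strings, always use ^ and $ anchors. Without them, the pattern succeeds if any substring matches, so invalid input can slip through. Anchors can also let the engine reject bad input without trying every starting position.
// Without anchors - matches any string that contains 5 consecutive digits
/\d{5}/
// With anchors - matches only when the entire string is exactly 5 digits
/^\d{5}$/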
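The difference shows up as soon as the input contains anything besides the five digits. A short sketch with made-up inputs:

```javascript
const containsFiveDigits = /\d{5}/;
const exactlyFiveDigits = /^\d{5}$/;

const inputs = ["12345", "abc1234567", "1234", "12345-6789"];

for (const input of inputs) {
  console.log(input, containsFiveDigits.test(input), exactlyFiveDigits.test(input));
}
// 12345       true   true
// abc1234567  true   false
// 1234        false  false
// 12345-6789  true   false
```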
Use Non-Capturing Groups
Capturing groups carry a small overhead because the engine has to store the captured text. If you don't need the captured text, use (?:...) instead of (...).
// Slower (captures unnecessarily)
/(https?|ftp):\/\/([a-z.]+)/
// Faster (only captures what you need)
/(?:https?|ftp):\/\/([a-z.]+)/
Compile Once, Use Many Times
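In most languages, compiling a regex pattern has overhead, so store compiled patterns in variables and reuse them. This matters most when patterns are built at runtime (new RegExp(...) in JavaScript, re.compile in Python). JavaScript engines typically cache regex literals, so there the gain is mostly in readability.
// Less tidy - evaluates the regex literal on every iteration (engines may cache the compiled form)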
for (const str of strings) {
if (/^\d+$/.test(str)) { /* ... */ }
}
// Efficient - compiles once
const digitPattern = /^\d+$/;
for (const str of strings) {
if (digitPattern.test(str)) { /* ... */ }
}
Consider Alternatives
Sometimes simple string methods are faster than regex for basic operations.
// For simple checks, string methods can be faster
str.startsWith("http://") // faster than /^http:\/\//
str.includes("@") // faster than /@/
str.endsWith(".com") // faster than /\.com$/
Testing and Debugging Regex
Regex can be difficult to debug because patterns are compact and errors aren't always obvious. Use these strategies to test and refine your patterns.
Use Online Testing Tools
Visual regex testers show you exactly what your pattern matches in real-time. Our Regex Tester provides syntax highlighting, match visualization, and explanation of pattern components.
Test Edge Cases
Always test your patterns against edge cases, not just happy path examples:
- Empty strings
- Very long strings
- Strings with special characters
- Strings with Unicode characters
- Strings that almost match but shouldn't
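One lightweight way to cover these cases is a table of inputs and expected results that you rerun after every change to the pattern. The sketch below applies that idea to the slug pattern from the common patterns table; the test cases themselves are examples, not an exhaustive suite.

```javascript
const slug = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Each entry: [input, expected result].
const cases = [
  ["", false],                // empty string
  ["hello-world", true],      // happy path
  ["a".repeat(5000), true],   // very long string
  ["-leading", false],        // almost matches
  ["trailing-", false],       // almost matches
  ["double--hyphen", false],  // almost matches
  ["UPPER", false],           // wrong case
  ["caf\u00E9", false],       // non-ASCII letter
  ["hello world", false],     // special character (space)
];

// Collect every case where the pattern disagrees with the expectation.
const failures = cases.filter(([input, expected]) => slug.test(input) !== expected);

console.log(failures.length); // 0
```

When a case fails, the `failures` array holds the offending input next to the result you expected, which is usually enough to spot the problem.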
Build Patterns Incrementally
Start with a simple pattern and add complexity gradually. Test after each addition to ensure it still works as expected.
// Start simple
/\d+/
// Add specificity
/\d{3}/
// Add context
/\(\d{