Regex Patterns Cheatsheet: Common Expressions Explained
Regular expressions (regex) are one of the most powerful tools in a developer's arsenal. Whether you're validating email addresses, parsing log files, extracting data from text, or building complex search-and-replace operations, regex provides a concise and flexible way to work with text patterns.
This comprehensive guide covers everything from basic syntax to advanced techniques, with practical examples you can use immediately in your projects. We'll explore regex across different programming languages, share performance optimization tips, and help you avoid common mistakes that trip up even experienced developers.
Regex Basics
At its core, a regular expression is a sequence of characters that defines a search pattern. When you run this pattern against a string, the regex engine attempts to find matches according to the rules you've specified.
A regex pattern consists of two types of characters: literal characters that match themselves exactly, and metacharacters that have special meanings. For example, the pattern /hello/ matches the literal text "hello" anywhere in a string. But /h.llo/ uses the dot metacharacter to match "hello", "hallo", "hxllo", or any other five-character sequence starting with "h" and ending with "llo".
Essential Metacharacters
Here's a reference table of the most fundamental regex metacharacters you'll use daily:
| Metacharacter | Description | Example | Matches |
|---|---|---|---|
| `.` | Any single character (except newline) | `c.t` | cat, cot, c9t, c@t |
| `\d` | Any digit [0-9] | `\d{3}` | 123, 456, 789 |
| `\D` | Any non-digit | `\D+` | abc, XYZ, @#$ |
| `\w` | Word character [a-zA-Z0-9_] | `\w+` | hello, test_123 |
| `\W` | Non-word character | `\W` | @, #, space, ! |
| `\s` | Whitespace (space, tab, newline) | `\s+` | Single or multiple spaces |
| `\S` | Non-whitespace | `\S+` | Any visible characters |
| `\b` | Word boundary | `\bcat\b` | "cat" but not "category" |
| `^` | Start of string/line | `^Hello` | Lines starting with "Hello" |
| `$` | End of string/line | `end$` | Lines ending with "end" |
Pro tip: Use the Regex Tester to experiment with these patterns in real-time. You can test against your own sample text and see matches highlighted instantly.
Escaping Special Characters
When you need to match a literal metacharacter (like a dot or asterisk), you must escape it with a backslash. For example, \. matches a literal period, and \* matches a literal asterisk.
Characters that need escaping include: . * + ? ^ $ { } [ ] ( ) | \
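As a small illustration of why escaping matters, the Python sketch below compares an unescaped dot with an escaped one. The helper name `find_literal` and the sample text are assumptions made for this example; `re.escape` is the standard-library function that escapes special characters for you.

```python
import re

def find_literal(needle, haystack):
    """Find every literal occurrence of `needle`, treating it as plain text."""
    # re.escape() backslash-escapes characters that would otherwise be special.
    return re.findall(re.escape(needle), haystack)

text = "Total: 3.50 (was 3x50)"

unescaped = re.findall(r"3.50", text)   # the dot matches any character
escaped = find_literal("3.50", text)    # the dot matches only a period

print(unescaped)  # ['3.50', '3x50']
print(escaped)    # ['3.50']
```

Escaping by hand works for fixed patterns; a helper like this is mainly useful when the text to match comes from user input.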
Character Classes & Quantifiers
Character classes let you define sets of characters to match, while quantifiers specify how many times a pattern should repeat. Together, they form the backbone of most regex patterns.
Character Classes
Square brackets create a character class that matches any single character within the brackets:
| Pattern | Description | Example Matches |
|---|---|---|
| `[abc]` | Match a, b, or c | a, b, c (one character) |
| `[^abc]` | Match anything except a, b, or c | d, e, 1, @, etc. |
| `[a-z]` | Match any lowercase letter | a through z |
| `[A-Z]` | Match any uppercase letter | A through Z |
| `[0-9]` | Match any digit | 0 through 9 (same as \d) |
| `[a-zA-Z]` | Match any letter | All letters, any case |
| `[a-zA-Z0-9]` | Match any alphanumeric | Letters and numbers |
| `[a-z&&[^aeiou]]` | Consonants only (intersection; Java syntax, not supported by Python's re module) | b, c, d, f, g, etc. |
Quantifiers
Quantifiers specify how many times the preceding element should match. They're essential for matching patterns of varying lengths:
- `*`: Zero or more times (greedy)
- `+`: One or more times (greedy)
- `?`: Zero or one time (optional)
- `{3}`: Exactly 3 times
- `{3,}`: 3 or more times
- `{3,6}`: Between 3 and 6 times
- `*?`: Zero or more times (lazy/non-greedy)
- `+?`: One or more times (lazy/non-greedy)
- `??`: Zero or one time (lazy)
Greedy vs. Lazy Matching
Understanding the difference between greedy and lazy quantifiers is crucial for writing correct and efficient regex patterns. By default, quantifiers are greedy: they match as much text as possible while still allowing the overall pattern to match.
Consider this example with the string "<div>content</div><div>more</div>":
- Greedy: `<div>.*</div>` matches the entire string from the first `<div>` to the last `</div>`
- Lazy: `<div>.*?</div>` matches each `<div>...</div>` pair separately
The lazy version adds a ? after the quantifier, telling the regex engine to match as little as possible while still satisfying the pattern.
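The difference is easy to check in Python. This is a minimal sketch using the same sample string as above; the variable names are chosen for this example only.

```python
import re

html = "<div>content</div><div>more</div>"

# Greedy: .* runs to the end of the string, then backs up to the last </div>.
greedy = re.findall(r"<div>.*</div>", html)

# Lazy: .*? stops at the first </div> it can reach.
lazy = re.findall(r"<div>.*?</div>", html)

print(greedy)  # ['<div>content</div><div>more</div>']
print(lazy)    # ['<div>content</div>', '<div>more</div>']
```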
Quick tip: When parsing HTML or XML, always use lazy quantifiers to avoid matching across multiple tags. Better yet, use a proper parser library instead of regex for complex markup.
Common Regex Patterns
Here are widely used regex patterns for common validation and extraction tasks. Treat them as practical starting points rather than complete validators.
Email Validation
A practical email validation pattern that covers most real-world cases:
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This pattern ensures the email has a local part (before @), a domain name, and a valid top-level domain. Note that perfect email validation is impossible with regex alone: the official RFC 5322 standard is incredibly complex. For production use, consider using a dedicated email validation library.
Phone Numbers
US phone number with optional country code and various formatting:
^(\+1[-.\s]?)?(\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}$
This matches formats like:
- 555-123-4567
- (555) 123-4567
- +1 555 123 4567
- 5551234567
URLs
Match HTTP and HTTPS URLs with optional www prefix:
^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$
IP Addresses
IPv4 address validation:
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
This ensures each octet is between 0 and 255.
Dates
ISO 8601 date format (YYYY-MM-DD):
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$
US date format (MM/DD/YYYY):
^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}$
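Both date patterns check the shape of the input, not the calendar: the ISO pattern accepts 2024-02-31, for example. One way to close that gap, sketched here in Python, is to let the regex check the format and then hand the captured numbers to `datetime.date`. The function name `is_real_iso_date` is an assumption for this example, and capturing parentheses have been added around the year.

```python
import re
from datetime import date

# Same ISO pattern as above, with a capture group added around the year.
ISO_DATE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$")

def is_real_iso_date(value):
    """Return True if value is YYYY-MM-DD and names a date that exists."""
    m = ISO_DATE.fullmatch(value)
    if not m:
        return False          # wrong shape
    year, month, day = map(int, m.groups())
    try:
        date(year, month, day)  # raises ValueError for dates like Feb 31
    except ValueError:
        return False
    return True

print(is_real_iso_date("2024-02-29"))  # True (leap year)
print(is_real_iso_date("2023-02-29"))  # False
print(is_real_iso_date("2024-02-31"))  # False
```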
Credit Card Numbers
Basic credit card number check (strip spaces and dashes from the input before matching):
^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})$
This pattern checks the prefix and length of Visa, MasterCard (prefixes 51-55 only), American Express, and Discover numbers. It cannot tell whether a number is real, so always run the Luhn checksum as well.
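A minimal Python sketch of combining the pattern with a Luhn check follows. The function names `luhn_valid` and `looks_like_card` are assumptions for this example, and 4111111111111111 is a commonly used test number, not a real card.

```python
import re

CARD_PATTERN = re.compile(
    r"^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}"
    r"|6(?:011|5[0-9]{2})[0-9]{12})$"
)

def luhn_valid(number):
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:       # every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9       # same as adding the two digits of the product
        total += d
    return total % 10 == 0

def looks_like_card(raw):
    """Strip spaces and dashes, then require both the pattern and the checksum."""
    digits = re.sub(r"[ -]", "", raw)
    return bool(CARD_PATTERN.fullmatch(digits)) and luhn_valid(digits)

print(looks_like_card("4111 1111 1111 1111"))  # True
print(looks_like_card("4111-1111-1111-1112"))  # False (checksum fails)
```

Passing both checks still says nothing about whether the card exists or has funds; that requires the payment processor.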
Passwords
Strong password requiring at least 8 characters, one uppercase, one lowercase, one digit, and one special character:
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
This uses lookahead assertions (covered in the next section) to ensure all requirements are met.
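To see how the lookaheads behave, here is a small Python sketch. `STRONG_PASSWORD` is the pattern above; the `RULES` table and `missing_rules` helper are hypothetical additions that apply one lookahead at a time so the failing rule can be reported.

```python
import re

STRONG_PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

# Hypothetical helper data: one lookahead per requirement.
RULES = {
    "lowercase letter": r"(?=.*[a-z])",
    "uppercase letter": r"(?=.*[A-Z])",
    "digit": r"(?=.*\d)",
    "special character": r"(?=.*[@$!%*?&])",
}

def missing_rules(password):
    """Return the names of the requirements the password does not meet."""
    # re.match anchors each lookahead at the start of the string.
    return [name for name, rule in RULES.items() if not re.match(rule, password)]

print(bool(STRONG_PASSWORD.match("Passw0rd!")))   # True
print(missing_rules("password1!"))                # ['uppercase letter']
# '#' is not in the allowed character set, so the full pattern rejects it
# even though every lookahead is satisfied.
print(bool(STRONG_PASSWORD.match("Passw0rd!#")))  # False
```

The last case is worth noting: the final character class restricts which characters may appear at all, which may or may not be what you want for real passwords.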
Pro tip: Test these patterns with the Regex Tester and save your favorites for quick access. You can also use the Code Formatter to clean up regex patterns in your source code.
Anchors, Groups & Lookaheads
These advanced features give you precise control over where matches occur and how patterns are captured.
Anchors
Anchors don't match characters β they match positions in the string:
- `^`: Start of string (or line in multiline mode)
- `$`: End of string (or line in multiline mode)
- `\b`: Word boundary (between \w and \W)
- `\B`: Non-word boundary
- `\A`: Start of string (always, even in multiline mode)
- `\Z`: End of string (always, even in multiline mode)

Note that \A and \Z are not available in JavaScript regex.
Example: \bcat\b matches "cat" as a whole word but not the "cat" in "category" or "concatenate".
Capturing Groups
Parentheses create capturing groups that extract matched substrings:
^(\d{3})-(\d{3})-(\d{4})$
This pattern matches a phone number and captures the area code, prefix, and line number separately. You can reference these captures in replacement strings or extract them programmatically.
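As a brief illustration in Python, the captured groups can be read from the match object or referenced as \1, \2, \3 in a replacement string. The variable names and the output format are choices made for this example.

```python
import re

PHONE = re.compile(r"^(\d{3})-(\d{3})-(\d{4})$")

m = PHONE.match("555-123-4567")
area, prefix, line = m.groups()   # extract the captures programmatically

# Reference the captures by number in a replacement string.
formatted = PHONE.sub(r"(\1) \2-\3", "555-123-4567")

print(area, prefix, line)  # 555 123 4567
print(formatted)           # (555) 123-4567
```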
Non-Capturing Groups
Use (?:...) when you need grouping but don't want to capture the match:
(?:https?|ftp)://[^\s]+
This matches URLs starting with http, https, or ftp without creating a capture group for the protocol.
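The choice between capturing and non-capturing groups has a visible effect in Python, where `re.findall` returns the group contents instead of the whole match when the pattern contains a capturing group. A minimal sketch, with made-up sample URLs:

```python
import re

text = "see http://a.example and ftp://b.example"

# With a capturing group, findall returns only what the group captured.
capturing = re.findall(r"(https?|ftp)://[^\s]+", text)

# With a non-capturing group, findall returns the full matches.
non_capturing = re.findall(r"(?:https?|ftp)://[^\s]+", text)

print(capturing)      # ['http', 'ftp']
print(non_capturing)  # ['http://a.example', 'ftp://b.example']
```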
Named Capturing Groups
Named groups make your regex more readable and maintainable:
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
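You can reference these by name instead of by number, making your code clearer. This is the syntax used by JavaScript; Python's re module writes named groups as (?P<year>...) instead.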
Lookahead Assertions
Lookaheads check if a pattern exists ahead without consuming characters:
- `(?=...)`: Positive lookahead (must be followed by)
- `(?!...)`: Negative lookahead (must not be followed by)
Example: \d+(?= dollars) matches numbers followed by " dollars" but doesn't include " dollars" in the match.
Lookbehind Assertions
Lookbehinds check what comes before the current position:
- `(?<=...)`: Positive lookbehind (must be preceded by)
- `(?<!...)`: Negative lookbehind (must not be preceded by)
Example: (?<=\$)\d+ matches numbers preceded by a dollar sign but doesn't include the $ in the match.
Note that JavaScript only gained lookbehind support in ES2018, so check compatibility if you're supporting older browsers.
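The four lookaround forms can be compared side by side in Python. This is a sketch with an invented sample sentence; the \b anchors in the negative cases are added so the engine cannot satisfy the assertion by matching only part of a number.

```python
import re

text = "Price: $250, or 300 dollars"

followed_by_dollars = re.findall(r"\d+(?= dollars)", text)       # positive lookahead
preceded_by_sign = re.findall(r"(?<=\$)\d+", text)               # positive lookbehind
not_followed = re.findall(r"\b\d+\b(?! dollars)", text)          # negative lookahead
not_preceded = re.findall(r"(?<!\$)\b\d+", text)                 # negative lookbehind

print(followed_by_dollars)  # ['300']
print(preceded_by_sign)     # ['250']
print(not_followed)         # ['250']
print(not_preceded)         # ['300']
```

In every case the match contains only the digits; the text checked by the assertion is never part of the result.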
Regex in JavaScript
JavaScript provides robust regex support through the RegExp object and string methods. Here's how to use regex effectively in JavaScript applications.
Creating Regex Patterns
You can create regex patterns in two ways:
// Literal notation (preferred for static patterns)
const pattern1 = /\d{3}-\d{3}-\d{4}/;
// Constructor (useful for dynamic patterns)
const pattern2 = new RegExp('\\d{3}-\\d{3}-\\d{4}');
Note the double backslashes in the constructor: you need to escape backslashes in strings.
Regex Flags
JavaScript supports several flags that modify regex behavior:
- `g`: Global search (find all matches, not just first)
- `i`: Case-insensitive matching
- `m`: Multiline mode (^ and $ match line boundaries)
- `s`: Dot matches newlines (dotAll mode)
- `u`: Unicode mode (proper handling of Unicode characters)
- `y`: Sticky mode (matches from lastIndex position)
Example: /hello/gi finds all occurrences of "hello" case-insensitively.
String Methods with Regex
JavaScript strings have several methods that accept regex patterns:
const text = "Contact us at [email protected] or [email protected]";
const emailPattern = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;
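// Find all matches (with the g flag, match() returns every match)
const match = text.match(emailPattern);
// Returns: ["[email protected]", "[email protected]"]
// Test if pattern exists
// Note: test() on a regex with the g flag advances lastIndex,
// so repeated calls on the same regex can return different results
const hasEmail = emailPattern.test(text);
// Returns: true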
// Replace matches
const redacted = text.replace(emailPattern, '[EMAIL]');
// Returns: "Contact us at [EMAIL] or [EMAIL]"
// Split by pattern
const parts = text.split(/\s+/);
// Splits on whitespace
// Search for pattern position
const position = text.search(emailPattern);
// Returns: 14 (index of first match)
RegExp Methods
The RegExp object also has methods for pattern matching:
const pattern = /(\d{3})-(\d{3})-(\d{4})/;
const phone = "Call 555-123-4567 for info";
// exec() returns detailed match information
const result = pattern.exec(phone);
console.log(result[0]); // "555-123-4567" (full match)
console.log(result[1]); // "555" (first capture group)
console.log(result[2]); // "123" (second capture group)
console.log(result[3]); // "4567" (third capture group)
Practical JavaScript Examples
Here's a real-world example of form validation:
function validateForm(data) {
  const patterns = {
    email: /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i,
    phone: /^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$/,
    zipCode: /^\d{5}(-\d{4})?$/
  };
  const errors = {};
  if (!patterns.email.test(data.email)) {
    errors.email = "Invalid email address";
  }
  if (!patterns.phone.test(data.phone)) {
    errors.phone = "Invalid phone number";
  }
  if (!patterns.zipCode.test(data.zipCode)) {
    errors.zipCode = "Invalid ZIP code";
  }
  // Return null when every field passed, otherwise the error map
  return Object.keys(errors).length === 0 ? null : errors;
}
Quick tip: Use the JSON Formatter to validate and format JSON data extracted with regex patterns. It's especially useful when parsing API responses or configuration files.
Regex in Python
Python's re module provides comprehensive regex functionality with a clean, intuitive API. Here's everything you need to know about using regex in Python.
Importing and Basic Usage
import re
# Compile a pattern (recommended for reuse)
pattern = re.compile(r'\d{3}-\d{3}-\d{4}')
# Or use module-level functions directly
match = re.search(r'\d{3}-\d{3}-\d{4}', 'Call 555-123-4567')
Always use raw strings (prefix with r) for regex patterns to avoid issues with backslash escaping.
Python Regex Functions
The re module provides several functions for different matching scenarios:
import re
text = "Email: [email protected], Phone: 555-123-4567"
# search() - Find first match anywhere in string
match = re.search(r'\d{3}-\d{3}-\d{4}', text)
if match:
    print(match.group())  # "555-123-4567"
# match() - Match at beginning of string only
match = re.match(r'Email:', text)
if match:
    print("Starts with Email:")
# findall() - Find all matches as list
emails = re.findall(r'[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}', text, re.IGNORECASE)
print(emails) # ['[email protected]']
# finditer() - Find all matches as iterator (memory efficient)
for match in re.finditer(r'\d+', text):
    print(f"Found number: {match.group()} at position {match.start()}")
# sub() - Replace matches
redacted = re.sub(r'\d{3}-\d{3}-\d{4}', '[PHONE]', text)
print(redacted) # "Email: [email protected], Phone: [PHONE]"
# split() - Split string by pattern
parts = re.split(r'[,:]', text)
print(parts) # ['Email', ' [email protected]', ' Phone', ' 555-123-4567']
Regex Flags in Python
Python supports several flags to modify regex behavior:
- `re.IGNORECASE` or `re.I`: Case-insensitive matching
- `re.MULTILINE` or `re.M`: ^ and $ match line boundaries
- `re.DOTALL` or `re.S`: Dot matches newlines
- `re.VERBOSE` or `re.X`: Allow comments and whitespace in pattern
- `re.ASCII` or `re.A`: ASCII-only matching for \w, \b, etc.
- `re.UNICODE` or `re.U`: Unicode matching (default in Python 3)
Combine flags with the bitwise OR operator:
pattern = re.compile(r'hello', re.IGNORECASE | re.MULTILINE)
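Of these flags, re.VERBOSE is probably the one that changes how a pattern reads the most. The sketch below rewrites a phone pattern with comments and whitespace; the group names (`area`, `prefix`, `line`) are assumptions for this example.

```python
import re

# In verbose mode, whitespace and #-comments in the pattern are ignored,
# so each part of the pattern can be documented in place.
PHONE = re.compile(r"""
    ^
    (?P<area>\d{3})      # area code
    [-.\s]?              # optional separator
    (?P<prefix>\d{3})    # prefix
    [-.\s]?              # optional separator
    (?P<line>\d{4})      # line number
    $
""", re.VERBOSE)

m = PHONE.match("555.123.4567")
print(m.groupdict())  # {'area': '555', 'prefix': '123', 'line': '4567'}
```

If a verbose pattern needs to match a literal space or #, escape it or put it inside a character class.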
Working with Match Objects
Match objects provide detailed information about matches:
import re
text = "Date: 2026-03-31"
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, text)
if match:
    # Access full match
    print(match.group())   # "2026-03-31"
    print(match.group(0))  # Same as above
    # Access by group number
    print(match.group(1))  # "2026"
    print(match.group(2))  # "03"
    print(match.group(3))  # "31"
    # Access by group name
    print(match.group('year'))   # "2026"
    print(match.group('month'))  # "03"
    print(match.group('day'))    # "31"
    # Get all groups as tuple
    print(match.groups())  # ('2026', '03', '31')
    # Get all groups as dictionary
    print(match.groupdict())  # {'year': '2026', 'month': '03', 'day': '31'}
    # Get match position
    print(match.start())  # 6
    print(match.end())    # 16
    print(match.span())   # (6, 16)
Practical Python Examples
Here's a log parser that extracts structured data:
import re
from datetime import datetime
def parse_log_file(filename):
    # Pattern for Apache-style log entries
    pattern = re.compile(
        r'(?P<ip>\d+\.\d+\.\d+\.\d+) '
        r'- - '
        r'\[(?P<timestamp>[^\]]+)\] '
        r'"(?P<method>\w+) (?P<path>[^\s]+) HTTP/[^"]+?" '
        r'(?P<status>\d{3}) '
        r'(?P<size>\d+)'
    )
    entries = []
    with open(filename, 'r') as f:
        for line in f:
            match = pattern.search(line)
            # Lines that do not fit the pattern are skipped
            if match:
                entry = match.groupdict()
                entry['status'] = int(entry['status'])
                entry['size'] = int(entry['size'])
                entries.append(entry)
    return entries
# Usage
logs = parse_log_file('access.log')
errors = [log for log in logs if log['status'] >= 400]
print(f"Found {len(errors)} error responses")
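To try the pattern without a log file, it can be run against a single line. The sample line below is invented for this example (the IP address is from a documentation range), and converting the timestamp with `datetime.strptime` is an optional extra step that the parser above does not perform.

```python
import re
from datetime import datetime

LOG_PATTERN = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) '
    r'- - '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>[^\s]+) HTTP/[^"]+?" '
    r'(?P<status>\d{3}) '
    r'(?P<size>\d+)'
)

# Hypothetical sample line in Apache common log format.
sample = '203.0.113.7 - - [31/Mar/2026:10:15:32 +0000] "GET /index.html HTTP/1.1" 404 512'

entry = LOG_PATTERN.search(sample).groupdict()
entry['status'] = int(entry['status'])
entry['size'] = int(entry['size'])

# Optional: turn the timestamp text into a timezone-aware datetime.
parsed_time = datetime.strptime(entry['timestamp'], '%d/%b/%Y:%H:%M:%S %z')

print(entry['method'], entry['path'], entry['status'])  # GET /index.html 404
print(parsed_time.year)                                 # 2026
```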
Advanced Regex Techniques
Once you've mastered the basics, these advanced techniques will help you write more powerful and efficient regex patterns.
Atomic Groups
Atomic groups (?>...) prevent backtracking within the group, which can improve performance. They are supported in PCRE, Java, .NET, and Python 3.11 or later, but not in JavaScript:
(?>\d+)\.
This matches digits followed by a period, but once the digits are matched, the regex engine won't backtrack into them if the period doesn't match.
Conditional Patterns
Conditional patterns allow different matching based on whether a group matched:
(?(1)yes-pattern|no-pattern)
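Example: require a closing parenthesis after the area code only when an opening parenthesis was matched (conditionals are supported in Python and PCRE, but not in JavaScript):
^(\()?\d{3}(?(1)\))[-.\s]?\d{3}[-.\s]?\d{4}$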
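To see the (?(1)...) syntax in action, here is a minimal Python sketch. The pattern is an illustrative assumption: group 1 optionally captures an opening parenthesis, and the conditional then requires a closing parenthesis only if group 1 took part in the match.

```python
import re

PHONE = re.compile(r"^(\()?\d{3}(?(1)\))[-.\s]?\d{3}[-.\s]?\d{4}$")

samples = ["(555) 123-4567", "555-123-4567", "(555-123-4567", "555) 123-4567"]
results = {s: bool(PHONE.match(s)) for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {ok}")
# '(555) 123-4567': True
# '555-123-4567': True
# '(555-123-4567': False   (opening parenthesis without a closing one)
# '555) 123-4567': False   (closing parenthesis without an opening one)
```

A plain optional group such as \(?\d{3}\)? would accept the two unbalanced inputs, which is the case the conditional is meant to rule out.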