Regex Patterns Cheatsheet: Common Expressions Explained

Regular expressions (regex) are one of the most powerful tools in a developer's arsenal. Whether you're validating email addresses, parsing log files, extracting data from text, or building complex search-and-replace operations, regex provides a concise and flexible way to work with text patterns.

This comprehensive guide covers everything from basic syntax to advanced techniques, with practical examples you can use immediately in your projects. We'll explore regex across different programming languages, share performance optimization tips, and help you avoid common mistakes that trip up even experienced developers.

Regex Basics

At its core, a regular expression is a sequence of characters that defines a search pattern. When you run this pattern against a string, the regex engine attempts to find matches according to the rules you've specified.

A regex pattern consists of two types of characters: literal characters that match themselves exactly, and metacharacters that have special meanings. For example, the pattern /hello/ matches the literal text "hello" anywhere in a string. But /h.llo/ uses the dot metacharacter to match "hello", "hallo", "hxllo", or any other five-character sequence starting with "h" and ending with "llo".

Essential Metacharacters

Here's a reference table of the most fundamental regex metacharacters you'll use daily:

Metacharacter  Description                            Example  Matches
.              Any single character (except newline)  c.t      cat, cot, c9t, c@t
\d             Any digit [0-9]                        \d{3}    123, 456, 789
\D             Any non-digit                          \D+      abc, XYZ, @#$
\w             Word character [a-zA-Z0-9_]            \w+      hello, test_123
\W             Non-word character                     \W       @, #, space, !
\s             Whitespace (space, tab, newline)       \s+      Single or multiple spaces
\S             Non-whitespace                         \S+      Any visible characters
\b             Word boundary                          \bcat\b  "cat" but not "category"
^              Start of string/line                   ^Hello   Lines starting with "Hello"
$              End of string/line                     end$     Lines ending with "end"

Pro tip: Use the Regex Tester to experiment with these patterns in real-time. You can test against your own sample text and see matches highlighted instantly.

Escaping Special Characters

When you need to match a literal metacharacter (like a dot or asterisk), you must escape it with a backslash. For example, \. matches a literal period, and \* matches a literal asterisk.

Characters that need escaping include: . * + ? ^ $ { } [ ] ( ) | \
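
When the text to match comes from a variable rather than being typed into the pattern, escaping by hand is easy to get wrong. The following is a minimal Python sketch, assuming the standard re module; the sample strings are invented for illustration:

```python
import re

# Unescaped: the dot is a metacharacter, so it also matches "x".
loose = re.compile(r"3.14")
# Escaped by hand: only a literal period matches.
strict = re.compile(r"3\.14")

print(bool(loose.fullmatch("3x14")))   # True
print(bool(strict.fullmatch("3x14")))  # False

# re.escape() backslash-escapes the metacharacters in arbitrary text,
# so the parentheses and question mark below are matched literally.
user_text = "price (USD)?"
literal = re.compile(re.escape(user_text))
print(bool(literal.search("What is the price (USD)? Unknown.")))  # True
```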

Character Classes & Quantifiers

Character classes let you define sets of characters to match, while quantifiers specify how many times a pattern should repeat. Together, they form the backbone of most regex patterns.

Character Classes

Square brackets create a character class that matches any single character within the brackets:

Pattern          Description                       Example matches
[abc]            Match a, b, or c                  a, b, c (one character)
[^abc]           Match anything except a, b, or c  d, e, 1, @, etc.
[a-z]            Match any lowercase letter        a through z
[A-Z]            Match any uppercase letter        A through Z
[0-9]            Match any digit                   0 through 9 (same as \d)
[a-zA-Z]         Match any letter                  All letters, any case
[a-zA-Z0-9]      Match any alphanumeric            Letters and numbers
[a-z&&[^aeiou]]  Consonants only (intersection)    b, c, d, f, g, etc.

Note: class intersection with && is specific to some flavors such as Java. Python's re module does not support it.

Quantifiers

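Quantifiers specify how many times the preceding element should match, which makes them essential for matching patterns of varying lengths. The standard quantifiers are * (zero or more), + (one or more), ? (zero or one), {n} (exactly n), {n,} (n or more), and {n,m} (between n and m).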
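
As a rough illustration, the sketch below runs the common quantifiers against a few sample strings in Python. The sample strings and the choice of the letter "a" as the repeated element are assumptions made for this example:

```python
import re

samples = ["ct", "cat", "caat", "caaat"]

# (pattern, meaning) pairs; each varies the quantifier on the letter "a".
patterns = [
    (r"ca*t", "zero or more"),
    (r"ca+t", "one or more"),
    (r"ca?t", "zero or one"),
    (r"ca{2}t", "exactly two"),
    (r"ca{2,}t", "two or more"),
    (r"ca{1,2}t", "one or two"),
]

results = {}
for pattern, meaning in patterns:
    # fullmatch() requires the whole sample to match the pattern.
    results[pattern] = [s for s in samples if re.fullmatch(pattern, s)]
    print(f"{pattern:10} ({meaning}): {results[pattern]}")
```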

Greedy vs. Lazy Matching

Understanding the difference between greedy and lazy quantifiers is crucial for writing efficient regex patterns. By default, quantifiers are greedy: they match as much text as possible while still allowing the overall pattern to match.

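Consider the string "<div>content</div><div>more</div>". The greedy pattern <div>.*</div> matches the entire string, because .* extends to the last </div>. The lazy pattern <div>.*?</div> stops at the first </div> and matches only "<div>content</div>".

The lazy version adds a ? after the quantifier, telling the regex engine to match as little as possible while still satisfying the pattern.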
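
The difference is easy to see by running both forms against that string. This is a small Python sketch; the two patterns are assumed for illustration and differ only in the ? after the quantifier:

```python
import re

html = "<div>content</div><div>more</div>"

# Greedy: .* runs to the last </div>, so the whole string is one match.
greedy = re.findall(r"<div>.*</div>", html)
print(greedy)  # ['<div>content</div><div>more</div>']

# Lazy: .*? stops at the first </div>, giving one match per element.
lazy = re.findall(r"<div>.*?</div>", html)
print(lazy)  # ['<div>content</div>', '<div>more</div>']
```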

Quick tip: When parsing HTML or XML, always use lazy quantifiers to avoid matching across multiple tags. Better yet, use a proper parser library instead of regex for complex markup.

Common Regex Patterns

Here are widely used regex patterns for common validation and extraction tasks. Treat them as practical starting points rather than complete validators, and test them against your own data.

Email Validation

A practical email validation pattern that covers most real-world cases:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This pattern checks for a local part (before the @), a domain name, and a top-level domain of at least two letters. It does not confirm that the domain or TLD actually exists. Perfect email validation is impossible with regex alone, because the official RFC 5322 grammar is far more complex than this pattern. For production use, consider using a dedicated email validation library.

Phone Numbers

US phone number with optional country code and various formatting:

^(\+1[-.\s]?)?(\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}$

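This matches formats like 555-123-4567, (555) 123-4567, 555.123.4567, 5551234567, and +1 555 123 4567. Because the area code group is optional, a bare 123-4567 also matches.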
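
As a rough check, the pattern can be exercised against a handful of sample numbers in Python. The samples and the helper name is_us_phone are hypothetical, chosen only for this sketch:

```python
import re

US_PHONE = re.compile(r"^(\+1[-.\s]?)?(\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}$")

def is_us_phone(text: str) -> bool:
    """Return True when the text fits the US phone pattern above."""
    return US_PHONE.match(text) is not None

samples = [
    "555-123-4567",
    "(555) 123-4567",
    "555.123.4567",
    "+1 555 123 4567",
    "5551234567",
    "123-4567",       # accepted: the area code group is optional
    "555-12-34567",   # rejected: digits grouped incorrectly
]

for sample in samples:
    print(f"{sample!r}: {is_us_phone(sample)}")
```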

URLs

Match HTTP and HTTPS URLs with optional www prefix:

^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$

IP Addresses

IPv4 address validation:

^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

This ensures each octet is between 0 and 255. Note that octets with leading zeros, such as 01, are also accepted.
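
A short Python sketch shows the range check in action. The helper name is_ipv4 and the sample addresses are assumptions made for illustration:

```python
import re

IPV4 = re.compile(
    r"^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)

def is_ipv4(text: str) -> bool:
    """Return True when every octet fits the 0-255 alternation."""
    return IPV4.match(text) is not None

print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False: 256 is out of range
print(is_ipv4("1.2.3"))            # False: only three octets
```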

Dates

ISO 8601 date format (YYYY-MM-DD):

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$

US date format (MM/DD/YYYY):

^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}$
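
Both date patterns check only the shape of the input, so an impossible date such as February 30 still passes. One possible approach, sketched here in Python with a hypothetical helper named is_real_iso_date, is to follow the regex with a calendar check from the standard datetime module:

```python
import re
from datetime import datetime

ISO_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$")

def is_real_iso_date(text: str) -> bool:
    """Shape check with the regex, then a calendar check with datetime."""
    if not ISO_DATE.match(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        # The shape was right but the day does not exist in that month.
        return False
    return True

print(bool(ISO_DATE.match("2024-02-30")))  # True: the regex alone accepts it
print(is_real_iso_date("2024-02-30"))      # False: February has no 30th
print(is_real_iso_date("2024-02-29"))      # True: 2024 is a leap year
```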

Credit Card Numbers

Basic credit card validation (strip spaces and dashes from the input before applying the pattern):

^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})$

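This pattern checks the number prefix and length for Visa, MasterCard (51-55 prefixes only), American Express, and Discover cards. It does not verify the check digit, so always apply the Luhn algorithm as well for actual validation.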
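
The Luhn check can be combined with the pattern in a few lines. The following is a minimal Python sketch, not a complete payment validator. The function names luhn_valid and looks_like_card are hypothetical, and 4111 1111 1111 1111 is a widely published test number rather than a real card:

```python
import re

CARD_SHAPE = re.compile(
    r"^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}"
    r"|6(?:011|5[0-9]{2})[0-9]{12})$"
)

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        value = int(ch)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as adding the two digits of the product
        total += value
    return total % 10 == 0

def looks_like_card(raw: str) -> bool:
    """Strip separators, check prefix and length, then the checksum."""
    digits = re.sub(r"[\s-]", "", raw)
    return bool(CARD_SHAPE.match(digits)) and luhn_valid(digits)

print(looks_like_card("4111 1111 1111 1111"))  # True
print(looks_like_card("4111-1111-1111-1112"))  # False: checksum fails
```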

Passwords

Strong password requiring at least 8 characters, one uppercase, one lowercase, one digit, and one special character:

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

This uses lookahead assertions (covered in the next section) to ensure all requirements are met.
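
To see which inputs the password pattern accepts, it can be tried against a few sample passwords. This is a small Python sketch; the helper name is_strong and the sample passwords are made up for illustration:

```python
import re

STRONG_PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(password: str) -> bool:
    """Return True when every lookahead and the length rule are satisfied."""
    return STRONG_PASSWORD.match(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False: no uppercase letter
print(is_strong("Passw0rd#"))   # False: "#" is not in the allowed set
print(is_strong("Pw0!"))        # False: shorter than 8 characters
```

Note that the final character class also limits which characters may appear at all, so a password containing a symbol outside @$!%*?& is rejected even if it meets every other rule.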

Pro tip: Test these patterns with the Regex Tester and save your favorites for quick access. You can also use the Code Formatter to clean up regex patterns in your source code.

Anchors, Groups & Lookaheads

These advanced features give you precise control over where matches occur and how patterns are captured.

Anchors

Anchors don't match characters; they match positions in the string. The most common are ^ (start of string or line), $ (end of string or line), and \b (word boundary).

Example: \bcat\b matches "cat" as a whole word but not the "cat" in "category" or "concatenate".

Capturing Groups

Parentheses create capturing groups that extract matched substrings:

^(\d{3})-(\d{3})-(\d{4})$

This pattern matches a phone number and captures the area code, prefix, and line number separately. You can reference these captures in replacement strings or extract them programmatically.
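
Both uses can be sketched in Python: reading the groups from a match object, and referring to them as \1, \2, and \3 in a replacement string. The variable names and the sample number are chosen for illustration:

```python
import re

PHONE = re.compile(r"^(\d{3})-(\d{3})-(\d{4})$")

# Extract the three captures programmatically.
match = PHONE.match("555-123-4567")
area, prefix, line = match.groups()
print(area, prefix, line)  # 555 123 4567

# Reuse the captures in a replacement string via backreferences.
formatted = PHONE.sub(r"(\1) \2-\3", "555-123-4567")
print(formatted)  # (555) 123-4567
```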

Non-Capturing Groups

Use (?:...) when you need grouping but don't want to capture the match:

(?:https?|ftp)://[^\s]+

This matches URLs starting with http, https, or ftp without creating a capture group for the protocol.

Named Capturing Groups

Named groups make your regex more readable and maintainable:

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

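You can reference these by name instead of by number, making your code clearer. The (?<name>...) form shown here is the JavaScript syntax; Python's re module writes named groups as (?P<name>...).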

Lookahead Assertions

Lookaheads check whether a pattern follows the current position without consuming characters. A positive lookahead is written (?=...) and a negative lookahead (?!...).

Example: \d+(?= dollars) matches numbers followed by " dollars" but doesn't include " dollars" in the match.
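
A quick Python sketch of positive and negative lookahead follows; the sample sentence is invented for illustration:

```python
import re

text = "Paid 30 dollars, then 45 euros, then 12 dollars"

# Positive lookahead: digits that are followed by " dollars".
dollars = re.findall(r"\d+(?= dollars)", text)
print(dollars)  # ['30', '12']

# Negative lookahead: whole numbers NOT followed by " dollars".
# The \b anchors stop the engine from settling for a prefix like "3" of "30".
others = re.findall(r"\b\d+\b(?! dollars)", text)
print(others)  # ['45']
```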

Lookbehind Assertions

Lookbehinds check what comes before the current position. A positive lookbehind is written (?<=...) and a negative lookbehind (?<!...).

Example: (?<=\$)\d+ matches numbers preceded by a dollar sign but doesn't include the $ in the match.

Note that JavaScript only gained lookbehind support in ES2018, so check compatibility if you're supporting older browsers.
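
The same idea can be tried in Python, with the caveat that Python's re module only accepts fixed-width lookbehind patterns. This is a small sketch with an invented sample sentence:

```python
import re

text = "Items cost $25, EUR 30, and $100"

# Positive lookbehind: digits preceded by a dollar sign.
usd = re.findall(r"(?<=\$)\d+", text)
print(usd)  # ['25', '100']

# Negative lookbehind: digits not preceded by "$" or by another digit.
# Excluding a preceding digit prevents matching the tail of "$25" or "$100".
non_usd = re.findall(r"(?<![$\d])\d+", text)
print(non_usd)  # ['30']
```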

Regex in JavaScript

JavaScript provides robust regex support through the RegExp object and string methods. Here's how to use regex effectively in JavaScript applications.

Creating Regex Patterns

You can create regex patterns in two ways:

// Literal notation (preferred for static patterns)
const pattern1 = /\d{3}-\d{3}-\d{4}/;

// Constructor (useful for dynamic patterns)
const pattern2 = new RegExp('\\d{3}-\\d{3}-\\d{4}');

Note the double backslashes in the constructor: a backslash must itself be escaped inside a string literal.

Regex Flags

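JavaScript supports several flags that modify regex behavior: g (global, find all matches), i (case-insensitive), m (multiline, so ^ and $ match at line breaks), s (dotAll, so . also matches newlines), u (Unicode mode), and y (sticky, match only at lastIndex).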

Example: /hello/gi finds all occurrences of "hello" case-insensitively.

String Methods with Regex

JavaScript strings have several methods that accept regex patterns:

// The addresses below are placeholders on the reserved example.com domain
const text = "Contact us at support@example.com or sales@example.com";
const emailPattern = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;

// Find all matches (with the g flag, match() returns every match)
const matches = text.match(emailPattern);
// Returns: ["support@example.com", "sales@example.com"]

// Test if pattern exists
// Note: test() on a g-flagged regex advances lastIndex between calls,
// so a non-global pattern is safer for simple yes/no checks
const hasEmail = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/i.test(text);
// Returns: true

// Replace matches
const redacted = text.replace(emailPattern, '[EMAIL]');
// Returns: "Contact us at [EMAIL] or [EMAIL]"

// Split by pattern
const parts = text.split(/\s+/);
// Splits on whitespace

// Search for pattern position
const position = text.search(emailPattern);
// Returns: 14 (index of first match)

RegExp Methods

The RegExp object also has methods for pattern matching:

const pattern = /(\d{3})-(\d{3})-(\d{4})/;
const phone = "Call 555-123-4567 for info";

// exec() returns detailed match information
const result = pattern.exec(phone);
console.log(result[0]); // "555-123-4567" (full match)
console.log(result[1]); // "555" (first capture group)
console.log(result[2]); // "123" (second capture group)
console.log(result[3]); // "4567" (third capture group)

Practical JavaScript Examples

Here's a real-world example of form validation:

function validateForm(data) {
  const patterns = {
    email: /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i,
    phone: /^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$/,
    zipCode: /^\d{5}(-\d{4})?$/
  };

  const errors = {};

  if (!patterns.email.test(data.email)) {
    errors.email = "Invalid email address";
  }

  if (!patterns.phone.test(data.phone)) {
    errors.phone = "Invalid phone number";
  }

  if (!patterns.zipCode.test(data.zipCode)) {
    errors.zipCode = "Invalid ZIP code";
  }

  return Object.keys(errors).length === 0 ? null : errors;
}

Quick tip: Use the JSON Formatter to validate and format JSON data extracted with regex patterns. It's especially useful when parsing API responses or configuration files.

Regex in Python

Python's re module provides comprehensive regex functionality with a clean, intuitive API. Here's everything you need to know about using regex in Python.

Importing and Basic Usage

import re

# Compile a pattern (recommended for reuse)
pattern = re.compile(r'\d{3}-\d{3}-\d{4}')

# Or use module-level functions directly
match = re.search(r'\d{3}-\d{3}-\d{4}', 'Call 555-123-4567')

Always use raw strings (prefix with r) for regex patterns to avoid issues with backslash escaping.

Python Regex Functions

The re module provides several functions for different matching scenarios:

import re

# The address below is a placeholder on the reserved example.com domain
text = "Email: user@example.com, Phone: 555-123-4567"

# search() - Find first match anywhere in string
match = re.search(r'\d{3}-\d{3}-\d{4}', text)
if match:
    print(match.group())  # "555-123-4567"

# match() - Match at beginning of string only
match = re.match(r'Email:', text)
if match:
    print("Starts with Email:")

# findall() - Find all matches as list
emails = re.findall(r'[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}', text, re.IGNORECASE)
print(emails)  # ['user@example.com']

# finditer() - Find all matches as iterator (memory efficient)
for match in re.finditer(r'\d+', text):
    print(f"Found number: {match.group()} at position {match.start()}")

# sub() - Replace matches
redacted = re.sub(r'\d{3}-\d{3}-\d{4}', '[PHONE]', text)
print(redacted)  # "Email: user@example.com, Phone: [PHONE]"

# split() - Split string by pattern
parts = re.split(r'[,:]', text)
print(parts)  # ['Email', ' user@example.com', ' Phone', ' 555-123-4567']

Regex Flags in Python

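Python supports several flags to modify regex behavior, including re.IGNORECASE (re.I), re.MULTILINE (re.M), re.DOTALL (re.S), re.VERBOSE (re.X), and re.ASCII (re.A).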

Combine flags with the bitwise OR operator:

pattern = re.compile(r'hello', re.IGNORECASE | re.MULTILINE)
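
To show what the combined flags change, the sketch below runs the same pattern with and without them. The multi-line sample text is invented, and a leading ^ anchor is added to make the effect of re.MULTILINE visible:

```python
import re

text = "Hello world\nhello again\nHELLO once more"

# No flags: ^ anchors only at the start of the string, and case matters.
plain = re.findall(r"^hello", text)
print(plain)  # []

# IGNORECASE | MULTILINE: ^ matches at each line start, in any letter case.
flagged = re.compile(r"^hello", re.IGNORECASE | re.MULTILINE).findall(text)
print(flagged)  # ['Hello', 'hello', 'HELLO']
```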

Working with Match Objects

Match objects provide detailed information about matches:

import re

text = "Date: 2026-03-31"
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, text)

if match:
    # Access full match
    print(match.group())      # "2026-03-31"
    print(match.group(0))     # Same as above
    
    # Access by group number
    print(match.group(1))     # "2026"
    print(match.group(2))     # "03"
    print(match.group(3))     # "31"
    
    # Access by group name
    print(match.group('year'))   # "2026"
    print(match.group('month'))  # "03"
    print(match.group('day'))    # "31"
    
    # Get all groups as tuple
    print(match.groups())     # ("2026", "03", "31")
    
    # Get all groups as dictionary
    print(match.groupdict())  # {'year': '2026', 'month': '03', 'day': '31'}
    
    # Get match position
    print(match.start())      # 6
    print(match.end())        # 16
    print(match.span())       # (6, 16)

Practical Python Examples

Here's a log parser that extracts structured data:

import re

def parse_log_file(filename):
    # Pattern for Apache-style (Common Log Format) entries
    pattern = re.compile(
        r'(?P<ip>\d+\.\d+\.\d+\.\d+) '
        r'\S+ \S+ '                  # identity and user fields, often "- -"
        r'\[(?P<timestamp>[^\]]+)\] '
        r'"(?P<method>\w+) (?P<path>\S+) HTTP/[^"]+" '
        r'(?P<status>\d{3}) '
        r'(?P<size>\d+|-)'           # size is "-" when no body was sent
    )

    entries = []
    with open(filename, 'r') as f:
        for line in f:
            match = pattern.search(line)
            if match:
                entry = match.groupdict()
                entry['status'] = int(entry['status'])
                # Treat a "-" size as zero bytes
                entry['size'] = 0 if entry['size'] == '-' else int(entry['size'])
                entries.append(entry)

    return entries

# Usage
logs = parse_log_file('access.log')
errors = [log for log in logs if log['status'] >= 400]
print(f"Found {len(errors)} error responses")

Advanced Regex Techniques

Once you've mastered the basics, these advanced techniques will help you write more powerful and efficient regex patterns.

Atomic Groups

Atomic groups (?>...) prevent backtracking within the group, improving performance:

(?>\d+)\.

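This matches digits followed by a period, but once the digits are matched, the regex engine won't backtrack into them if the period doesn't match. Support varies by engine: Python's re module accepts atomic groups from version 3.11 onward, and JavaScript does not support them.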

Conditional Patterns

Conditional patterns allow different matching based on whether a group matched:

(?(1)yes-pattern|no-pattern)

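Example: Require a closing parenthesis after the area code only when an opening one was matched:

^(\()?\d{3}(?(1)\))[-.\s]?\d{3}[-.\s]?\d{4}$

Group 1 captures the optional opening parenthesis. The conditional (?(1)\)) then demands a closing parenthesis only if group 1 matched, so "(555) 123-4567" and "555-123-4567" pass while "(555-123-4567" fails. Conditionals work in Python's re module and in PCRE; JavaScript does not support them.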
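
A conditional can be tried out in Python, whose re module supports the (?(1)...) syntax. The sketch below uses a phone pattern in which the closing parenthesis is required only when the opening one was captured. The helper name balanced_phone and the samples are hypothetical:

```python
import re

# Group 1 is the optional "("; (?(1)\)) requires ")" only if group 1 matched.
PHONE = re.compile(r"^(\()?\d{3}(?(1)\))[-.\s]?\d{3}[-.\s]?\d{4}$")

def balanced_phone(text: str) -> bool:
    """Return True when the area code parentheses are both present or both absent."""
    return PHONE.match(text) is not None

print(balanced_phone("(555) 123-4567"))  # True
print(balanced_phone("555-123-4567"))    # True
print(balanced_phone("(555-123-4567"))   # False: missing ")"
print(balanced_phone("555) 123-4567"))   # False: ")" without "("
```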