Hello! Are you ready to dive deep into the fascinating realm of regular expressions? Buckle up, because today, we're breaking down the arcane and powerful art of pattern searching in strings, helping you get one step closer to becoming a text wizard!
Regular expressions (regex, for short) had humble beginnings. We can trace their inception back to the 1950s, with mathematician Stephen Kleene's work on automata theory. However, regex only found its way into the programming world in the 1970s, when Ken Thompson incorporated it into the Unix text editor ed. Since then, regex has become a staple in countless programming languages and applications, allowing programmers to perform complex text processing and manipulation tasks with ease!
Regular expressions are sequences of characters that define search patterns used in pattern matching operations within strings. Regex can be as simple as searching for basic words or as intricate as identifying and extracting elements from complex data structures. They're versatile tools that come in handy in tasks like validating input, searching and replacing text, and extracting data from logs or documents.
Now, let's learn how to craft our own regex patterns!
To grasp the nuts and bolts of regular expressions, we'll explore a few fundamental concepts:
Literals: These are the simplest regex patterns, searching for exact character matches:
pattern: cat
string: The cat is cute!
result: 'cat'
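Here is a minimal sketch of the same literal search in Python, using the standard re module. The second test string ("concatenate the cat") is made up for illustration; it shows that a literal pattern also matches inside longer words.

```python
import re

text = "The cat is cute!"

# A literal pattern matches the exact characters 'c', 'a', 't' in that order.
match = re.search(r"cat", text)
found = match.group() if match else None
position = match.span() if match else None
print(found, position)  # cat (4, 7)

# Literals do not care about word boundaries, so 'cat' is also found
# inside 'concatenate'.
inside_words = re.findall(r"cat", "concatenate the cat")
print(inside_words)  # ['cat', 'cat']
```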
Character sets: Enclosed in brackets [], they match a single character from a set of specified characters.
pattern: [cb]at
string: The cat and bat went splat.
result: 'cat', 'bat'
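To try character sets yourself, here is a small Python sketch. The first search repeats the example above; the range [a-z] and the negated set [^...] go slightly beyond it and are included as common variations, not as part of the original example.

```python
import re

text = "The cat and bat went splat."

# [cb] matches exactly one character: either 'c' or 'b'.
basic_set = re.findall(r"[cb]at", text)
print(basic_set)  # ['cat', 'bat']

# A range such as [a-z] matches any one lowercase ASCII letter,
# so the 'lat' inside 'splat' is picked up as well.
letter_range = re.findall(r"[a-z]at", text)
print(letter_range)  # ['cat', 'bat', 'lat']

# A leading ^ inside the brackets negates the set: any single character
# except 'c', 'b', or a space.
negated_set = re.findall(r"[^cb ]at", text)
print(negated_set)  # ['lat']
```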
Quantifiers: Using {}, *, +, or ?, you can specify how many times a pattern should be matched:
{min,max}: Between min and max times
{n}: Exactly n times
*: Zero or more times
+: One or more times
?: Zero or one time
pattern: ba{0,2}t
string: bt, bat, and baat all match.
result: 'bt', 'bat', 'baat'
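A quick way to get a feel for each quantifier is to run them all against the same input. The sketch below does that in Python; the test string "bt bat baat baaat" is a made-up example, chosen so that each word has a different number of 'a' characters.

```python
import re

text = "bt bat baat baaat"

# {0,2}: between zero and two 'a' characters
bounded = re.findall(r"ba{0,2}t", text)
print(bounded)       # ['bt', 'bat', 'baat']

# {2}: exactly two 'a' characters
exact = re.findall(r"ba{2}t", text)
print(exact)         # ['baat']

# *: zero or more
zero_or_more = re.findall(r"ba*t", text)
print(zero_or_more)  # ['bt', 'bat', 'baat', 'baaat']

# +: one or more
one_or_more = re.findall(r"ba+t", text)
print(one_or_more)   # ['bat', 'baat', 'baaat']

# ?: zero or one
optional = re.findall(r"ba?t", text)
print(optional)      # ['bt', 'bat']
```

Note that a quantifier applies only to the item directly before it (here the single 'a'), not to the whole pattern.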
Anchors: Indicate the position of the pattern within the string:
^: Start of the line
$: End of the line
pattern: ^The.*
string: The beginning is here. The end is near.
result: 'The beginning is here. The end is near.' (.* is greedy, so the match runs to the end of the line)
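The following Python sketch shows how the anchors behave in practice. Two things in it are additions for illustration: the lazy quantifier .*? (which stops at the first possible match instead of the last) and the two-line string used to demonstrate the re.MULTILINE flag.

```python
import re

text = "The beginning is here. The end is near."

# ^ pins the match to the start; .* is greedy and runs to the end of the line.
greedy = re.findall(r"^The.*", text)
print(greedy)      # ['The beginning is here. The end is near.']

# To stop at the first full stop, make the quantifier lazy with .*?
lazy = re.findall(r"^The.*?\.", text)
print(lazy)        # ['The beginning is here.']

# $ pins the match to the end of the string.
at_end = re.findall(r"near\.$", text)
print(at_end)      # ['near.']

# By default ^ only matches at the very start of the string.
# With re.MULTILINE it also matches right after each newline.
lines = "The first line\nThe second line"
first_only = re.findall(r"^The", lines)
every_line = re.findall(r"^The", lines, flags=re.MULTILINE)
print(first_only)  # ['The']
print(every_line)  # ['The', 'The']
```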
Alternation: Using the pipe symbol |, you can match one pattern or another:
pattern: cat|dog
string: The cat and dog are friends.
result: 'cat', 'dog'
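One thing worth knowing about alternation is that | splits the whole pattern unless you limit its scope. The Python sketch below illustrates this; the "gray grey" string and the non-capturing group (?:...) are assumptions added for the demonstration.

```python
import re

text = "The cat and dog are friends."

# Either alternative may match at each position.
pets = re.findall(r"cat|dog", text)
print(pets)      # ['cat', 'dog']

# Wrapping the alternatives in a (non-capturing) group limits the choice
# to a single letter: 'gr', then 'a' or 'e', then 'y'.
scoped = re.findall(r"gr(?:a|e)y", "gray grey")
print(scoped)    # ['gray', 'grey']

# Without the group, the alternatives are 'gra' and 'ey'.
unscoped = re.findall(r"gra|ey", "gray grey")
print(unscoped)  # ['gra', 'ey']
```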
Groups and backreferences: Parentheses () let us group patterns and reference them later using \n, where n is the group's number (for example, \1 for the first group):
pattern: (c|d|p)at
string: cat, dat, and pat are three-letter words.
result: 'cat', 'dat', 'pat'
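The example above uses a group but never refers back to it, so here is a short Python sketch that does both. The "This is is a typo" sentence is a made-up input for showing \1 in action. Also note a Python-specific detail: when a pattern contains one group, re.findall returns the group's text rather than the full match.

```python
import re

text = "cat, dat, and pat are three-letter words."

# With one capturing group, findall returns what the group captured.
captured = re.findall(r"(c|d|p)at", text)
print(captured)      # ['c', 'd', 'p']

# To get the full matches, iterate over match objects instead.
full_matches = [m.group(0) for m in re.finditer(r"(c|d|p)at", text)]
print(full_matches)  # ['cat', 'dat', 'pat']

# Backreference: \1 must repeat exactly what group 1 captured,
# which makes it handy for spotting doubled words.
typo = "This is is a typo"
doubled = re.search(r"\b(\w+) \1\b", typo)
doubled_text = doubled.group(0) if doubled else None
print(doubled_text)  # is is

# The same group can be reused in a replacement string.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", typo)
print(fixed)         # This is a typo
```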
Escape sequences: A backslash \ is used to escape special characters, treating them as literals:
pattern: \$\d+\.\d{2}
string: This book costs $19.99!
result: '$19.99'
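To see why escaping matters, the Python sketch below compares an escaped dot with an unescaped one. The "19.99 and 19x99" string is a hypothetical input, and re.escape is shown as one convenient way to escape a literal string automatically.

```python
import re

text = "This book costs $19.99!"

# \$ and \. match a literal dollar sign and a literal dot.
price = re.findall(r"\$\d+\.\d{2}", text)
print(price)  # ['$19.99']

# An unescaped dot matches any character, so '19x99' slips through.
loose = re.findall(r"\d+.\d{2}", "19.99 and 19x99")
print(loose)  # ['19.99', '19x99']

# re.escape() escapes the special characters in a string for you,
# which is useful when the text to search for comes from user input.
literal = "$19.99"
escaped_match = re.search(re.escape(literal), text)
escaped_text = escaped_match.group(0) if escaped_match else None
print(escaped_text)  # $19.99
```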
Now, let's see how regex can be used in some popular programming languages!
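# Python: find every capitalized word in a sentence
import re

# \b is a word boundary, [A-Z] is one uppercase letter,
# and [a-z]* is zero or more lowercase letters.
pattern = r"\b[A-Z][a-z]*\b"
text = "Regex is Absolutely Fantastic, is it Not?"
matches = re.findall(pattern, text)  # list of all non-overlapping matches
print(matches)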
Output:
['Regex', 'Absolutely', 'Fantastic', 'Not']
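// JavaScript: the same search, written as a regex literal.
// The g (global) flag makes match() return every match, not just the first.
const pattern = /\b[A-Z][a-z]*\b/g;
const text = 'Regex is Absolutely Fantastic, is it Not?';
const matches = text.match(pattern); // an array of matches, or null if there are none
console.log(matches);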
Output:
[ 'Regex', 'Absolutely', 'Fantastic', 'Not' ]
Tip: longer patterns are easier to read when you document them inline (for example, with Python's re.VERBOSE flag).
Phew! That was quite the journey. We've explored the whirlwind of regex, from its humble beginnings to the building blocks and practical implementations in multiple programming languages. While there's still a wealth of knowledge to discover, you now possess a solid foundation in regex to aid you on future coding adventures!
Grok.foo is a collection of articles on a variety of technology and programming topics assembled by James Padolsey.