Introducing Regular Expressions (Part 1) - Special meanings, examples (Page 3 of 3)
I have mentioned that the backslash (\) character has a special meaning. It says, in effect, "Do not interpret the next character in the regular expression as usual." It has a wide range of uses in regular expressions, and they can be divided into two categories:
Interpret a metacharacter as itself rather than as a metacharacter. For example, to match a backslash the regular expression would contain \\. The same is true of the other metacharacters, including $ ^ [ ] . ( ) * + ? and |. Thus, \$ matches a dollar sign, \( matches a left parenthesis, and so on.
Interpret a regular character in a special way. Many two-character sequences in the form \x (where x is an upper- or lower-case letter) have special meaning, as described in Table 3.
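The first category can be tried out with a minimal sketch. It assumes Python's re module rather than the .NET classes this series targets; the escapes shown (\$, \( and \\) are common to both, but the sample strings and variable names are made up for illustration.

```python
import re

# Raw strings (r"...") stop Python from treating the backslash as its own escape.
price = "Total: $5 (approx)"

# Unescaped, $ anchors to the end of the text, so "$5" can never match.
unescaped = re.search(r"$5", price)

# Escaped, \$ matches a literal dollar sign.
escaped = re.search(r"\$5", price)

# \( and \) match literal parentheses instead of opening a group.
paren = re.search(r"\(approx\)", price)

# \\ matches one literal backslash.
backslash = re.search(r"\\", r"C:\temp")

print(unescaped)          # None
print(escaped.group())    # $5
print(paren.group())      # (approx)
print(backslash.group())  # \
```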
Table 3. Metacharacters starting with the backslash.
Metacharacter   Matches
\b              A word boundary
\t              A tab character
\r              A carriage return
\n              A newline (linefeed)
\w              Any word character (all the letters, both upper- and lower-case, the digits 0-9, and the underscore)
\W              Any non-word character
\s              Any whitespace character (space, tab, newline, carriage return)
\S              Any non-whitespace character
\d              Any digit 0-9
\D              Any character that is not a digit 0-9
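To see several of the Table 3 sequences at work, here is a short sketch. It again assumes Python's re module as a stand-in for .NET, and the sample text is invented for the example.

```python
import re

sample = "Order 66\tships 2 crates_of tea\n"

digits = re.findall(r"\d+", sample)      # runs of digits
words = re.findall(r"\w+", sample)       # runs of word characters
whitespace = re.findall(r"\s", sample)   # each whitespace character

# \b keeps "tea" from matching inside "teapot" or "steam".
whole_word = re.findall(r"\btea\b", "tea teapot steam tea")

# \D is the inverse of \d, handy for stripping everything but digits.
digits_only = re.sub(r"\D", "", "555-1234")

print(digits)       # ['66', '2']
print(words)        # ['Order', '66', 'ships', '2', 'crates_of', 'tea']
print(whitespace)   # [' ', '\t', ' ', ' ', ' ', '\n']
print(whole_word)   # ['tea', 'tea']
print(digits_only)  # 5551234
```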
Character Classes
Creating regular expressions is often simplified by the use of character classes. A character class, enclosed in square brackets [], provides a way of defining a set of characters. In the simplest form, a character class is simply a list of the characters to match. For example, the character class [aeiou] matches any single vowel. Character classes also support character ranges. For example, within a character class a-z means any lowercase letter, a-p means any lowercase letter a through p, 0-9 means any single digit, and so on.
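The lists and ranges described above can be sketched as follows, assuming Python's re module in place of .NET. The sample strings are illustrative only.

```python
import re

text = "Regular Expressions 101"

# A plain list of characters: matches lowercase vowels only.
vowels = re.findall(r"[aeiou]", text)

# A range: any lowercase letter from a through p.
a_to_p = re.findall(r"[a-p]", "zebra plum")

# A digit range.
digits = re.findall(r"[0-9]", text)

# Several ranges can share one class, here the hexadecimal digits.
is_hex = re.search(r"^[0-9a-fA-F]+$", "1aF3") is not None

print(vowels)  # ['e', 'u', 'a', 'e', 'i', 'o']
print(a_to_p)  # ['e', 'b', 'a', 'p', 'l', 'm']
print(digits)  # ['1', '0', '1']
print(is_hex)  # True
```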
Most regular expression metacharacters are treated differently within a character class: they simply stand for themselves. The character class [.$], for example, matches either a period or a dollar sign, not "any character" or the end of the text. The backslash is the exception. It keeps its escaping role inside brackets, so [\t] still matches a tab character and [\d] matches a digit. The ^ character also has a different meaning within a character class, meaning "anything except." Thus, [^aeiou] would match any character that is not a lower-case vowel. (However, the ^ character must come first inside the brackets to mean "anything except." Otherwise, it will just be considered part of the character set.)
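A brief sketch of how characters behave inside brackets, assuming Python's re module as a stand-in for .NET. Note that in Python's re, at least, the backslash keeps working as an escape inside a class, so [\t] matches a tab there. The sample strings are invented.

```python
import re

# Inside brackets, . and $ are ordinary characters.
literal = re.findall(r"[.$]", "cost: $4.50")

# A leading ^ negates the class: anything except a lowercase vowel.
not_vowel = re.findall(r"[^aeiou]", "queue")

# A ^ that is not first is just another member of the set.
caret_member = re.findall(r"[aeiou^]", "a^b")

# In Python's re, the backslash still escapes inside brackets,
# so this finds the tab but not the backslash or the letter t.
tab_class = re.findall(r"[\t]", "a\tb\\t")

print(literal)       # ['$', '.']
print(not_vowel)     # ['q']
print(caret_member)  # ['a', '^']
print(tab_class)     # ['\t']
```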
Some Examples
It should be clear to you now that regular expressions provide a great deal of power when it comes to matching text. I'll finish this article with some examples that may be useful in the real world. Some of these examples show that there is sometimes more than one way to accomplish something with regular expressions.
Table 4. Some real-world regular expression examples.
Regular Expression                                    Matches
^[0-9]{5}$ or ^\d{5}$                                 Any 5-digit number that makes up the entire text, not one that is part of a larger string. Use to validate 5-digit ZIP codes.
^(\d{5}|\d{5}-\d{4})$ or ^\d{5}(-\d{4})?$             Any 5-digit number, or any 5-digit number followed by a hyphen and a 4-digit number. Use to validate 5-digit or 5+4-digit ZIP codes.
[^-][0-9] or [^-]\d                                   Any character other than a minus sign, followed by a single digit. The match includes the preceding character, so a digit at the very start of the text is not matched.
^C:\\                                                 Any text that begins with the characters C:\
^[a-zA-Z0-9_]+\.[a-zA-Z]{3}$ or ^\w+\.[a-zA-Z]{3}$    A file name that consists of a name that is one or more characters long, with only letters, digits, and the underscore permitted; a period; and an extension that is 3 characters long, restricted to letters.
^[+-]?\d+(\.\d+)?$                                    Any number with or without a preceding + or - sign and with or without a decimal portion.
^[2-9]\d{2}-\d{3}-\d{4}$                              A 10-digit phone number in the form XXX-XXX-XXXX where the first digit is 2-9 inclusive and the others can be 0-9.
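Several of these patterns can be exercised with a small sketch. It assumes Python's re module rather than the .NET classes, and the names PATTERNS and is_valid are hypothetical helpers, not part of any library. The ZIP pattern is written with balanced parentheses as ^\d{5}(-\d{4})?$, and the last lines show why an alternation between anchors needs to be grouped.

```python
import re

# Patterns in the style of Table 4; the dictionary keys are hypothetical labels.
PATTERNS = {
    "zip": r"^\d{5}(-\d{4})?$",
    "file": r"^\w+\.[a-zA-Z]{3}$",
    "number": r"^[+-]?\d+(\.\d+)?$",
    "phone": r"^[2-9]\d{2}-\d{3}-\d{4}$",
}

def is_valid(kind: str, text: str) -> bool:
    """Return True if text matches the pattern registered under kind."""
    return re.search(PATTERNS[kind], text) is not None

print(is_valid("zip", "12345"))           # True
print(is_valid("zip", "12345-6789"))      # True
print(is_valid("zip", "123456"))          # False
print(is_valid("file", "report_1.txt"))   # True
print(is_valid("file", "report.html"))    # False
print(is_valid("number", "-3.14"))        # True
print(is_valid("number", "3."))           # False
print(is_valid("phone", "212-555-0100"))  # True
print(is_valid("phone", "112-555-0100"))  # False

# [^-]\d consumes the character before the digit as part of the match.
pairs = re.findall(r"[^-]\d", "a1 -2 3")
print(pairs)  # ['a1', ' 3']

# Without a group, ^ applies only to the first alternative and $ only to the second.
loose_hit = re.search(r"^\d{5}|\d{5}-\d{4}$", "12345abc") is not None
grouped_hit = re.search(r"^(\d{5}|\d{5}-\d{4})$", "12345abc") is not None
print(loose_hit, grouped_hit)  # True False
```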
In the next installment, I will take you through the .NET classes that help you use regular expressions in your own .NET code.