Languages - DevSource
Introducing Regular Expressions (Part 1)
By Peter Aitken


Special meanings, examples
( Page 3 of 3 )

I have mentioned that the backslash (\) character has a special meaning. It says, in effect, "Do not interpret the next character in the regular expression as usual." It has a wide range of uses in regular expressions, and they can be divided into two categories:

  • Interpret a metacharacter as itself rather than as a metacharacter. For example, to match a backslash the regular expression would contain \\. The same is true of the other metacharacters, such as $ ^ [ ] . ( ) and |, as well as the quantifier characters * + ? { and }. Thus, \$ matches a dollar sign, \( matches a left parenthesis, and so on.
  • Interpret a regular character in a special way. Many two-character sequences in the form \x (where x is an upper- or lower-case letter) have a special meaning, as described in Table 3.
Table 3. Metacharacters starting with the backslash.
Metacharacter   Matches
\b              A word boundary
\t              A tab character
\r              A carriage return
\n              A newline (linefeed)
\w              Any word character (all the letters, both upper- and lower-case, the digits 0-9, and the underscore)
\W              Any non-word character
\s              Any whitespace character (space, tab, newline, carriage return)
\S              Any non-whitespace character
\d              Any digit 0-9
\D              Any character that is not a digit 0-9
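
The two roles of the backslash can be tried out in a few lines. The sketch below uses Python's re module rather than the .NET classes this series covers; that choice is an assumption made purely for illustration. For plain ASCII text the escapes shown should behave the same way in both, but details may differ for other input.

```python
import re

# Role 1: the backslash turns a metacharacter into a literal character.
price = re.search(r"\$\d+", "Total: $42 due")      # \$ is a literal dollar sign
call = re.search(r"\(\w+\)", "see (note) below")   # \( and \) are literal parentheses
slash = re.search(r"\\", r"C:\temp")               # \\ is one literal backslash

# Role 2: the backslash gives an ordinary letter a special meaning.
words = re.findall(r"\b\w+\b", "one two_2 three!")      # whole words
digits = re.findall(r"\d", "a1b22")                     # individual digits
non_space = re.findall(r"\S+", "tab\there new\nline")   # runs of non-whitespace

print(price.group(), call.group(), slash.group())
print(words, digits, non_space)
```

Note that the underscore counts as a word character, so two_2 is found as a single word, while the exclamation mark is left out.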

Character Classes


Creating regular expressions is often simplified by the use of character classes. A character class, enclosed in square brackets [], provides a way of defining a set of characters. In the simplest form, a character class is simply a list of the characters to match. For example, the character class [aeiou] matches any single vowel. Character classes also support character ranges. For example, within a character class a-z means any lowercase letter, a-p means any lowercase letter a through p, 0-9 means any single digit, and so on.
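
A short sketch of character lists and ranges follows. It again assumes Python's re module as a stand-in for the .NET classes, and the sample strings are made up for illustration.

```python
import re

text = "Quiz 7: zebra, Apple, plum"   # hypothetical sample text

vowels = re.findall(r"[aeiou]", "regular")      # any single lowercase vowel
a_to_p = re.findall(r"[a-p]+", "maple syrup")   # runs of letters a through p
digit = re.findall(r"[0-9]", text)              # any single digit
letters = re.findall(r"[a-zA-Z]+", text)        # several ranges in one class

print(vowels, a_to_p, digit, letters)
```

A class always matches exactly one character; the + after the closing bracket is what lets [a-p]+ and [a-zA-Z]+ match runs of characters.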

Most regular expression metacharacters are treated differently within a character class: they simply stand for themselves. The character class [.$], for example, matches either a period or a dollar sign, not "any character" or "end of the string" as . and $ would outside brackets. The backslash is an exception in .NET regular expressions: it still escapes the character that follows, so [\t] matches a tab character and [\\] matches a backslash. The ^ character also has a different meaning within a character class, meaning "anything except." Thus, [^aeiou] would match any character that is not a lower-case vowel. (However, the ^ character must come first inside the brackets to mean "anything except." Otherwise, it is treated as an ordinary member of the character set.)
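
The behavior of ^ and of other metacharacters inside brackets can be checked with a small experiment. This is a sketch using Python's re module as an assumed stand-in for the .NET classes; the sample strings are invented for the example.

```python
import re

# A leading ^ negates the class: every character that is not a lowercase vowel.
consonants = re.findall(r"[^aeiou]", "queue up")

# A ^ that is not first is just another member of the class.
carets = re.findall(r"[aeiou^]", "2^e")

# Metacharacters such as . $ ( ) and | stand for themselves inside brackets.
literals = re.findall(r"[.$|()]", "cost: $5.00 (a|b)")

print(consonants, carets, literals)
```

The negated class also matches the space in "queue up", because "anything except a vowel" includes whitespace and punctuation, not just consonants.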

Some Examples

It should be clear to you now that regular expressions provide a great deal of power when it comes to matching text. I'll finish this article with some examples that may be useful in the real world. Some of these examples show that there is sometimes more than one way to accomplish something with regular expressions.

Table 4. Some real-world regular expression examples.
Regular Expression                                   Matches
^[0-9]{5}$ or ^\d{5}$                                Any 5-digit number, but not a 5-digit number that is part of a larger string. Use to validate 5-digit ZIP codes.
^(\d{5}|\d{5}-\d{4})$ or ^\d{5}(-\d{4})?$            Any 5-digit number, or any 5-digit number followed by a hyphen and a 4-digit number. Use to validate 5-digit or 5+4-digit ZIP codes.
[^-][0-9] or [^-]\d                                  Any single digit preceded by a character other than a minus sign. The match includes that preceding character, so a digit at the very start of the text is not matched.
^C:\\                                                Any text that begins with the characters C:\
^[a-zA-Z0-9_]+\.[a-zA-Z]{3}$ or ^\w+\.[a-zA-Z]{3}$   A file name that consists of a name that is one or more characters long with only letters, digits, and the underscore permitted; a period; and an extension that is 3 characters long, restricted to letters.
^[+-]?\d+(\.\d+)?$                                   Any number with or without a preceding + or - sign and with or without a decimal portion.
^[2-9]\d{2}-\d{3}-\d{4}$                             A 10-digit phone number in the form XXX-XXX-XXXX where the first digit is 2-9 inclusive and the others can be 0-9.
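
The patterns in Table 4 can be exercised with a small checker. The sketch below assumes Python's re module in place of the .NET classes; the dictionary labels, the matches helper, and the test strings are hypothetical names chosen for this example. The ZIP code pattern is written in its ^\d{5}(-\d{4})?$ form.

```python
import re

# Patterns taken from Table 4; the dictionary keys are hypothetical labels.
PATTERNS = {
    "zip": r"^\d{5}(-\d{4})?$",
    "drive_c": r"^C:\\",
    "file_name": r"^\w+\.[a-zA-Z]{3}$",
    "number": r"^[+-]?\d+(\.\d+)?$",
    "phone": r"^[2-9]\d{2}-\d{3}-\d{4}$",
}

def matches(label: str, text: str) -> bool:
    """Return True if text matches the pattern stored under label."""
    return re.search(PATTERNS[label], text) is not None

print(matches("zip", "12345"), matches("zip", "12345-6789"), matches("zip", "1234"))
print(matches("drive_c", "C:\\Windows"), matches("drive_c", "D:\\"))
print(matches("file_name", "report_1.txt"), matches("file_name", "report.html"))
print(matches("number", "-3.14"), matches("number", "3."))
print(matches("phone", "212-555-0100"), matches("phone", "112-555-0100"))

# [^-]\d consumes the character before the digit, so that character is
# part of each match and "-2" is skipped.
not_negative = re.findall(r"[^-]\d", "a1 -2 x3")
print(not_negative)
```

The ^ and $ anchors are what turn these patterns into validators: without them, "123456" would pass the ZIP code test because it contains five digits in a row.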
In the next installment, I will take you through the .NET classes that help you use regular expressions in your own .NET code.

 
 