Ziff-Davis Enterprise 
DevSource: Microsoft Developer Resource
Regular Expressions and Strings in JavaScript
By DevSource

When you develop applications for the web, you usually use JavaScript for the client side. JavaScript's regular expressions are surprisingly powerful. Jeff Cogswell gives you the scoop.

JavaScript has several built-in object types that can simplify your client-side programming. One such object is the RegExp object, which provides support for regular expressions.

Unlike in many languages, regular expressions in JavaScript are built right into the syntax of the language. (If you're familiar with Perl, you know how useful this can be. In fact, JavaScript's regular expression support is modeled after Perl's.) In this article I show you how to get regular expressions up and running in JavaScript.

Note: In this article I'm assuming you're familiar with how Perl-type regular expressions work.

Creating a Regular Expression Object

There are two ways you can create a regular expression object in JavaScript. One way is to call the RegExp constructor. (Remember, in JavaScript you create an object by calling a function as a constructor with the new keyword.) Here's an example call to the RegExp constructor:

re = new RegExp("a..b", "g")

The first parameter is the regular expression pattern as a string. The second is a string containing the flags, in this case, g for global.

Another way is to make use of the syntax features of JavaScript. The previous call can just as easily be coded like this:

re2 = /a..b/g

This is where the syntax mimics that of Perl. You simply put your regular expression between forward slashes and optionally follow it with your flags; the end result is a regular expression object, just as if you had called the RegExp constructor.
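Both forms produce equivalent objects. One practical difference, sketched below, is that when the pattern is written as a string for the RegExp constructor, each backslash has to be doubled, because the string literal consumes one level of escaping. The constructor is also the natural choice when the pattern is only known at run time. The variable names and sample strings here are illustrative.

```javascript
// Two ways to build the same pattern: "a", any two characters, then "b".
const re = new RegExp("a..b", "g");
const re2 = /a..b/g;

// Both objects carry the same pattern text and the same global flag.
console.log(re.source === re2.source); // true
console.log(re.global && re2.global);  // true

// In a string, "\\d" is needed to produce the two characters \d,
// so these two expressions describe the same pattern.
const digits1 = new RegExp("\\d+");
const digits2 = /\d+/;
console.log(digits1.source === digits2.source); // true

// The constructor is handy when the pattern comes from a variable.
const word = "score";
const dynamic = new RegExp(word, "i");
console.log(dynamic.test("Four SCORE")); // true
```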

Patterns and Flags

The patterns can include all the usual Perl-style patterns. For example, a dot matches any character; you can place character sets inside brackets; you can include the usual quantifiers such as + and *; you can group and capture using parentheses; and so on.
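Here is a short sketch of a few of those constructs working together: a bracketed character set with a quantifier, and a capturing group. The sample string is made up for illustration.

```javascript
const s = "Order 66 shipped to room 101b";

// [0-9]+ is a character set followed by the + quantifier: one or more digits.
console.log(s.match(/[0-9]+/g)); // ["66", "101"]

// (\w+) captures the word that comes right before a space and a run of digits.
console.log(s.match(/(\w+) \d+/)[1]); // "Order"
```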

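The allowed flags are g for global, i for ignore-case, and m for multiline. (Later editions of the ECMAScript specification add a few more, such as u for Unicode and y for sticky.) Remember, if you don't include the g flag, your pattern will match only the first occurrence.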
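A quick way to see the i and m flags in action, using a made-up two-line string:

```javascript
const text = "First line\nsecond LINE";

// Without i, case matters; with i, "LINE" matches too.
console.log(text.match(/line/g));  // ["line"]
console.log(text.match(/line/gi)); // ["line", "LINE"]

// Without m, ^ anchors only at the very start of the string.
console.log(text.match(/^\w+/g));  // ["First"]
// With m, ^ also anchors right after each line break.
console.log(text.match(/^\w+/gm)); // ["First", "second"]
```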

Using the regular expressions

So how do you use regular expressions? RegExp.prototype gives every regular expression an exec method (along with a simpler test method). However, in my experience, exec isn't particularly useful; I find a better way to use regular expressions is through the methods that string objects provide.
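For completeness, here is a minimal sketch of test and exec. With the g flag, each exec call resumes from the expression's lastIndex property, returning one match (with its groups) at a time and null when none remain. The collectMatches helper is hypothetical, written only to show the loop; it assumes a global pattern and steps past empty matches so the loop always ends.

```javascript
// test() simply reports whether the pattern matches anywhere.
console.log(/\d/.test("abc123aXc")); // true

// Hypothetical helper: gather every match of a global pattern via exec().
function collectMatches(str, pattern) {
  if (!pattern.global) {
    throw new Error("collectMatches expects a pattern with the g flag");
  }
  pattern.lastIndex = 0; // start from the beginning of the string
  const results = [];
  let m;
  while ((m = pattern.exec(str)) !== null) {
    results.push({ text: m[0], group: m[1], index: m.index });
    // An empty match would not advance lastIndex, so step past it manually.
    if (m[0] === "") {
      pattern.lastIndex++;
    }
  }
  return results;
}

// Two entries: "abc" (group "b") at 0, then "aXc" (group "X") at 6.
console.log(collectMatches("abc123aXc", /a(.)c/g));
```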

The two most important places to use regular expressions are matching and searching. These two words may mean something a little different from what you expect in other languages.

When you search, you are looking for the first occurrence of a substring that matches the regular expression pattern. You get back an integer: the position of that substring, or -1 if nothing matches.

When you match, you get back an array of the substrings that match the pattern (or null if nothing matches). If the global flag is off, you find only the first match, and your result array has a single string. If the global flag is on, you get back all the matches, with one array element per match.

However, that isn't quite accurate. If your pattern contains groups, you can get back an array with more than one string even when the global flag is off.

But instead of trying to explain all that, I'll just show you some examples! Here goes. (For these I'm using Firefox with Firebug.)

>>> a = "abc123aXc"
"abc123aXc"
>>> a.search(/1.3/)
3
>>> a.search(/a.c/)
0
>>> a.search(/a.c/g) // global flag is ignored in search
0
>>> a.match(/a.c/)
["abc"]
>>> a.match(/a.c/g) // global flag on
["abc", "aXc"]
>>> a.match(/a.(c)/) // groups
["abc", "c"] 

The first line creates the string. The next line searches for the pattern /1.3/, which is found at position 3 (since the first position is 0). The next line searches for the pattern /a.c/, which is found at the beginning, in position 0. Then I turn on the global flag, but notice it has no effect. In fact, the ECMAScript specification, the official standard behind JavaScript, states that the search method ignores the global flag.
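Because search returns -1 when there is no match, comparing against -1 gives a simple yes-or-no check. The containsPattern helper below is hypothetical, just a sketch of that idea; RegExp's own test method answers the same question directly.

```javascript
const a = "abc123aXc";

// search() returns the index of the first match, or -1 when there is none.
console.log(a.search(/1.3/)); // 3
console.log(a.search(/z.z/)); // -1

// Hypothetical helper: true if the pattern matches anywhere in str.
function containsPattern(str, re) {
  return str.search(re) !== -1;
}

console.log(containsPattern(a, /\d{3}/)); // true
console.log(containsPattern(a, /\d{4}/)); // false
```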

Next I try out the match method. Again I'm searching for /a.c/. This time I get back an array, but I only get back one element in the array, the first match found. That's because I don't have the global flag turned on.

In the next line, I turn on the global flag by adding the g character after the second slash. Then I get back all the matches in my array.
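One thing to watch for: when nothing matches, match returns null rather than an empty array, so reading .length on the result directly would throw. The allMatches helper below is a hypothetical sketch that always returns an array, assuming it is given a global pattern.

```javascript
const a = "abc123aXc";

// match() returns null, not an empty array, when nothing matches.
console.log(a.match(/z.z/g)); // null

// Hypothetical helper that always returns an array (assumes a g pattern).
function allMatches(str, re) {
  return str.match(re) || [];
}

console.log(allMatches(a, /a.c/g).length); // 2
console.log(allMatches(a, /z.z/g).length); // 0
```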

But now for that technicality I mentioned. If I add parentheses to the regular expression and leave the global flag off, I get back more than one item in my array: the whole match, followed by the text captured by the parentheses.

Here's one with a whole bunch of parentheses:

>>> a.match(/(a(.(c)))/) // groups
["abc", "abc", "bc", "c"]

Notice that the first item is the whole match, followed by one item for each set of parentheses, in the order their opening parentheses appear.
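Two more details about groups, sketched with the same string: a non-global result also carries an index property giving the match position, and once the global flag is on, match returns only the whole matches and drops the groups.

```javascript
const a = "abc123aXc";

// Without g: element 0 is the whole match, then one element per group.
// The result also has an index property giving the match position.
const single = a.match(/a.(c)/);
console.log(single[0], single[1], single.index); // abc c 0

// With g: only the whole matches come back; the groups are not included.
console.log(a.match(/a.(c)/g)); // ["abc", "aXc"]
```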

Replacing

In addition to searching and matching, you can also perform replacements. Remember, however, that JavaScript strings can't be modified in place; instead of changing the original string, replace gives you back a new string with the modifications.
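A two-line check makes the point: after the call, the original variable still holds the unchanged text.

```javascript
const original = "abc123aXc";
const changed = original.replace(/a.c/, "ZZ");

// replace() hands back a new string and leaves the original untouched.
console.log(original); // abc123aXc
console.log(changed);  // ZZ123aXc
```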

Using regular expressions, you can search a string for each substring that matches the regular expression's pattern, and then you can replace the substring with a new string. But what's especially cool is you can use the matched string in the replacement.

First, here's a simple replacement using the same string as before:

>>> a.replace(/a.c/, 'ZZ')
"ZZ123aXc"

Notice the replacement only occurred in the first instance of the matching substring; the abc was replaced with ZZ, but the aXc was not. To replace both, use the global flag:

>>> a.replace(/a.c/g, 'ZZ')
"ZZ123ZZ"

But here's where the replacement can get interesting. Look closely at this example:

>>> a.replace(/(a.c)/g, '**$&**')
"**abc**123**aXc**"

Notice my replacement pattern: It's a couple of stars, then $&, then a couple of stars. And look what ended up in the result. The first match, abc, was replaced with **abc**. The second match, aXc, was replaced with **aXc**. In other words, $& refers to the substring that was matched.
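Two small details about these special replacement patterns: because $& stands for the whole match, the parentheses in the example above aren't actually required, and if you need a literal dollar sign in the replacement, you write $$. The "cost" string below is made up for illustration.

```javascript
const a = "abc123aXc";

// $& stands for the whole match, so no parentheses are needed to use it.
console.log(a.replace(/a.c/g, "**$&**")); // **abc**123**aXc**

// $$ produces a literal dollar sign, here placed before the match.
console.log("cost: 5".replace(/\d+/, "$$$&")); // cost: $5
```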

If your pattern has more than one set of parentheses, you can refer to each captured group by its numerical position: $1 for the first, $2 for the second, and so on. Check out this example:

>>> b = 'abc 1b2 XbY'
"abc 1b2 XbY"
>>> b.replace(/(.)b(.)/g, '$2b$1')
"cba 2b1 YbX"

First, the string contains three substrings in the form

.b.

My pattern looks like this:

(.)b(.)

That is, I took the .b. form and put parentheses around the dots so I could grab out the character before the b, and the character after it.

The replacement looks like this:

$2b$1

In other words, my replacement will be the character after the b, then the b, and then the character before the b. That is, I'm swapping the letters that come before and after the b. Indeed, that's what I get in the result:

cba 2b1 YbX

The abc turned into cba, and so on.
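The same numbered-group technique handles everyday reformatting. As a sketch, here a made-up date written year-month-day is rearranged into month/day/year by capturing the three parts and emitting them in a different order.

```javascript
// Capture year, month, and day, then emit them in a different order.
const isoDate = "2007-05-14";
const usDate = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
console.log(usDate); // 05/14/2007
```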

Replacement Functions

Replacement doesn't stop at patterns, either. Instead of a replacement string, you can provide a function that gets called once for each match. That way you can do some really sophisticated replacements. (If you try this, make sure you don't put parentheses after the function name: you want to pass the function itself into replace, not call the function and pass its result!)
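Here is a minimal sketch of the difference. The shout function is a hypothetical example; replace calls it with each matched substring and uses whatever it returns. An anonymous function expression works just as well.

```javascript
// Returns the matched text in upper case.
function shout(match) {
  return match.toUpperCase();
}

const s = "abc123aXc";

// Pass the function itself; replace() calls it once per match.
// (Writing shout() here instead would call it immediately with no
// argument, which would throw an error.)
console.log(s.replace(/a.c/g, shout)); // ABC123AXC

// An anonymous function expression works the same way.
console.log(s.replace(/\d+/g, function (digits) {
  return "[" + digits + "]";
})); // abc[123]aXc
```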

Before I show you how to do this, though, I need to explain something about arguments in JavaScript. You can create functions in JavaScript that take a variable number of arguments. Declare your function header without any formal parameters, and then access the arguments using the arguments variable. Look at this example:

function argtest() {
    alert(arguments.length);
    return arguments;
}

Call this function like so, for example:

argtest(1,2,3)

You will see an alert box displaying "3" (the number of arguments), and you will get back the arguments object, an array-like list of the three arguments. Here it is in Firebug:

>>> argtest(1,2,3)
[1, 2, 3]

Thus, you can access the individual arguments in the function through the arguments list.
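Keep in mind that arguments is only array-like: it has a length and numbered entries, but not the Array methods. The sketch below (both function names are hypothetical) loops over it by index, and copies it into a real array with Array.prototype.slice.

```javascript
// Adds up any number of numeric arguments.
function sum() {
  let total = 0;
  for (let i = 0; i < arguments.length; i++) {
    total += arguments[i];
  }
  return total;
}

// Copies the arguments into a genuine Array so array methods are available.
function toArray() {
  return Array.prototype.slice.call(arguments);
}

console.log(sum(1, 2, 3));                    // 6
console.log(Array.isArray(toArray(1, 2, 3))); // true
```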

When you pass a function to replace, the number of arguments it receives depends on how many submatches (sets of capturing parentheses) your regular expression pattern contains. For any given pattern that number is fixed, so if your pattern has no parentheses at all, your function always receives the same three arguments.

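The arguments passed to the function are, in order:

The matching substring

The submatches, one argument per set of capturing parentheses (if any)

The position in the original string where the matching substring was found

The original string being searched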
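To see those arguments for yourself, you can pass a function that simply records what it receives. The logArgs function and calls array are hypothetical names for this sketch; the function returns the match unchanged, so the string itself isn't altered.

```javascript
const calls = [];

// Records a real Array copy of whatever replace() passes in.
function logArgs() {
  calls.push(Array.prototype.slice.call(arguments));
  return arguments[0]; // return the match unchanged
}

"abc123aXc".replace(/a(.)c/g, logArgs);

// Each entry is [match, submatch, position, original string]:
// ["abc", "b", 0, "abc123aXc"] and ["aXc", "X", 6, "abc123aXc"]
console.log(calls);
```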

Quite frankly, if you're not using submatches, in many cases all you'll need is the first argument. That's what I'll do in the following example. In that case, you're free to declare a formal parameter for it, as I do here; the extra arguments are simply ignored. First, here's my function (I'm making use of a handy line of code I found online):

function reverseMatch(str) {
    // source: http://www.irt.org/script/1325.htm
    // Split into single characters, reverse their order, and join them back up.
    return str.split('').reverse().join('');
}

Now all you need is your replace call:

>>> a = "four score and seven years"
"four score and seven years"
>>> a.replace(/\w+/g, reverseMatch)
"ruof erocs dna neves sraey"

(The regular expression I'm using is \w+, which matches one or more word characters.) And it works!
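When your pattern does have submatches, you can declare a formal parameter for each one, since they arrive in order right after the whole match. As a sketch, the hypothetical capitalize function below uses two groups, the first letter of each word and the rest of the word.

```javascript
// Named formal parameters receive the whole match and each group in order.
function capitalize(match, first, rest) {
  return first.toUpperCase() + rest;
}

const phrase = "four score and seven years";
console.log(phrase.replace(/(\w)(\w*)/g, capitalize));
// Four Score And Seven Years
```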

Where to go next

There's a lot you can do with regular expressions in JavaScript. Naturally, the requirements of your web site will help you decide where you might need them. If you want to explore the topic further, I suggest downloading the actual JavaScript specification, ECMA-262, from http://www.ecma-international.org/publications/standards/Ecma-262.htm so you can see the exact details yourself. And as always, have fun!



