May 28, 2004
Regular Expressions (from Netscape JavaScript guide)
Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec and test methods of RegExp, and with the match, replace, search, and split methods of String. This chapter describes JavaScript regular expressions.
JavaScript 1.1 and earlier. Regular expressions are not available in JavaScript 1.1 and earlier.
Using Simple Patterns
Simple patterns are constructed of characters for which you want to find a
direct match. For example, the pattern /abc/ matches character
combinations in strings only when exactly the characters 'abc' occur
together and in that order. Such a match would succeed in the strings
"Hi, do you know your abc's?" and "The latest airplane designs evolved
from slabcraft." In both cases the match is with the substring 'abc'.
There is no match in the string "Grab crab" because it does not contain
the substring 'abc'.
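As a quick check of the three strings discussed above, the following sketch runs /abc/ against each of them with the test method. The variable names are illustrative only and do not come from the guide.

```javascript
// Literal pattern: matches only where 'a', 'b', 'c' occur together, in that order.
const simplePattern = /abc/;

const simpleResults = [
  "Hi, do you know your abc's?",
  "The latest airplane designs evolved from slabcraft.",
  "Grab crab",
].map(function (text) {
  return simplePattern.test(text);
});

console.log(simpleResults); // [ true, true, false ]
```

The third string fails because a space separates 'ab' from 'c'.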
Using Special Characters
When the search for a match requires something more than a direct match, such as
finding one or more b's, or finding whitespace, the pattern includes
special characters. For example, the pattern /ab*c/ matches any
character combination in which a single 'a' is followed by zero or more
'b's (* means
0 or more occurrences of the preceding item) and then immediately
followed by 'c'. In the string "cbbabbbbcdebc," the pattern matches the
substring 'abbbbc'.
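The /ab*c/ example can be tried directly. This is a minimal sketch using the string from the paragraph above; the variable names are assumptions made for illustration.

```javascript
// 'a', then zero or more 'b's, then 'c'.
const starPattern = /ab*c/;

const starMatch = "cbbabbbbcdebc".match(starPattern);
console.log(starMatch[0]);    // "abbbbc"
console.log(starMatch.index); // 3

// Zero 'b's is also acceptable, so "ac" matches as well.
const matchesZeroBs = starPattern.test("ac");
console.log(matchesZeroBs);   // true
```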
The following table provides a complete list and description of the special characters that can be used in regular expressions.
Using Parentheses
Parentheses around any part of the regular
expression pattern cause that part of the matched substring to be
remembered. Once remembered, the substring can be recalled for other
use, as described in Using
Parenthesized Substring Matches.
For example, the pattern /Chapter (\d+)\.\d*/ illustrates additional escaped and special characters and indicates that part of the pattern should be remembered. It matches precisely the characters 'Chapter ' followed by one or more numeric characters (\d means any numeric character and + means 1 or more times), followed by a decimal point (which in itself is a special character; preceding the decimal point with \ means the pattern must look for the literal character '.'), followed by any numeric character 0 or more times (\d means numeric character, * means 0 or more times). In addition, parentheses are used to remember the first matched numeric characters.
This pattern is found in "Open Chapter 4.3, paragraph 6" and '4' is remembered. The pattern is not found in "Chapter 3 and 4", because that string does not have a period after the '3'.
To match a substring without causing the matched part to be remembered, within the parentheses preface the pattern with ?:. For example, (?:\d+) matches one or more numeric characters but does not remember the matched characters.
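The difference between remembering and non-remembering parentheses shows up in the array that exec returns. The sketch below reuses the Chapter pattern and strings from this section; the variable names are illustrative.

```javascript
const chapterPattern = /Chapter (\d+)\.\d*/;

// The match succeeds and the first group of digits is remembered.
const chapterFound = chapterPattern.exec("Open Chapter 4.3, paragraph 6");
console.log(chapterFound[0]); // "Chapter 4.3"
console.log(chapterFound[1]); // "4"

// No period follows the '3', so there is no match.
const chapterMissing = chapterPattern.exec("Chapter 3 and 4");
console.log(chapterMissing);  // null

// With (?:...) the digits still have to match, but nothing is remembered.
const nonCapturing = /Chapter (?:\d+)\.\d*/.exec("Open Chapter 4.3, paragraph 6");
console.log(nonCapturing.length); // 1 (only the full match)
```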
Working With Regular Expressions
When you want to know whether a pattern is found in a string, use the test or search method; for more information (but slower execution) use the exec or match methods. If you use exec or match and if the match succeeds, these methods return an array and update properties of the associated regular expression object and also of the predefined regular expression object, RegExp. If the match fails, the exec method returns null (which converts to false).
In the following example, the script uses the exec method to find a match in a string.
<SCRIPT LANGUAGE="JavaScript1.2">
myRe=/d(b+)d/g;
myArray = myRe.exec("cdbbdbsbz");
</SCRIPT>
If you do not need to access the properties of the regular expression, an alternative way of creating myArray is with this script:
<SCRIPT LANGUAGE="JavaScript1.2">
myArray = /d(b+)d/g.exec("cdbbdbsbz");
</SCRIPT>
If you want to construct the regular expression from a string, yet another alternative is this script:
<SCRIPT LANGUAGE="JavaScript1.2">
myRe= new RegExp ("d(b+)d", "g");
myArray = myRe.exec("cdbbdbsbz");
</SCRIPT>
With these scripts, the match succeeds and returns the array and updates the properties shown in the following table.
Object | Property or index | Description | In this example
---|---|---|---
myArray | | The matched string and all remembered substrings. | ["dbbd", "bb"]
| index | The 0-based index of the match in the input string. | 1
| input | The original string. | "cdbbdbsbz"
| [0] | The last matched characters. | "dbbd"
myRe | lastIndex | The index at which to start the next match. (This property is set only if the regular expression uses the g option, described in Executing a Global Search, Ignoring Case, and Considering Multiline Input.) | 5
| source | The text of the pattern. Updated at the time that the regular expression is created, not executed. | "d(b+)d"
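The values in the table can be inspected directly. The following sketch repeats the exec call from the scripts above and reads each property; it uses const declarations, which are an assumption of this example and not part of the original scripts.

```javascript
const myRe = /d(b+)d/g;
const myArray = myRe.exec("cdbbdbsbz");

console.log(myArray[0]);     // "dbbd" (the matched characters)
console.log(myArray[1]);     // "bb"   (the remembered substring)
console.log(myArray.index);  // 1
console.log(myArray.input);  // "cdbbdbsbz"
console.log(myRe.lastIndex); // 5
console.log(myRe.source);    // "d(b+)d"
```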
As shown in the second form of this example, you can use a regular expression created with an object initializer without assigning it to a variable. If you do, however, every occurrence is a new regular expression. For this reason, if you use this form without assigning it to a variable, you cannot subsequently access the properties of that regular expression. For example, the first of the following two scripts displays a lastIndex of 5, while the second displays 0, because the two occurrences of /d(b+)d/g in the second script are different regular expression objects with separate lastIndex properties:
<SCRIPT LANGUAGE="JavaScript1.2">
myRe=/d(b+)d/g;
myArray = myRe.exec("cdbbdbsbz");
document.writeln("The value of lastIndex is " +
myRe.lastIndex);
</SCRIPT>
<SCRIPT LANGUAGE="JavaScript1.2">
myArray = /d(b+)d/g.exec("cdbbdbsbz");
document.writeln("The value of lastIndex is " +
/d(b+)d/g.lastIndex);
</SCRIPT>
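The same comparison can be made without document.writeln. This is a small sketch, with hypothetical variable names, that records lastIndex in both situations.

```javascript
// Stored in a variable: exec updates lastIndex on that same object.
const storedRe = /d(b+)d/g;
storedRe.exec("cdbbdbsbz");
const storedLastIndex = storedRe.lastIndex;

// Two separate literals: the second one has never been executed.
/d(b+)d/g.exec("cdbbdbsbz");
const freshLastIndex = /d(b+)d/g.lastIndex;

console.log(storedLastIndex); // 5
console.log(freshLastIndex);  // 0
```

If the properties of a regular expression are needed after the match, assigning it to a variable first is the safer choice.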
Using Parenthesized Substring Matches
Including parentheses in a regular expression pattern causes the corresponding submatch to be remembered. For example, /a(b)c/ matches the characters 'abc' and remembers 'b'. To recall these parenthesized substring matches, use the Array elements [1], ..., [n].
The number of possible parenthesized substrings is unlimited. The returned array holds all that were found. The following examples illustrate how to use parenthesized substring matches.
Example 1. The following script uses the replace method to switch the words in the string. In the replacement text, the script uses $1 and $2 to denote the first and second parenthesized substring matches.
<SCRIPT LANGUAGE="JavaScript1.2">
re = /(\w+)\s(\w+)/;
str = "John Smith";
newstr = str.replace(re, "$2, $1");
document.write(newstr)
</SCRIPT>
Example 2. In the following example, RegExp.input is set by the Change event. In the getInfo function, the exec method, called using the () shortcut notation, uses the value of RegExp.input as its argument.
<SCRIPT LANGUAGE="JavaScript1.2">
function getInfo(){
a = /(\w+)\s(\d+)/();
window.alert(a[1] + ", your age is " +
a[2]);
}
</SCRIPT>
Enter your first name and your age, and then press Enter.
<FORM>
<INPUT TYPE="text" NAME="NameAge"
onChange="getInfo(this);">
</FORM>
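The () shortcut notation and the implicit use of RegExp.input in Example 2 depend on the Netscape implementation described here and may not be available in other engines. As a hedged alternative, the sketch below passes the text to exec explicitly. The function name describePerson and the sample input are hypothetical.

```javascript
// Hypothetical helper: receives the field's text as an argument
// instead of relying on RegExp.input.
function describePerson(text) {
  const match = /(\w+)\s(\d+)/.exec(text);
  if (match === null) {
    return null; // no "name age" pair found
  }
  return match[1] + ", your age is " + match[2];
}

console.log(describePerson("Ada 36"));  // "Ada, your age is 36"
console.log(describePerson("no-age"));  // null
```

In a form, such a function could be called from the change handler with the field's value, for example onChange="window.alert(describePerson(this.value));".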
Executing a Global Search, Ignoring Case, and Considering Multiline Input
Regular expressions have three optional flags that allow for global, case-insensitive, and multiline searching. To indicate a global search, use the g flag. To indicate a case-insensitive search, use the i flag. To indicate a multiline search, use the m flag. These flags can be used separately or together in any order, and are included as part of the regular expression.
To include a flag with the regular expression, use this syntax:
re = /pattern/flags
re = new RegExp("pattern", ["flags"])
Note that the flags are an integral part of a regular expression. They cannot be added or removed later.
For example, re = /\w+\s/g creates a regular expression that looks for one or more word characters followed by a whitespace character, and it looks for this combination throughout the string.
<SCRIPT LANGUAGE="JavaScript1.2">
re = /\w+\s/g;
str = "fee fi fo fum";
myArray = str.match(re);
document.write(myArray);
</SCRIPT>
This displays ["fee ", "fi ", "fo "]. In this example, you could replace the line:
re = /\w+\s/g;
with:
re = new RegExp("\\w+\\s", "g");
and get the same result.
The m flag is used to specify that a multiline input string should be treated as multiple lines. If the m flag is used, ^ and $ match at the start or end of any line within the input string instead of the start or end of the entire string.
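The effect of the m and i flags can be seen with a two-line string. This is a minimal sketch; the sample text and variable names are assumptions made for illustration.

```javascript
const twoLines = "first line\nsecond line";

// Without m, ^ matches only at the start of the whole string.
const withoutM = /^second/.test(twoLines);
// With m, ^ also matches at the start of each line.
const withM = /^second/m.test(twoLines);

// g and m together: $ matches at the end of every line.
const lineEndings = twoLines.match(/line$/gm);

// i ignores case.
const ignoreCase = /^FIRST/i.test(twoLines);

console.log(withoutM);           // false
console.log(withM);              // true
console.log(lineEndings.length); // 2
console.log(ignoreCase);         // true
```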
Changing the Order in an Input String
The following example illustrates the formation of regular
expressions and the use of string.split() and
string.replace().
It cleans a roughly formatted input string containing names
(first name first) separated by blanks, tabs and exactly one
semicolon. Finally, it reverses the name order (last name first)
and sorts the list.
<SCRIPT LANGUAGE="JavaScript1.2">
// The name string contains multiple spaces and tabs,
// and may have multiple spaces between first and last names.
names = new String ( "Harry Trump ;Fred Barney; Helen Rigby ;\
Bill Abel ;Chris Hand ")
document.write ("---------- Original String" + "<BR>" + "<BR>");
document.write (names + "<BR>" + "<BR>");
// Prepare two regular expression patterns and array storage.
// Split the string into array elements.
// pattern: possible white space then semicolon then possible white space
pattern = /\s*;\s*/;
// Break the string into pieces separated by the pattern above
// and store the pieces in an array called nameList
nameList = names.split (pattern);
// new pattern: one or more characters then spaces then characters.
// Use parentheses to "memorize" portions of the pattern.
// The memorized portions are referred to later.
pattern = /(\w+)\s+(\w+)/;
// New array for holding names being processed.
bySurnameList = new Array;
// Display the name array and populate the new array
// with comma-separated names, last first.
//
// The replace method removes anything matching the pattern
// and replaces it with the memorized string: second memorized portion
// followed by comma space followed by first memorized portion.
//
// The variables $1 and $2 refer to the portions
// memorized while matching the pattern.
document.write ("---------- After Split by Regular Expression" + "<BR>");
for ( i = 0; i < nameList.length; i++) {
   document.write (nameList[i] + "<BR>");
   bySurnameList[i] = nameList[i].replace (pattern, "$2, $1")
}
// Display the new array.
document.write ("---------- Names Reversed" + "<BR>");
for ( i = 0; i < bySurnameList.length; i++) {
   document.write (bySurnameList[i] + "<BR>")
}
// Sort by last name, then display the sorted array.
bySurnameList.sort();
document.write ("---------- Sorted" + "<BR>");
for ( i = 0; i < bySurnameList.length; i++) {
   document.write (bySurnameList[i] + "<BR>")
}
document.write ("---------- End" + "<BR>")
</SCRIPT>
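The split, replace, and sort steps can also be condensed into a single function that returns the result and does not write to the page. This is a sketch, not the guide's own code: the name reorderNames, the call to trim, and the sample input are assumptions of this example.

```javascript
// Hypothetical helper combining the steps of the script above.
function reorderNames(rawNames) {
  return rawNames
    .trim()                          // drop leading and trailing whitespace
    .split(/\s*;\s*/)                // split on semicolons with optional whitespace
    .map(function (name) {
      // Swap "First Last" into "Last, First" using the remembered substrings.
      return name.replace(/(\w+)\s+(\w+)/, "$2, $1");
    })
    .sort();                         // sort by last name
}

const reordered = reorderNames(
  "Harry   Trump ;Fred Barney; Helen\tRigby ; Bill Abel ;Chris Hand "
);
console.log(reordered);
// [ "Abel, Bill", "Barney, Fred", "Hand, Chris", "Rigby, Helen", "Trump, Harry" ]
```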
Using Special Characters to Verify Input
In the following example, a user enters a phone number. When the
user presses Enter, the script checks the validity of the number.
If the number is valid (matches the character sequence specified
by the regular expression), the script posts a window thanking
the user and confirming the number. If the number is invalid, the
script posts a window informing the user that the phone number is
not valid.
The regular expression looks for zero or one open parenthesis \(?, followed by three digits \d{3}, followed by zero or one close parenthesis \)?, followed by one dash, forward slash, or decimal point, which is remembered when found ([-\/\.]), followed by three digits \d{3}, followed by the remembered match of a dash, forward slash, or decimal point \1, followed by four digits \d{4}.
The Change event activated when the user presses Enter sets the value of RegExp.input.
<HTML>
<SCRIPT LANGUAGE = "JavaScript1.2">
re = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/;
function testInfo() {
   OK = re.exec();
   if (!OK)
      window.alert (RegExp.input +
         " isn't a phone number with area code!")
   else
      window.alert ("Thanks, your phone number is " + OK[0])
}
</SCRIPT>
Enter your phone number (with area code) and then press Enter.
<FORM>
<INPUT TYPE="text" NAME="Phone" onChange="testInfo(this);">
</FORM>
</HTML>
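The script above calls exec with no argument and so depends on RegExp.input being set by the Change event. The same check can be sketched as a function that takes the text explicitly and returns the message. The name checkPhone and the sample numbers are hypothetical.

```javascript
const phonePattern = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/;

// Hypothetical helper: returns the message the page would show.
function checkPhone(text) {
  const match = phonePattern.exec(text);
  if (match === null) {
    return text + " isn't a phone number with area code!";
  }
  return "Thanks, your phone number is " + match[0];
}

console.log(checkPhone("(415)-555-1212"));
// "Thanks, your phone number is (415)-555-1212"

// Mixed separators fail because \1 must repeat the first separator.
console.log(checkPhone("415.555-1212"));
// "415.555-1212 isn't a phone number with area code!"
```

The pattern is not anchored with ^ and $, so a valid number embedded in longer text is also accepted; OK[0] (match[0] here) holds only the matched portion.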