Regular Expressions

  • Revision slug: JavaScript/Guide/Regular_Expressions
  • Revision title: Regular Expressions
  • Revision id: 47270
  • Created:
  • Creator: bmenasha
  • Is current revision? No
  • Comment 2 words added, 2 words removed

Revision Content

Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec and test methods of RegExp, and with the match, replace, search, and split methods of String. This chapter describes JavaScript regular expressions.

Creating a Regular Expression

You construct a regular expression in one of two ways:

  • Using a regular expression literal, as follows:
    var re = /ab+c/;
    

    Regular expression literals provide compilation of the regular expression when the script is evaluated. When the regular expression will remain constant, use this for better performance.

  • Calling the constructor function of the RegExp object, as follows:
    var re = new RegExp("ab+c");
    

    Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.
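The two forms above can be compared side by side. The following is a minimal sketch; the variable userInput is a hypothetical stand-in for a pattern that arrives at run time, and the sample strings are illustrative only.

```javascript
// Literal form: the pattern is fixed in the source code.
var literalRe = /ab+c/;

// Constructor form: the pattern arrives as a string at run time.
// `userInput` is a hypothetical stand-in for text from another source.
var userInput = "ab+c";
var constructedRe = new RegExp(userInput);

// Both objects describe the same pattern.
console.log(literalRe.source === constructedRe.source); // true
console.log(literalRe.test("xabbbcx"));     // true
console.log(constructedRe.test("xabbbcx")); // true

// Inside a string, a backslash must itself be escaped.
var digitsRe = new RegExp("\\d+");
console.log(digitsRe.source); // \d+
```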

Writing a Regular Expression Pattern

A regular expression pattern is composed of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. The last example includes parentheses which are used as a memory device. The match made with this part of the pattern is remembered for later use, as described in Using Parenthesized Substring Matches.

Using Simple Patterns

Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern /abc/ matches character combinations in strings only when exactly the characters 'abc' occur together and in that order. Such a match would succeed in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft." In both cases the match is with the substring 'abc'. There is no match in the string "Grab crab" because it does not contain the substring 'abc'.
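As a quick check of the three strings discussed above, the following sketch runs the test method on each one. The variable names are illustrative.

```javascript
var simpleRe = /abc/;
var samples = [
  "Hi, do you know your abc's?",
  "The latest airplane designs evolved from slabcraft.",
  "Grab crab"
];

// test returns true only when 'abc' occurs together and in that order.
var results = samples.map(function (s) {
  return simpleRe.test(s);
});
console.log(results); // [true, true, false]
```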

Using Special Characters

When the search for a match requires something more than a direct match, such as finding one or more b's, or finding white space, the pattern includes special characters. For example, the pattern /ab*c/ matches any character combination in which a single 'a' is followed by zero or more 'b's (* means 0 or more occurrences of the preceding item) and then immediately followed by 'c'. In the string "cbbabbbbcdebc," the pattern matches the substring 'abbbbc'.
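The /ab*c/ example can be tried directly with exec. This is a small sketch with illustrative variable names.

```javascript
var starRe = /ab*c/;
var starMatch = starRe.exec("cbbabbbbcdebc");

console.log(starMatch[0]);    // "abbbbc"
console.log(starMatch.index); // 3

// Zero b's are also acceptable, because * allows 0 occurrences.
console.log(starRe.test("ac")); // true
```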

The following table provides a complete list and description of the special characters that can be used in regular expressions.

Table 4.1 Special characters in regular expressions.
Character Meaning
\ Either of the following:
  • For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, /b/ matches the character 'b'. Placing a backslash in front of b, that is, using /\b/, makes the character special: it now matches a word boundary.
  • For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, * is a special character that means 0 or more occurrences of the preceding item should be matched; /a*/ means match 0 or more a's. To match * literally, precede it with a backslash; for example, /a\*/ matches 'a*'.
  • When using the new RegExp("pattern") notation, remember to escape \ itself, because \ is also an escape character in strings.
^

Matches beginning of input. If the multiline flag is set to true, also matches immediately after a line break character.

For example, /^A/ does not match the 'A' in "an A", but does match the 'A' in "An E".

$

Matches end of input. If the multiline flag is set to true, also matches immediately before a line break character.

For example, /t$/ does not match the 't' in "eater", but does match it in "eat".

*

Matches the preceding character 0 or more times.

For example, /bo*/ matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but nothing in "A goat grunted".

+

Matches the preceding character 1 or more times. Equivalent to {1,}.

For example, /a+/ matches the 'a' in "candy" and all the a's in "caaaaaaandy".

?

Matches the preceding character 0 or 1 time. Equivalent to {0,1}.

For example, /e?le?/ matches the 'el' in "angel" and the 'le' in "angle" and also the 'l' in "oslo".

If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default, which is greedy (matching the maximum number of times). For example, a non-global match of /\d+/ against "123abc" returns "123", whereas /\d+?/ matches only "1".

Also used in lookahead assertions, described under x(?=y) and x(?!y) in this table.

.

(The decimal point) matches any single character except the newline character.

For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.

(x)

Matches 'x' and remembers the match. These are called capturing parentheses.

For example, /(foo)/ matches and remembers 'foo' in "foo bar." The matched substring can be recalled from the resulting array's elements [1], ..., [n].

(?:x) Matches 'x' but does not remember the match. These are called non-capturing parentheses. The matched substring cannot be recalled from the resulting array's elements [1], ..., [n].
x(?=y)

Matches 'x' only if 'x' is followed by 'y'.

For example, /Jack(?=Sprat)/ matches 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is part of the match results.

x(?!y)

Matches 'x' only if 'x' is not followed by 'y'.

For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. The regular expression /\d+(?!\.)/.exec("3.141") matches '141' but not '3.141'.

x|y

Matches either 'x' or 'y'.

For example, /green|red/ matches 'green' in "green apple" and 'red' in "red apple."

{n}

Where n is a positive integer. Matches exactly n occurrences of the preceding character.

For example, /a{2}/ doesn't match the 'a' in "candy," but it matches all of the a's in "caandy," and the first two a's in "caaandy."

{n,}

Where n is a positive integer. Matches at least n occurrences of the preceding character.

For example, /a{2,}/ doesn't match the 'a' in "candy", but matches all of the a's in "caandy" and in "caaaaaaandy."

{n,m}

Where n and m are positive integers. Matches at least n and at most m occurrences of the preceding character.

For example, /a{1,3}/ matches nothing in "cndy", the 'a' in "candy," the first two a's in "caandy," and the first three a's in "caaaaaaandy". Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string had more a's in it.

[xyz]

A character set. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen.

For example, [abcd] is the same as [a-d]. They match the 'b' in "brisket" and the 'c' in "city".

[^xyz]

A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen.

For example, [^abc] is the same as [^a-c]. They initially match 'r' in "brisket" and 'h' in "chop."

[\b] Matches a backspace. (Not to be confused with \b.)
\b

Matches a word boundary: a position where a word character is next to a non-word character (such as a space, a newline character, or a punctuation character) or next to the beginning or end of the string. (Not to be confused with [\b].)

For example, /\bn\w/ matches the 'no' in "noonday"; /\wy\b/ matches the 'ly' in "possibly, yesterday". Note that the "," is not included in the match.

\B

Matches a non-word boundary.

For example, /\w\Bn/ matches 'on' in "noonday", and /y\B\w/ matches 'ye' in "possibly yesterday."

\cX

Where X is a letter from A to Z. Matches the corresponding control character in a string.

For example, /\cM/ matches control-M in a string.

\d

Matches a digit character. Equivalent to [0-9].

For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."

\D

Matches any non-digit character. Equivalent to [^0-9].

For example, /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number."

\f Matches a form-feed.
\n Matches a linefeed.
\r Matches a carriage return.
\s

Matches a single white space character, including space, tab, form feed, and line feed. Equivalent to [ \f\n\r\t\v\u00A0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000].

For example, /\s\w*/ matches ' bar' in "foo bar."

\S

Matches a single character other than white space. Equivalent to [^ \f\n\r\t\v\u00A0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000].

For example, /\S\w*/ matches 'foo' in "foo bar."

\t Matches a tab.
\v Matches a vertical tab.
\w

Matches any alphanumeric character including the underscore. Equivalent to [A-Za-z0-9_].

For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."

\W

Matches any non-word character. Equivalent to [^A-Za-z0-9_].

For example, /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%."

\n

Where n is a positive integer. A back reference to the last substring matching the n parenthetical in the regular expression (counting left parentheses).

For example, /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach."

\0 Matches a NUL character. Do not follow this with another digit.
\xhh Matches the character with the code hh (two hexadecimal digits).
\uhhhh Matches the character with the code hhhh (four hexadecimal digits).
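Several of the table entries above can be exercised in a few lines. The following sketch reuses examples from the table; the variable names are illustrative.

```javascript
// Greedy versus non-greedy quantifiers.
var greedy = "123abc".match(/\d+/)[0];
var lazy = "123abc".match(/\d+?/)[0];
console.log(greedy); // "123"
console.log(lazy);   // "1"

// Lookahead: 'Jack' matches only when 'Sprat' follows,
// and 'Sprat' itself is not part of the match.
var ahead = /Jack(?=Sprat)/.exec("JackSprat");
console.log(ahead[0]); // "Jack"
console.log(/Jack(?=Sprat)/.test("JackFrost")); // false

// Negated lookahead: digits that are not followed by a decimal point.
var notAhead = /\d+(?!\.)/.exec("3.141");
console.log(notAhead[0]); // "141"

// Back reference: \1 must repeat whatever (,) matched.
var back = /apple(,)\sorange\1/.exec("apple, orange, cherry, peach");
console.log(back[0]); // "apple, orange,"

// Word boundary: 'ly' is followed by the end of a word.
var boundary = /\wy\b/.exec("possibly yesterday");
console.log(boundary[0]); // "ly"
```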

Using Parentheses

Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use, as described in Using Parenthesized Substring Matches.

For example, the pattern /Chapter (\d+)\.\d*/ illustrates additional escaped and special characters and indicates that part of the pattern should be remembered. It matches precisely the characters 'Chapter ' followed by one or more numeric characters (\d means any numeric character and + means 1 or more times), followed by a decimal point (which in itself is a special character; preceding the decimal point with \ means the pattern must look for the literal character '.'), followed by any numeric character 0 or more times (\d means numeric character, * means 0 or more times). In addition, parentheses are used to remember the first matched numeric characters.

This pattern is found in "Open Chapter 4.3, paragraph 6" and '4' is remembered. The pattern is not found in "Chapter 3 and 4", because that string does not have a period after the '3'.

To match a substring without causing the matched part to be remembered, within the parentheses preface the pattern with ?:. For example, (?:\d+) matches one or more numeric characters but does not remember the matched characters.
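The behavior described in this section can be sketched as follows, using the same pattern and strings. The variable names are illustrative.

```javascript
var chapterRe = /Chapter (\d+)\.\d*/;

var chapterHit = chapterRe.exec("Open Chapter 4.3, paragraph 6");
console.log(chapterHit[0]); // "Chapter 4.3"
console.log(chapterHit[1]); // "4" (the remembered part)

// No period follows the '3', so there is no match.
var chapterMiss = chapterRe.exec("Chapter 3 and 4");
console.log(chapterMiss); // null

// With ?: the digits are still matched but not remembered,
// so the result array holds only the full match.
var nonCapturing = /Chapter (?:\d+)\.\d*/.exec("Open Chapter 4.3, paragraph 6");
console.log(nonCapturing.length); // 1
```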

Working with Regular Expressions

Regular expressions are used with the RegExp methods test and exec and with the String methods match, replace, search, and split. These methods are explained in detail in the JavaScript Reference.

Table 4.2 Methods that use regular expressions
Method Description
exec A RegExp method that executes a search for a match in a string. It returns an array of information.
test A RegExp method that tests for a match in a string. It returns true or false.
match A String method that executes a search for a match in a string. It returns an array of information or null on a mismatch.
search A String method that tests for a match in a string. It returns the index of the match, or -1 if the search fails.
replace A String method that executes a search for a match in a string, and replaces the matched substring with a replacement substring.
split A String method that uses a regular expression or a fixed string to break a string into an array of substrings.

When you want to know whether a pattern is found in a string, use the test or search method; for more information (but slower execution) use the exec or match methods. If you use exec or match and if the match succeeds, these methods return an array and update properties of the associated regular expression object and also of the predefined regular expression object, RegExp. If the match fails, the exec method returns null (which converts to false).
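To make the difference between these methods concrete, here is a small sketch that applies all four to the same string. The variable names are illustrative, and the regular expression deliberately omits the g flag so that each call starts from the beginning of the string.

```javascript
var text = "cdbbdbsbz";
var probe = /d(b+)d/;

// test and search only report whether (or where) a match exists.
console.log(probe.test(text));   // true
console.log(text.search(probe)); // 1

// exec and match return an array with the match and remembered parts.
var viaExec = probe.exec(text);
var viaMatch = text.match(probe);
console.log(viaExec[0]);    // "dbbd"
console.log(viaExec[1]);    // "bb"
console.log(viaExec.index); // 1
console.log(viaMatch[0]);   // "dbbd"

// On failure, exec and match return null and search returns -1.
console.log(/xyz/.exec(text));   // null
console.log(text.match(/xyz/));  // null
console.log(text.search(/xyz/)); // -1
```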

In the following example, the script uses the exec method to find a match in a string.

var myRe = /d(b+)d/g;
var myArray = myRe.exec("cdbbdbsbz");

If you do not need to access the properties of the regular expression, an alternative way of creating myArray is with this script:

var myArray = /d(b+)d/g.exec("cdbbdbsbz");

If you want to construct the regular expression from a string, yet another alternative is this script:

var myRe = new RegExp("d(b+)d", "g");
var myArray = myRe.exec("cdbbdbsbz");

With these scripts, the match succeeds and returns the array and updates the properties shown in the following table.

Table 4.3 Results of regular expression execution.
Object | Property or index | Description | In this example
myArray | (none) | The matched string and all remembered substrings. | ["dbbd", "bb"]
myArray | index | The 0-based index of the match in the input string. | 1
myArray | input | The original string. | "cdbbdbsbz"
myArray | [0] | The last matched characters. | "dbbd"
myRe | lastIndex | The index at which to start the next match. (This property is set only if the regular expression uses the g option, described in Advanced Searching With Flags.) | 5
myRe | source | The text of the pattern. Updated at the time that the regular expression is created, not executed. | "d(b+)d"

As shown in the second form of this example, you can use a regular expression created with an object initializer without assigning it to a variable. If you do, however, every occurrence is a new regular expression. For this reason, if you use this form without assigning it to a variable, you cannot subsequently access the properties of that regular expression. For example, assume you have this script:

var myRe = /d(b+)d/g;
var myArray = myRe.exec("cdbbdbsbz");
console.log("The value of lastIndex is " + myRe.lastIndex);

This script displays:

The value of lastIndex is 5

However, if you have this script:

var myArray = /d(b+)d/g.exec("cdbbdbsbz");
console.log("The value of lastIndex is " + /d(b+)d/g.lastIndex);

It displays:

The value of lastIndex is 0

The occurrences of /d(b+)d/g in the two statements are different regular expression objects and hence have different values for their lastIndex property. If you need to access the properties of a regular expression created with an object initializer, you should first assign it to a variable.
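Because lastIndex lives on the regular expression object, keeping the object in a variable also makes it possible to step through every match with repeated exec calls. The following is a minimal sketch; the input string is a hypothetical extension of the one used above, chosen so that it contains two matches.

```javascript
var globalRe = /d(b+)d/g;
var longer = "cdbbdbsbzdbd"; // hypothetical input with two matches
var hits = [];
var m;

// Each call resumes at globalRe.lastIndex; exec returns null
// (and resets lastIndex to 0) once no further match is found.
while ((m = globalRe.exec(longer)) !== null) {
  hits.push({ text: m[0], index: m.index, next: globalRe.lastIndex });
}

console.log(hits.length);       // 2
console.log(hits[0].text);      // "dbbd"
console.log(hits[0].next);      // 5
console.log(hits[1].text);      // "dbd"
console.log(hits[1].index);     // 9
console.log(globalRe.lastIndex); // 0
```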

Using Parenthesized Substring Matches

Including parentheses in a regular expression pattern causes the corresponding submatch to be remembered. For example, /a(b)c/ matches the characters 'abc' and remembers 'b'. To recall these parenthesized substring matches, use the Array elements [1], ..., [n].

The number of possible parenthesized substrings is unlimited. The returned array holds all that were found. The following examples illustrate how to use parenthesized substring matches.

Example 1

The following script uses the replace() method to switch the words in the string. For the replacement text, the script uses the $1 and $2 in the replacement to denote the first and second parenthesized substring matches.

var re = /(\w+)\s(\w+)/;
var str = "John Smith";
var newstr = str.replace(re, "$2, $1");
console.log(newstr);

This prints "Smith, John".
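The same remembered substrings are also available as array elements when exec is used instead of replace. This sketch reuses the pattern and string from Example 1; the variable names are illustrative.

```javascript
var nameRe = /(\w+)\s(\w+)/;
var parts = nameRe.exec("John Smith");

console.log(parts[0]); // "John Smith" (the whole match)
console.log(parts[1]); // "John"       (first parenthesized substring)
console.log(parts[2]); // "Smith"      (second parenthesized substring)

// Building the swapped string by hand gives the same result as "$2, $1".
var swapped = parts[2] + ", " + parts[1];
console.log(swapped); // "Smith, John"
```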

Advanced Searching With Flags

Regular expressions have four optional flags that control how a search is performed. To indicate a global search, use the g flag. To indicate a case-insensitive search, use the i flag. To indicate a multi-line search, use the m flag. To perform a "sticky" search, that matches starting at the current position in the target string, use the y flag. These flags can be used separately or together in any order, and are included as part of the regular expression.


To include a flag with the regular expression, use this syntax:

var re = /pattern/flags;

or

var re = new RegExp("pattern", "flags");

Note that the flags are an integral part of a regular expression. They cannot be added or removed later.
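As a brief illustration of the i flag, and of the point that flags are fixed at creation time, consider the following sketch. Building a second object from the source property is one possible way to obtain the same pattern with different flags; the variable names are illustrative.

```javascript
var caseSensitive = /chapter/;
var caseInsensitive = /chapter/i;

console.log(caseSensitive.test("Chapter 4"));   // false
console.log(caseInsensitive.test("Chapter 4")); // true

// The flags of an existing object cannot be changed, so a search
// with different flags needs a new regular expression object.
var withGlobal = new RegExp(caseInsensitive.source, "gi");
console.log(withGlobal.global);      // true
console.log(withGlobal.ignoreCase);  // true
console.log(caseInsensitive.global); // false
```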

For example, re = /\w+\s/g creates a regular expression that looks for one or more word characters followed by a white space character, and it looks for this combination throughout the string.

var re = /\w+\s/g;
var str = "fee fi fo fum";
var myArray = str.match(re);
console.log(myArray);

This displays ["fee ", "fi ", "fo "]. In this example, you could replace the line:

var re = /\w+\s/g;

with:

var re = new RegExp("\\w+\\s", "g");

and get the same result.

The m flag is used to specify that a multiline input string should be treated as multiple lines. If the m flag is used, ^ and $ match at the start or end of any line within the input string instead of the start or end of the entire string.
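The effect of the m flag can be seen in a short sketch. The two-line input string is a hypothetical example.

```javascript
var lines = "first line\nsecond line"; // hypothetical two-line input

// Without m, ^ matches only at the start of the entire string.
console.log(/^second/.test(lines));  // false
// With m, ^ also matches immediately after the line break.
console.log(/^second/m.test(lines)); // true

// Likewise, $ matches at the end of every line only when m is set.
var endsWithoutM = lines.match(/line$/g);
var endsWithM = lines.match(/line$/gm);
console.log(endsWithoutM.length); // 1
console.log(endsWithM.length);    // 2
```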

Examples

The following examples show some uses of regular expressions.

Changing the Order in an Input String

The following example illustrates the formation of regular expressions and the use of string.split() and string.replace(). It cleans a roughly formatted input string containing names (first name first) separated by blanks, tabs and exactly one semicolon. Finally, it reverses the name order (last name first) and sorts the list.

// The name string contains multiple spaces and tabs,
// and may have multiple spaces between first and last names.
var names = "Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ; Chris Hand ";

var output = ["---------- Original String\n", names + "\n"];

// Prepare two regular expression patterns and array storage.
// Split the string into array elements.

// pattern: possible white space then semicolon then possible white space
var pattern = /\s*;\s*/;

// Break the string into pieces separated by the pattern above and
// store the pieces in an array called nameList
var nameList = names.split(pattern);

// new pattern: one or more characters then spaces then characters.
// Use parentheses to "memorize" portions of the pattern.
// The memorized portions are referred to later.
pattern = /(\w+)\s+(\w+)/;

// New array for holding names being processed.
var bySurnameList = [];

// Display the name array and populate the new array
// with comma-separated names, last first.
//
// The replace method removes anything matching the pattern
// and replaces it with the memorized string—second memorized portion
// followed by comma space followed by first memorized portion.
//
// The variables $1 and $2 refer to the portions
// memorized while matching the pattern.

output.push("---------- After Split by Regular Expression");

var i, len;
for (i = 0, len = nameList.length; i < len; i++){
  output.push(nameList[i]);
  bySurnameList[i] = nameList[i].replace(pattern, "$2, $1");
}

// Display the new array.
output.push("---------- Names Reversed");
for (i = 0, len = bySurnameList.length; i < len; i++){
  output.push(bySurnameList[i]);
}

// Sort by last name, then display the sorted array.
bySurnameList.sort();
output.push("---------- Sorted");
for (i = 0, len = bySurnameList.length; i < len; i++){
  output.push(bySurnameList[i]);
}

output.push("---------- End");

console.log(output.join("\n"));

Using Special Characters to Verify Input

In the following example, the user is expected to enter a phone number. When the user presses the "Check" button, the script checks the validity of the number. If the number is valid (matches the character sequence specified by the regular expression), the script shows a message thanking the user and confirming the number. If the number is invalid, the script informs the user that the phone number is not valid.

The regular expression looks for zero or one open parenthesis \(?, followed by three digits \d{3}, followed by zero or one close parenthesis \)?, followed by one dash, forward slash, or decimal point, which is remembered ([-\/\.]), followed by three digits \d{3}, followed by the remembered match of a dash, forward slash, or decimal point \1, followed by four digits \d{4}.
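The pattern can also be tried on its own, outside of any page. The sample phone numbers in this sketch are made up for illustration.

```javascript
var phoneRe = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/;

console.log(phoneRe.test("555-123-4567"));   // true
console.log(phoneRe.test("(555)-123-4567")); // true
console.log(phoneRe.test("555.123.4567"));   // true

// The second separator must repeat the first one (\1).
console.log(phoneRe.test("555-123.4567"));   // false
// A separator is required.
console.log(phoneRe.test("5551234567"));     // false

// The remembered separator is available as element [1].
var slashed = phoneRe.exec("555/123/4567");
console.log(slashed[1]); // "/"
```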

Clicking the "Check" button calls testInfo, which tests the value of the input field against the regular expression.

<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <meta http-equiv="Content-Script-Type" content="text/javascript">
    <script type="text/javascript">
      var re = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/;
      function testInfo(phoneInput){
        var OK = re.exec(phoneInput.value);
        // On failure, report the entered text directly: RegExp.input is
        // updated only by a successful match.
        if (!OK)
          window.alert(phoneInput.value + " isn't a phone number with area code!");
        else
          window.alert("Thanks, your phone number is " + OK[0]);
      }
    </script>  
  </head>  
  <body>  
    <p>Enter your phone number (with area code) and then click "Check".
        <br>The expected format is like ###-###-####.</p>
    <form action="#">  
      <input id="phone"><button onclick="testInfo(document.getElementById('phone'));">Check</button>
    </form>  
  </body>  
</html>

Revision Source

<p>Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the <code>exec</code> and <code>test</code> methods of <code>RegExp</code>, and with the <code>match</code>, <code>replace</code>, <code>search</code>, and <code>split</code> methods of <code>String</code>. This chapter describes JavaScript regular   expressions.</p>
<h2>Creating a Regular Expression</h2>
<p>You construct a regular expression in one of two ways:</p>
<ul> <li>Using a regular expression literal, as follows: <div style="width: auto;"> <pre class="brush: js">var re = /ab+c/;
</pre> </div> <p>Regular expression literals provide compilation of the regular expression when the script is evaluated. When the regular expression will remain constant, use this for better performance.</p> </li> <li>Calling the constructor function of the <code><a href="/en/JavaScript/Reference/Global_Objects/RegExp" title="en/JavaScript/Reference/Global Objects/RegExp">RegExp</a></code> object, as follows: <div style="width: 100%;"> <pre class="brush: js">var re = new RegExp("ab+c");
</pre> </div> <p>Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.</p> </li>
</ul>
<h2>Writing a Regular Expression Pattern</h2>
<p>A regular expression pattern is composed of simple characters, such as <code>/abc/</code>, or a combination of simple and special characters, such as <code>/ab*c/</code> or <code>/Chapter (\d+)\.\d*/</code>. The last example includes parentheses which are used as a memory device. The match made with this part of the pattern is remembered for later use, as described in {{ web.link("#Using_Parenthesized_Substring_Matches", "Using Parenthesized Substring Matches") }}.</p>
<h3>Using Simple Patterns</h3>
<p>Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern <code>/abc/</code> matches character combinations in strings only when exactly the characters 'abc' occur together and in that order. Such a match would succeed in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft." In both cases the match is with the substring 'abc'. There is no match in the string "Grab crab" because it does not contain the substring 'abc'.</p>
<h3>Using Special Characters</h3>
<p>When the search for a match requires something more than a direct match, such as finding one or more b's, or finding white space, the pattern includes special characters. For example, the pattern <code>/ab*c/</code> matches any character combination in which a single 'a' is followed by zero or more 'b's (<code>*</code> means 0 or more occurrences of the preceding item) and then immediately followed by 'c'. In the string "cbbabbbbcdebc," the pattern matches the substring 'abbbbc'.</p>
<p>The following table provides a complete list and description of the special characters that can be used in regular expressions.</p>
<table class="fullwidth-table"> <caption style="text-align: left">Table 4.1 Special characters in regular expressions.</caption> <thead> <tr> <th scope="col">Character</th> <th scope="col">Meaning</th> </tr> </thead> <tbody> <tr> <td><code>\</code></td> <td>Either of the following: <ul> <li>For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally.</li> <li style="list-style-type: none;">For example, <code>/b/ </code> matches the character 'b'. By placing a backslash in front of b, that is by using <code>/\b/</code>, the character becomes special to mean match a word boundary.</li> <li>For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally.</li> <li style="list-style-type: none;">For example, <code>*</code> is a special character that means 0 or more occurrences of the preceding item should be matched; for example, <code>/a*/</code> means match 0 or more a's. To match <code>*</code> literally, precede it with a backslash; for example, <code>/a\*/</code> matches 'a*'.</li> <li style="list-style-type: none;">Also do not forget to escape \ itself while using the new RegExp("pattern") notation since \ is also an escape character in strings.</li> </ul> </td> </tr> <tr> <td><code>^</code></td> <td> <p>Matches beginning of input. If the multiline flag is set to true, also matches immediately after a line break character.</p> <p>For example, <code>/^A/</code> does not match the 'A' in "an A", but does match the 'A' in "An E".</p> </td> </tr> <tr> <td><code>$</code></td> <td> <p>Matches end of input. 
If the multiline flag is set to true, also matches immediately before a line break character.</p> <p>For example, <code>/t$/</code> does not match the 't' in "eater", but does match it in "eat".</p> </td> </tr> <tr> <td><code>*</code></td> <td> <p>Matches the preceding character 0 or more times.</p> <p>For example, <code>/bo*/</code> matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but nothing in "A goat grunted".</p> </td> </tr> <tr> <td><code>+</code></td> <td> <p>Matches the preceding character 1 or more times. Equivalent to {1,}.</p> <p>For example, <code>/a+/</code> matches the 'a' in "candy" and all the a's in "caaaaaaandy".</p> </td> </tr> <tr> <td><code>?</code></td> <td> <p>Matches the preceding character 0 or 1 time. Equivalent to {0,1}.</p> <p>For example, <code>/e?le?/</code> matches the 'el' in "angel" and the 'le' in "angle" and also the 'l' in "oslo".</p> <p>If used immediately after any of the quantifiers <code>*</code>, <code>+</code>, <code>?</code>, or <code>{}</code>, makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default, which is greedy (matching the maximum number of times). For example, using /\d+/ non-global match "123abc" return "123", if using /\d+?/, only "1" will be matched.</p> <p>Also used in lookahead assertions, described under x(?=y) and x(?!y) in this table.</p> </td> </tr> <tr> <td><code>.</code></td> <td> <p>(The decimal point) matches any single character except the newline character.</p> <p>For example, <code>/.n/</code> matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.</p> </td> </tr> <tr> <td><code>(x)</code></td> <td> <p>Matches 'x' and remembers the match. These are called capturing parentheses.</p> <p>For example, <code>/(foo)/</code> matches and remembers 'foo' in "foo bar." 
The matched substring can be recalled from the resulting array's elements <code>[1]</code>, ..., <code>[n]</code>.</p> </td> </tr> <tr> <td><code>(?:x)</code></td> <td>Matches 'x' but does not remember the match. These are called non-capturing parentheses. The matched substring can not be recalled from the resulting array's elements <code>[1]</code>, ..., <code>[n]</code>.</td> </tr> <tr> <td><code>x(?=y)</code></td> <td> <p>Matches 'x' only if 'x' is followed by 'y'.</p> <p>For example, <code>/Jack(?=Sprat)/</code> matches 'Jack' only if it is followed by 'Sprat'. <code>/Jack(?=Sprat|Frost)/</code> matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is part of the match results.</p> </td> </tr> <tr> <td><code>x(?!y)</code></td> <td> <p>Matches 'x' only if 'x' is not followed by 'y'.</p> <p>For example, <code>/\d+(?!\.)/</code> matches a number only if it is not followed by a decimal point. The regular expression <code>/\d+(?!\.)/.exec("3.141")</code> matches '141' but not '3.141'.</p> </td> </tr> <tr> <td><code>x|y</code></td> <td> <p>Matches either 'x' or 'y'.</p> <p>For example, <code>/green|red/</code> matches 'green' in "green apple" and 'red' in "red apple."</p> </td> </tr> <tr> <td><code>{n}</code></td> <td> <p>Where <code>n</code> is a positive integer. Matches exactly <code>n</code> occurrences of the preceding character.</p> <p>For example, <code>/a{2}/</code> doesn't match the 'a' in "candy," but it matches all of the a's in "caandy," and the first two a's in "caaandy."</p> </td> </tr> <tr> <td><code>{n,}</code></td> <td> <p>Where <code>n</code> is a positive integer. Matches at least <code>n</code> occurrences of the preceding character.</p> <p>For example, <code>/a{2,}/</code> doesn't match the 'a' in "candy", but matches all of the a's in "caandy" and in "caaaaaaandy."</p> </td> </tr> <tr> <td><code>{n,m}</code></td> <td> <p>Where <code>n</code> and <code>m</code> are positive integers. 
Matches at least <code>n</code> and at most <code>m</code> occurrences of the preceding character.</p> <p>For example, <code>/a{1,3}/</code> matches nothing in "cndy", the 'a' in "candy," the first two a's in "caandy," and the first three a's in "caaaaaaandy" Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string had more a's in it.</p> </td> </tr> <tr> <td><code>[xyz]</code></td> <td> <p>A character set. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen.</p> <p>For example, <code>[abcd]</code> is the same as <span style="font-family: monospace;">[</span><code>a-d]</code>. They match the 'b' in "brisket" and the 'c' in "city".</p> </td> </tr> <tr> <td><code>[^xyz]</code></td> <td> <p>A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen.</p> <p>For example, <code>[^abc]</code> is the same as <code>[^a-c]</code>. They initially match 'r' in "brisket" and 'h' in "chop."</p> </td> </tr> <tr> <td><code>[\b]</code></td> <td>Matches a backspace. (Not to be confused with <code>\b</code>.)</td> </tr> <tr> <td><code>\b</code></td> <td> <p>Matches a word boundary, such as a space, a newline character, punctuation character or end of string. (Not to be confused with <code>[\b]</code>.)</p> <p>For example, <code>/\bn\w/</code> matches the 'no' in "noonday";<code>/\wy\b/</code> matches the 'ly' in "possibly, yesterday". Note that the "," is not included in the match.</p> </td> </tr> <tr> <td><code>\B</code></td> <td> <p>Matches a non-word boundary.</p> <p>For example, <code>/\w\Bn/</code> matches 'on' in "noonday", and <code>/y\B\w/</code> matches 'ye' in "possibly yesterday."</p> </td> </tr> <tr> <td><code>\c<em>X</em></code></td> <td> <p>Where <em>X</em> is a control character. 
Matches a control character in a string.</p> <p>For example, <code>/\cM/</code> matches control-M in a string.</p> </td> </tr> <tr> <td><code>\d</code></td> <td> <p>Matches a digit character. Equivalent to <code>[0-9]</code>.</p> <p>For example, <code>/\d/</code> or <code>/[0-9]/</code> matches '2' in "B2 is the suite number."</p> </td> </tr> <tr> <td><code>\D</code></td> <td> <p>Matches any non-digit character. Equivalent to <code>[^0-9]</code>.</p> <p>For example, <code>/\D/</code> or <code>/[^0-9]/</code> matches 'B' in "B2 is the suite number."</p> </td> </tr> <tr> <td><code>\f</code></td> <td>Matches a form-feed.</td> </tr> <tr> <td><code>\n</code></td> <td>Matches a linefeed.</td> </tr> <tr> <td><code>\r</code></td> <td>Matches a carriage return.</td> </tr> <tr> <td><code>\s</code></td> <td> <p>Matches a single white space character, including space, tab, form feed, line feed. Equivalent to <code>[ \f\n\r\t\v​\u00A0\u1680​\u180e\u2000​\u2001\u2002​\u2003\u2004​\u2005\u2006​\u2007\u2008​\u2009\u200a​\u2028\u2029​\u2028\u2029​\u202f\u205f​\u3000]</code>.</p> <p>For example, <code>/\s\w*/</code> matches ' bar' in "foo bar."</p> </td> </tr> <tr> <td><code>\S</code></td> <td> <p>Matches a single character other than white space. Equivalent to <code>[^ \f\n\r\t\v​\u00A0\u1680​\u180e\u2000​\u2001\u2002​\u2003\u2004​\u2005\u2006​\u2007\u2008​\u2009\u200a​\u2028\u2029​\u2028\u2029​\u202f\u205f​\u3000]</code>.</p> <p>For example, <code>/\S\w*/</code> matches 'foo' in "foo bar."</p> </td> </tr> <tr> <td><code>\t</code></td> <td>Matches a tab.</td> </tr> <tr> <td><code>\v</code></td> <td>Matches a vertical tab.</td> </tr> <tr> <td><code>\w</code></td> <td> <p>Matches any alphanumeric character including the underscore. Equivalent to <code>[A-Za-z0-9_]</code>.</p> <p>For example, <code>/\w/</code> matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."</p> </td> </tr> <tr> <td><code>\W</code></td> <td> <p>Matches any non-word character. 
Equivalent to <code>[^A-Za-z0-9_]</code>.</p> <p>For example, <code>/\W/</code> or <code>/[^A-Za-z0-9_]/</code> matches '%' in "50%."</p> </td> </tr> <tr> <td><code>\<em>n</em></code></td> <td> <p>Where <em>n</em> is a positive integer. A back reference to the last substring matching the <em>n</em> parenthetical in the regular expression (counting left parentheses).</p> <p>For example, <code>/apple(,)\sorange\1/</code> matches 'apple, orange,' in "apple, orange, cherry, peach."</p> </td> </tr> <tr> <td><code>\0</code></td> <td>Matches a NUL character. Do not follow this with another digit.</td> </tr> <tr> <td><code>\xhh</code></td> <td>Matches the character with the code hh (two hexadecimal digits)</td> </tr> <tr> <td><code>\uhhhh</code></td> <td>Matches the character with the code hhhh (four hexadecimal digits).</td> </tr> </tbody>
</table>
<h3>Using Parentheses</h3>
<p>Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use, as described in {{ web.link("#Using_Parenthesized_Substring_Matches", "Using Parenthesized Substring Matches") }}.</p>
<p>For example, the pattern <code>/Chapter (\d+)\.\d*/</code> illustrates additional escaped and special characters and indicates that part of the pattern should be remembered. It matches precisely the characters 'Chapter ' followed by one or more numeric characters (<code>\d</code> means any numeric character and <code>+</code> means 1 or more times), followed by a decimal point (which in itself is a special character; preceding the decimal point with \ means the pattern must look for the literal character '.'), followed by any numeric character 0 or more times (<code>\d</code> means numeric character, <code>*</code> means 0 or more times). In addition, parentheses are used to remember the first matched numeric characters.</p>
<p>This pattern is found in "Open Chapter 4.3, paragraph 6" and '4' is remembered. The pattern is not found in "Chapter 3 and 4", because that string does not have a period after the '3'.</p>
<p>To match a substring without causing the matched part to be remembered, within the parentheses preface the pattern with <code>?:</code>. For example, <code>(?:\d+)</code> matches one or more numeric characters but does not remember the matched characters.</p>
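<p>The difference between capturing and non-capturing parentheses can be seen in a short sketch that reuses the chapter pattern and sample strings from this section. The variable names are illustrative only.</p>

```javascript
// Capturing parentheses: the chapter number is remembered in element [1].
var capturing = /Chapter (\d+)\.\d*/;
var found = capturing.exec("Open Chapter 4.3, paragraph 6");
console.log(found[0]); // "Chapter 4.3"
console.log(found[1]); // "4"

// No period follows the '3', so the pattern is not found.
var missing = capturing.exec("Chapter 3 and 4");
console.log(missing); // null

// Non-capturing parentheses: same overall match, nothing remembered.
var nonCapturing = /Chapter (?:\d+)\.\d*/;
var plain = nonCapturing.exec("Open Chapter 4.3, paragraph 6");
console.log(plain[0]);     // "Chapter 4.3"
console.log(plain.length); // 1
```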
<h2>Working with Regular Expressions</h2>
<p>Regular expressions are used with the <code>RegExp</code> methods <code>test</code> and <code>exec</code> and with the <code>String</code> methods <code>match</code>, <code>replace</code>, <code>search</code>, and <code>split</code>. These methods are explained in detail in the <a href="/en/JavaScript/Reference" title="en/JavaScript/Reference">JavaScript Reference</a>.</p>
<table class="standard-table"> <caption style="text-align: left">Table 4.2 Methods that use regular expressions</caption> <thead> <tr> <th scope="col">Method</th> <th scope="col">Description</th> </tr> </thead> <tbody> <tr> <td><code><a href="/en/JavaScript/Reference/Global_Objects/RegExp/exec" title="en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp/exec">exec</a></code></td> <td>A <code>RegExp</code> method that executes a search for a match in a string. It returns an array of information or null on a mismatch.</td> </tr> <tr> <td><code><a href="/en/JavaScript/Reference/Global_Objects/RegExp/test" title="en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp/test">test</a></code></td> <td>A <code>RegExp</code> method that tests for a match in a string. It returns true or false.</td> </tr> <tr> <td><code><a href="/en/JavaScript/Reference/Global_Objects/String/match" title="en/Core_JavaScript_1.5_Reference/Global_Objects/String/match">match</a></code></td> <td>A <code>String</code> method that executes a search for a match in a string. It returns an array of information or null on a mismatch.</td> </tr> <tr> <td><code><a href="/en/JavaScript/Reference/Global_Objects/String/search" title="en/Core_JavaScript_1.5_Reference/Global_Objects/String/search">search</a></code></td> <td>A <code>String</code> method that tests for a match in a string. It returns the index of the match, or -1 if the search fails.</td> </tr>
<tr> <td><code><a href="/en/JavaScript/Reference/Global_Objects/String/replace" title="en/Core_JavaScript_1.5_Reference/Global_Objects/String/replace">replace</a></code></td> <td>A <code>String</code> method that executes a search for a match in a string, and replaces the matched substring with a replacement substring.</td> </tr> <tr> <td><code><a href="/en/JavaScript/Reference/Global_Objects/String/split" title="en/Core_JavaScript_1.5_Reference/Global_Objects/String/split">split</a></code></td> <td>A <code>String</code> method that uses a regular expression or a fixed string to break a string into an array of substrings.</td> </tr> </tbody>
</table>
<p>When you want to know whether a pattern is found in a string, use the <code>test</code> or <code>search</code> method; for more information (but slower execution), use the <code>exec</code> or <code>match</code> method. If the match succeeds, <code>exec</code> and <code>match</code> return an array and update properties of the associated regular expression object and of the predefined regular expression object, <code>RegExp</code>. If the match fails, they return <code>null</code> (which converts to <code>false</code>).</p>
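<p>As a rough comparison of what each method returns, the sketch below runs all four against the same string. The pattern deliberately omits the <code>g</code> flag so that repeated calls do not depend on <code>lastIndex</code>. The variable names are illustrative.</p>

```javascript
var pattern = /d(b+)d/;
var text = "cdbbdbsbz";

// Yes/no and position answers.
var isFound = pattern.test(text);   // true
var position = text.search(pattern); // 1

// Detailed answers: an array holding the match and remembered substrings.
var viaExec = pattern.exec(text);
var viaMatch = text.match(pattern);
console.log(isFound, position, viaExec[0], viaMatch[1]);

// A failed search: exec and match give null, search gives -1.
var noExec = pattern.exec("xyz");
var noMatch = "xyz".match(pattern);
var noPosition = "xyz".search(pattern);
console.log(noExec, noMatch, noPosition);
```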
<p>In the following example, the script uses the <code>exec</code> method to find a match in a string.</p>
<pre class="brush: js">var myRe = /d(b+)d/g;
var myArray = myRe.exec("cdbbdbsbz");
</pre>
<p>If you do not need to access the properties of the regular expression, an alternative way of creating <code>myArray</code> is with this script:</p>
<pre class="brush: js">var myArray = /d(b+)d/g.exec("cdbbdbsbz");
</pre>
<p>If you want to construct the regular expression from a string, yet another alternative is this script:</p>
<pre class="brush: js">var myRe = new RegExp("d(b+)d", "g");
var myArray = myRe.exec("cdbbdbsbz");
</pre>
<p>With these scripts, the match succeeds and returns the array and updates the properties shown in the following table.</p>
<table class="fullwidth-table"> <caption style="text-align: left">Table 4.3 Results of regular expression execution.</caption> <thead> <tr> <th scope="col">Object</th> <th scope="col">Property or index</th> <th scope="col">Description</th> <th scope="col">In this example</th> </tr> </thead> <tbody> <tr> <td rowspan="4"><code>myArray</code></td> <td> </td> <td>The matched string and all remembered substrings.</td> <td><code>["dbbd", "bb"]</code></td> </tr> <tr> <td><code>index</code></td> <td>The 0-based index of the match in the input string.</td> <td><code>1</code></td> </tr> <tr> <td><code>input</code></td> <td>The original string.</td> <td><code>"cdbbdbsbz"</code></td> </tr> <tr> <td><code>[0]</code></td> <td>The last matched characters.</td> <td><code>"dbbd"</code></td> </tr> <tr> <td rowspan="2"><code>myRe</code></td> <td><code>lastIndex</code></td> <td>The index at which to start the next match. (This property is set only if the regular expression uses the g option, described in {{ web.link("#Advanced_Searching_With_Flags", "Advanced Searching With Flags") }}.)</td> <td><code>5</code></td> </tr> <tr> <td><code>source</code></td> <td>The text of the pattern. Updated at the time that the regular expression is created, not executed.</td> <td><code>"d(b+)d"</code></td> </tr> </tbody>
</table>
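<p>The values listed in Table 4.3 can be checked directly. The following sketch repeats the first script and logs each property from the table.</p>

```javascript
var myRe = /d(b+)d/g;
var myArray = myRe.exec("cdbbdbsbz");

console.log(myArray[0]);     // "dbbd", the matched characters
console.log(myArray[1]);     // "bb", the remembered substring
console.log(myArray.index);  // 1
console.log(myArray.input);  // "cdbbdbsbz"
console.log(myRe.lastIndex); // 5
console.log(myRe.source);    // "d(b+)d"
```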
<p>As the second form of this example shows, you can use a regular expression literal without assigning it to a variable. If you do, however, every occurrence of the literal is a new regular expression object. For this reason, you cannot subsequently access the properties of that regular expression. For example, assume you have this script:</p>
<pre class="brush: js">var myRe = /d(b+)d/g;
var myArray = myRe.exec("cdbbdbsbz");
console.log("The value of lastIndex is " + myRe.lastIndex);
</pre>
<p>This script displays:</p>
<pre>The value of lastIndex is 5
</pre>
<p>However, if you have this script:</p>
<pre class="brush: js">var myArray = /d(b+)d/g.exec("cdbbdbsbz");
console.log("The value of lastIndex is " + /d(b+)d/g.lastIndex);
</pre>
<p>It displays:</p>
<pre>The value of lastIndex is 0
</pre>
<p>The occurrences of <code>/d(b+)d/g</code> in the two statements are different regular expression objects and hence have different values for their <code>lastIndex</code> property. If you need to access the properties of a regular expression created with a literal, you should first assign it to a variable.</p>
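<p>Keeping the regular expression in a variable is also what makes repeated <code>exec</code> calls work with the <code>g</code> flag, because each call resumes from the <code>lastIndex</code> stored on that one object. The sketch below uses a hypothetical sample string and illustrative names.</p>

```javascript
var wordRe = /\w+/g;
var sentence = "fee fi fo";
var positions = [];
var hit;

// Each exec call starts at wordRe.lastIndex and advances it past the match.
while ((hit = wordRe.exec(sentence)) !== null) {
  positions.push(hit[0] + "@" + hit.index + "->" + wordRe.lastIndex);
}

console.log(positions);        // ["fee@0->3", "fi@4->6", "fo@7->9"]
console.log(wordRe.lastIndex); // 0, reset after the final failed match
```

<p>Writing the literal <code>/\w+/g</code> inside the loop condition instead would create a fresh object on every pass, so the search would never advance.</p>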
<h3>Using Parenthesized Substring Matches</h3>
<p>Including parentheses in a regular expression pattern causes the corresponding submatch to be remembered. For example, <code>/a(b)c/</code> matches the characters 'abc' and remembers 'b'. To recall these parenthesized substring matches, use the <code>Array</code> elements <code>[1]</code>, ..., <code>[n]</code>.</p>
<p>The number of possible parenthesized substrings is unlimited. The returned array holds all that were found. The following examples illustrate how to use parenthesized substring matches.</p>
<h4>Example 1</h4>
<p>The following script uses the <a href="/en/JavaScript/Reference/Global_Objects/String/replace" title="en/JavaScript/Reference/Global Objects/String/replace"><code>replace()</code></a> method to switch the words in the string. For the replacement text, the script uses the <code>$1</code> and <code>$2</code> in the replacement to denote the first and second parenthesized substring matches.</p>
<pre class="brush: js">var re = /(\w+)\s(\w+)/;
var str = "John Smith";
var newstr = str.replace(re, "$2, $1");
console.log(newstr);
</pre>
<p>This prints "Smith, John".</p>
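<p>The remembered substrings can also be read from the array returned by <code>match</code>. The following sketch assumes a hypothetical date string and pattern, neither of which comes from the example above.</p>

```javascript
var dateRe = /(\d{4})-(\d{2})-(\d{2})/;

// Elements [1], [2], and [3] hold the three parenthesized substrings.
var parts = "Released on 2011-03-22.".match(dateRe);
console.log(parts[1]); // "2011"
console.log(parts[2]); // "03"
console.log(parts[3]); // "22"

// The same substrings are available as $1, $2, and $3 in replace().
var reordered = "2011-03-22".replace(dateRe, "$3/$2/$1");
console.log(reordered); // "22/03/2011"
```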
<h3>Advanced Searching With Flags</h3>
<p>Regular expressions have four optional flags that control how a search is performed. To indicate a global search, use the <code>g</code> flag. To indicate a case-insensitive search, use the <code>i</code> flag. To indicate a multi-line search, use the <code>m</code> flag. To perform a "sticky" search that matches only from the current position in the target string, use the <code>y</code> flag. These flags can be used separately or together in any order, and are included as part of the regular expression.</p>
<p>{{ Fx_minversion_note("3") }}</p>
<p>To include a flag with the regular expression, use this syntax:</p>
<pre class="brush: js">var re = /pattern/flags;
</pre>
<p>or</p>
<pre class="brush: js">var re = new RegExp("pattern", "flags");
</pre>
<p>Note that the flags are an integral part of a regular expression. They cannot be added or removed later.</p>
<p>For example, <code>re = /\w+\s/g</code> creates a regular expression that looks for one or more word characters followed by a white space character, and it looks for this combination throughout the string.</p>
<pre class="brush: js">var re = /\w+\s/g;
var str = "fee fi fo fum";
var myArray = str.match(re);
console.log(myArray);
</pre>
<p>This displays ["fee ", "fi ", "fo "]. In this example, you could replace the line:</p>
<pre class="brush: js">var re = /\w+\s/g;
</pre>
<p>with:</p>
<pre class="brush: js">var re = new RegExp("\\w+\\s", "g");
</pre>
<p>and get the same result.</p>
<p>The <code>m</code> flag is used to specify that a multiline input string should be treated as multiple lines. If the <code>m</code> flag is used, <code>^</code> and <code>$</code> match at the start or end of any line within the input string instead of the start or end of the entire string.</p>
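<p>The effect of the <code>m</code>, <code>i</code>, and <code>y</code> flags can be sketched with a small two-line string. The string and variable names are assumptions made for illustration, and the <code>y</code> part requires an engine that supports the sticky flag.</p>

```javascript
var poem = "roses are red\nViolets are blue";

// Without m, ^ matches only at the start of the whole string.
var firstOnly = poem.match(/^\w+/g);  // ["roses"]
// With m, ^ also matches at the start of each line.
var eachLine = poem.match(/^\w+/gm);  // ["roses", "Violets"]

// The i flag ignores case.
var strict = /violets/.test(poem);    // false
var relaxed = /violets/i.test(poem);  // true

// The y flag matches only at lastIndex, not further along the string.
var sticky = /are/y;
sticky.lastIndex = 6;
var atSix = sticky.test(poem);        // true, "are" starts at index 6
var afterSix = sticky.lastIndex;      // 9
sticky.lastIndex = 0;
var atZero = sticky.test(poem);       // false, index 0 starts "ros"

console.log(firstOnly, eachLine, strict, relaxed, atSix, afterSix, atZero);
```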
<h2>Examples</h2>
<p>The following examples show some uses of regular expressions.</p>
<h3>Changing the Order in an Input String</h3>
<p>The following example illustrates the formation of regular expressions and the use of <code>string.split()</code> and <code>string.replace()</code>. It cleans a roughly formatted input string containing names (first name first) separated by blanks, tabs and exactly one semicolon. Finally, it reverses the name order (last name first) and sorts the list.</p>
<pre class="brush: js">// The name string contains multiple spaces and tabs,
// and may have multiple spaces between first and last names.
var names = "Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ; Chris Hand ";

var output = ["---------- Original String\n", names + "\n"];

// Prepare two regular expression patterns and array storage.
// Split the string into array elements.

// pattern: possible white space then semicolon then possible white space
var pattern = /\s*;\s*/;

// Break the string into pieces separated by the pattern above and
// store the pieces in an array called nameList
var nameList = names.split(pattern);

// new pattern: one or more characters then spaces then characters.
// Use parentheses to "memorize" portions of the pattern.
// The memorized portions are referred to later.
pattern = /(\w+)\s+(\w+)/;

// New array for holding names being processed.
var bySurnameList = [];

// Display the name array and populate the new array
// with comma-separated names, last first.
//
// The replace method removes anything matching the pattern
// and replaces it with the memorized strings: second memorized portion
// followed by comma space followed by first memorized portion.
//
// The replacement patterns $1 and $2 refer to the portions
// memorized while matching the pattern.

output.push("---------- After Split by Regular Expression");

var i, len;
for (i = 0, len = nameList.length; i &lt; len; i++){
  output.push(nameList[i]);
  bySurnameList[i] = nameList[i].replace(pattern, "$2, $1");
}

// Display the new array.
output.push("---------- Names Reversed");
for (i = 0, len = bySurnameList.length; i &lt; len; i++){
  output.push(bySurnameList[i]);
}

// Sort by last name, then display the sorted array.
bySurnameList.sort();
output.push("---------- Sorted");
for (i = 0, len = bySurnameList.length; i &lt; len; i++){
  output.push(bySurnameList[i]);
}

output.push("---------- End");

console.log(output.join("\n"));
</pre>
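<p>The same split, reverse, and sort steps can be written more compactly. The sketch below is not the example's own code: it uses a hypothetical name list, and it adds a <code>trim()</code> call and <code>Array.prototype.map</code>, which the script above does not use.</p>

```javascript
// Hypothetical input: names separated by semicolons with stray spaces.
var rawNames = "Ada Lovelace ; Alan Turing;Grace Hopper ";

var sortedNames = rawNames
  .trim()                // drop leading and trailing white space
  .split(/\s*;\s*/)      // break on semicolons and surrounding white space
  .map(function (name) { // "First Last" becomes "Last, First"
    return name.replace(/(\w+)\s+(\w+)/, "$2, $1");
  })
  .sort();

console.log(sortedNames); // ["Hopper, Grace", "Lovelace, Ada", "Turing, Alan"]
```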
<h3>Using Special Characters to Verify Input</h3>
<p>In the following example, the user is expected to enter a phone number. When the user presses the "Check" button, the script checks the validity of the number. If the number is valid (matches the character sequence specified by the regular expression), the script shows a message thanking the user and confirming the number. If the number is invalid, the script informs the user that the phone number is not valid.</p>
<p>The regular expression looks for zero or one open parenthesis <code>\(?</code>, followed by three digits <code>\d{3}</code>, followed by zero or one close parenthesis <code>\)?</code>, followed by one dash, forward slash, or decimal point, which is remembered when found <code>([-\/\.])</code>, followed by three digits <code>\d{3}</code>, followed by the remembered match of a dash, forward slash, or decimal point <code>\1</code>, followed by four digits <code>\d{4}</code>.</p>
<p>The <code>testInfo</code> function reads the entered text from the input element's <code>value</code> property. It does not rely on <code>RegExp.input</code> to report a rejected entry, because that property is updated only by a successful match.</p>
<pre class="brush: html">&lt;!DOCTYPE html&gt;
&lt;html&gt;  
  &lt;head&gt;  
    &lt;meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"&gt;  
    &lt;meta http-equiv="Content-Script-Type" content="text/javascript"&gt;  
    &lt;script type="text/javascript"&gt;  
      var re = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/;  
      function testInfo(phoneInput){  
        var OK = re.exec(phoneInput.value);  
        if (!OK)  
          window.alert(phoneInput.value + " isn't a phone number with area code!");  
        else
          window.alert("Thanks, your phone number is " + OK[0]);  
      }  
    &lt;/script&gt;  
  &lt;/head&gt;  
  &lt;body&gt;  
    &lt;p&gt;Enter your phone number (with area code) and then click "Check".
        &lt;br&gt;The expected format is like ###-###-####.&lt;/p&gt;
    &lt;form action="#"&gt;  
      &lt;input id="phone"&gt;&lt;button onclick="testInfo(document.getElementById('phone'));"&gt;Check&lt;/button&gt;
    &lt;/form&gt;  
  &lt;/body&gt;  
&lt;/html&gt;
</pre>
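<p>The pattern itself can be exercised outside the page. The sketch below runs it against a few made-up inputs and records the remembered separator, or <code>null</code> when the input is rejected. The sample numbers and variable names are assumptions for illustration.</p>

```javascript
var phoneRe = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/;

var samples = [
  "555-123-4567",   // same separator twice: accepted
  "(555).123.4567", // optional parentheses around the area code: accepted
  "555-123.4567",   // mixed separators: \1 fails, rejected
  "5551234567"      // no separator at all: rejected
];

var results = samples.map(function (sample) {
  var match = phoneRe.exec(sample);
  return match === null ? null : match[1];
});

console.log(results); // ["-", ".", null, null]
```

<p>Because the pattern has no <code>^</code> or <code>$</code> anchors, it also accepts a valid number surrounded by other text.</p>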