Regular Expression Syntax
Jump to navigation
Jump to search
Navigation: User Guide ➔ Reports ➔ Excel Reports ➔ Get Tag Reports ➔ Excel Tag Select Reports ➔ Regular Expression Syntax
Using Regular Expressions
A regular expression is a pattern of text that consists of ordinary characters (for example, letters a through z) and special characters, known as metacharacters.
- The pattern describes one or more strings to match when searching a body of text.
- The regular expression serves as a template for matching a character pattern to the string being searched.
- For further information see Visual Basic Scripting Edition © Microsoft Corporation.
Here are some examples of regular expression you might encounter:
- .*P.* Match a string containing the letter "P"
- TNK.* Match a string starting with "TNK"
The following table contains the complete list of metacharacters and their behavior in the context of regular expressions:
| metacharacters | Description | Example |
|---|---|---|
| \ | Marks the next character as either a special character, a literal, a backreference, or an octal escape. | For example,
|
| ^ | Matches the position at the beginning of the input string. | If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'. |
| $ | Matches the position at the end of the input string. | If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'. |
| * | Matches the preceding character or subexpression zero or more times. | For example, zo* matches "z" and "zoo". * is equivalent to {0,}. |
| + | Matches the preceding character or subexpression one or more times. | For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
| ? | Matches the preceding character or subexpression zero or one time. | For example, "do(es)?" matches the "do" in "do" or "does". ? is equivalent to {0,1} |
| {n} | n is a nonnegative integer. Matches exactly n times. | For example,
|
| {n,} | n is a nonnegative integer. Matches at least n times. | For example,
|
| {n,m} | m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. | For example,
|
| ? | When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. | For example,
|
| . | Matches any single character except "\n". | To match any character including the '\n', use a pattern such as '[\s\S]'. |
| (pattern) | Matches pattern and captures the match. | The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parentheses characters ( ), use '\(' or '\)'. |
| (?:pattern) | Matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|). | For example, 'industr(?:y|ies) is a more economical expression than 'industry|industries'. |
| (?=pattern) | Positive lookahead matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not captured for possible later use. | For example 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead. |
| (?!pattern) | Negative lookahead matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not captured for possible later use. | For example 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but does not match "Windows" in "Windows 2000". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead. |
| x|y | Matches either x or y. | For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food". |
| [xyz] | A character set. Matches any one of the enclosed characters. | For example, '[abc]' matches the 'a' in "plain". |
| [^xyz] | A negative character set. Matches any character not enclosed. | For example, '[^abc]' matches the 'p' in "plain". |
| [a-z] | A range of characters. Matches any character in the specified range. | For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'. |
| [^a-z] | A negative range characters. Matches any character not in the specified range. | For example, '[^a-z]' matches any character not in the range 'a' through 'z'. |
| \b | Matches a word boundary, that is, the position between a word and a space. | For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb". |
| \B | Matches a nonword boundary. | 'er\B' matches the 'er' in "verb" but not the 'er' in "never". |
| \cx | Matches the control character indicated by x. | For example, \cM matches a Control-M or carriage return character. The value of x must be in the range of A-Z or a-z. If not, c is assumed to be a literal 'c' character. |
| \d | Matches a digit character. | Equivalent to [0-9]. |
| \D | Matches a nondigit character. | Equivalent to [^0-9]. |
| \f | Matches a form-feed character. | Equivalent to \x0c and \cL. |
| \n | Matches a newline character. | Equivalent to \x0a and \cJ. |
| \r | Matches a carriage return character. | Equivalent to \x0d and \cM. |
| \s | Matches any whitespace character including space, tab, form-feed, etc. | Equivalent to [ \f\n\r\t\v]. |
| \S | Matches any non-white space character. | Equivalent to [^ \f\n\r\t\v]. |
| \t | Matches a tab character. | Equivalent to \x09 and \cI. |
| \v | Matches a vertical tab character. | Equivalent to \x0b and \cK. |
| \w | Matches any word character including underscore. | Equivalent to '[A-Za-z0-9_]'. |
| \W | Matches any nonword character. | Equivalent to '[^A-Za-z0-9_]'. |
| \xn | Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. | For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". Allows ASCII codes to be used in regular expressions. |
| \num | Matches num, where num is a positive integer. A reference back to captured matches. | For example, '(.)\1' matches two consecutive identical characters. |
| \n | Identifies either an octal escape value or a backreference. | If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7). |
| \nm | Identifies either an octal escape value or a backreference. |
|
| \nml | Matches octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7). | |
| \un | Matches n, where n is a Unicode character expressed as four hexadecimal digits. | For example, \u00A9 matches the copyright symbol (©). |