New reference pages on MDN for JavaScript regular expressions
Our JavaScript regular expressions (regex) documentation is one of the most popular resources on MDN Web Docs. Thanks to the efforts of Joshua Chen, we now have dedicated pages for each feature with more comprehensive information about the syntax and semantics, with browser compatibility information included. Let's take a look at the new pages, how the information is organized, and how the new documentation can help you write regular expressions in JavaScript.
JavaScript regular expressions guide
Before this initiative, we started off with a regular expressions guide that explains what regular expressions are and how to use them in JavaScript. The guide has a few sub-pages that split out content into the following sections:
These guides and the cheat sheet are great if you are new to regex or you need a refresher on the basic concepts. What we were missing were accelerators for readers who are already familiar with the basics and want to quickly look up the specific details of a language feature.
The first step to refresh this topic and add new documentation was a GitHub discussion to gather feedback, followed by an initial pull request by Josh with suggested changes and a scaffold for the new pages. After a few rounds of revisions, we're happy to have the changes land in MDN's content repository and the new pages are now live.
Regular expressions reference pages
The regular expressions reference is the entry point for all of the new documentation. There are 18 new pages for individual features, and existing content with deprecated information has been removed. The new references are organized into the following sections and categories:
- Creating regular expressions - how to create a regular expression in JavaScript
- Flags - the flags that change how a regex is interpreted
- Assertions - checks that the input meets a condition at the current position, such as being at the start of a string or at a word boundary
- Atoms - the units that make up a pattern, such as character classes and literal characters
- Other features - features like quantifiers that help you compose patterns
The content organized under atoms is where most of the new docs are located, and it's likely where you'll spend most of your time if you're looking to compose a regex. With the information organized into sections like this, there's now a clear path to find the information you need or gain a deeper understanding of how features work.
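As a rough orientation, here is one small sketch that touches each of the categories above. The patterns are made up for illustration and are not taken from the reference pages:

```javascript
// Creating regular expressions: a literal and the RegExp constructor
// can describe the same pattern.
const literal = /^\d+px$/i;
const constructed = new RegExp("^\\d+px$", "i");

// Flags: "i" makes matching case-insensitive.
literal.test("12PX"); // true

// Assertions: ^ and $ anchor the match to the start and end of the input,
// and \b matches at a word boundary without consuming characters.
/\bcat\b/.test("a cat sat"); // true
/\bcat\b/.test("concatenate"); // false

// Atoms: a character class ([a-c]), a character class escape (\d),
// and a literal character (-) each match a single unit of input.
/[a-c]\d-/.test("b7-"); // true

// Other features: the quantifier {2,3} repeats the preceding atom.
/^a{2,3}$/.test("aaa"); // true
/^a{2,3}$/.test("aaaa"); // false
```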
Having a reference for each feature means it appears more prominently in search results and is easier to locate. Additionally, each page is included in the sidebar navigation, so if you've landed on the documentation for a feature but meant to look up something else, you should be able to find it quickly.
Highlights from the new pages
The new pages are comprehensive in terms of syntax and semantics, and each has a description section that should help you spot gotchas and caveats. Beyond syntax and semantics, it's always helpful to see examples of a feature in use. The next few sections look at some of my favorite examples.
Capturing groups and named capturing groups
The capturing groups reference has a nice example that shows how to match a date in the format YYYY-MM-DD and extract the year, month, and day.
This is useful because it's a common operation that you might need to do without the overhead of a full date parsing library:
function parseDate(input) {
  const parts = /^(\d{4})-(\d{2})-(\d{2})$/.exec(input);
  if (!parts) {
    return null;
  }
  return parts.slice(1).map((p) => parseInt(p, 10));
}
parseDate("2019-01-01"); // [2019, 1, 1]
parseDate("2019-06-19"); // [2019, 6, 19]
One of the most common complaints about regular expressions is how hard they are to read. If you're using capturing groups, you can use named capturing groups to make your patterns a bit easier to understand:
function parseLog(entry) {
  const { author, timestamp } = /^(?<timestamp>\d+),(?<author>.+)$/.exec(
    entry,
  ).groups;
  return `${author} committed on ${new Date(
    parseInt(timestamp) * 1000,
  ).toLocaleString()}`;
}
parseLog("1560979912,Caroline"); // "Caroline committed on 6/19/2019, 5:31:52 PM"
This one is nice because it shows how to parse a log entry, extract the timestamp and author, and format the result into something more readable. Someone reading the code for the first time will have an easier time understanding what the regex is doing.
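The two ideas can be combined. The following is a minimal sketch, not an example from the reference pages, of the earlier date parser rewritten with named capturing groups. The function name `parseDateNamed` and the shape of the returned object are assumptions made for illustration:

```javascript
// Hypothetical helper: like parseDate, but each part is captured by name.
function parseDateNamed(input) {
  const match = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/.exec(input);
  // exec() returns null when the input does not match, so check before
  // reading the groups property.
  if (!match) {
    return null;
  }
  const { year, month, day } = match.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

parseDateNamed("2019-06-19"); // { year: 2019, month: 6, day: 19 }
parseDateNamed("19 June 2019"); // null
```

Reading the parts by name means the calling code no longer depends on the order of the groups in the pattern.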
Unicode character class escape
The Unicode character class escape page has a great example that illustrates how to detect if a string contains a character from different scripts. This can be useful if you are trying to detect strings in different languages without having to manually or exhaustively specify certain characters (or ranges of characters) in your pattern:
const mixedCharacters = "aεЛ";
// Using the canonical "long" name of the script
mixedCharacters.match(/\p{Script=Latin}/u); // a
// Using a short alias (ISO 15924 code) for the script
mixedCharacters.match(/\p{Script=Grek}/u); // ε
// Using the short name sc for the Script property
mixedCharacters.match(/\p{sc=Cyrillic}/u); // Л
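Building on that example, a small helper could report which scripts appear in a string. This is only a sketch: `scriptPatterns` and `detectScripts` are hypothetical names, and the three scripts are simply the ones used above:

```javascript
// Hypothetical lookup table: one Unicode property escape per script.
const scriptPatterns = {
  Latin: /\p{Script=Latin}/u,
  Greek: /\p{Script=Greek}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
};

// Return the names of the scripts that have at least one character in text.
function detectScripts(text) {
  return Object.entries(scriptPatterns)
    .filter(([, pattern]) => pattern.test(text))
    .map(([name]) => name);
}

detectScripts("aεЛ"); // ["Latin", "Greek", "Cyrillic"]
detectScripts("hello"); // ["Latin"]
detectScripts("123"); // []
```

The patterns deliberately omit the g flag, so repeated calls to test() do not carry state between inputs.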
Character classes
The regular expressions character classes might be considered one of the most basic features, but there is an example on the reference page showing how they can be powerful in certain use cases.
The example shows how to match hexadecimal digits, and I like this because it illustrates how ranges work in combination with the i flag to match without case sensitivity:
function isHexadecimal(str) {
  return /^[0-9A-F]+$/i.test(str);
}
isHexadecimal("2F3"); // true
isHexadecimal("beef"); // true
isHexadecimal("undefined"); // false
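The same character class can be combined with a quantifier and an alternation to validate something slightly more structured. As a sketch, assuming only the three-digit and six-digit forms of a CSS hex color need to be accepted (the `isHexColor` name is hypothetical):

```javascript
// Hypothetical helper: accept "#" followed by exactly 3 or exactly 6 hex digits.
function isHexColor(str) {
  return /^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/i.test(str);
}

isHexColor("#FFF"); // true
isHexColor("#1a2b3c"); // true
isHexColor("#12345"); // false (five digits)
isHexColor("#ggg"); // false ("g" is outside the a-f range)
```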
Browser compatibility information
Each of the regular expressions reference pages has browser compatibility data for the corresponding feature. This means that the browser support information is now more granular and you can see the browser versions that support or don't support a feature.
What's next?
The regular expressions references will help more advanced users find the information they need quickly and also help beginners learn more about regular expressions.
These references will also greatly help with documenting the new v mode that is currently in development and is expected to add support for set difference/subtraction, set intersection, and nested character classes.
Having individual reference pages for each regular expression feature will make it easier to give a more detailed explanation of the new features using the v flag.
For more information on this flag, check out the TC39 proposal.
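To give a feel for those features, here is a hedged sketch using the `--` (difference) and `&&` (intersection) syntax as I understand it from the proposal. Because engine support may vary, it builds the patterns with the RegExp constructor and checks for the flag first. The helper name `supportsVFlag` is hypothetical:

```javascript
// Feature-detect the v flag: the RegExp constructor throws a SyntaxError
// for flags the engine does not recognize.
function supportsVFlag() {
  try {
    new RegExp("", "v");
    return true;
  } catch {
    return false;
  }
}

if (supportsVFlag()) {
  // Set difference with nested classes: lowercase ASCII letters minus vowels.
  const consonants = new RegExp("^[[a-z]--[aeiou]]+$", "v");
  consonants.test("rhythm"); // true
  consonants.test("regex"); // false

  // Set intersection: characters that are both Greek and letters.
  const greekLetters = new RegExp("^[\\p{Script=Greek}&&\\p{Letter}]+$", "v");
  greekLetters.test("αβγ"); // true
  greekLetters.test("λ1"); // false
}
```

Using the constructor rather than a regex literal keeps the script parseable in engines that do not yet know the flag, since an unsupported flag in a literal is a syntax error for the whole file.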
Summary
We've covered some of the highlights of the new regular expressions reference pages and I hope you find them useful. We're happy to have this new content with the significant improvements made by Josh and the rest of the MDN Web Docs team. If you want to dive into the new pages, be sure to check out these landing pages for the references and the guide:
If you enjoyed this article or if you have any feedback, feel free to join the discussion in the MDN Web Docs Discord server or leave a comment on the GitHub discussion.
Happy pattern matching!