Intl.Segmenter() constructor

The Intl.Segmenter() constructor creates Intl.Segmenter objects that enable locale-sensitive text segmentation.

Syntax

new Intl.Segmenter()
new Intl.Segmenter(locales)
new Intl.Segmenter(locales, options)

Note: Intl.Segmenter() can only be constructed with new. Attempting to call it without new throws a TypeError.
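
As a minimal sketch of the note above, the snippet below calls the constructor once without new and once with it. The variable names (callError, segmenterWithNew) are illustrative only and are not part of the API.

```javascript
// Calling Intl.Segmenter as a plain function is expected to throw.
let callError = null;
try {
  Intl.Segmenter("en");
} catch (error) {
  callError = error;
}
console.log(callError instanceof TypeError); // true

// Constructing with `new` succeeds.
const segmenterWithNew = new Intl.Segmenter("en");
console.log(segmenterWithNew instanceof Intl.Segmenter); // true
```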

Parameters

locales Optional

A string with a BCP 47 language tag, or an array of such strings. For the general form and interpretation of the locales argument, see Locale identification and negotiation.

options Optional

An object with some or all of the following properties:

granularity Optional

A string. Possible values are:

"grapheme" (default)

Split the input into segments at grapheme cluster (user-perceived character) boundaries, as determined by the locale.

"word"

Split the input into segments at word boundaries, as determined by the locale.

"sentence"

Split the input into segments at sentence boundaries, as determined by the locale.
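
The three granularity values can be compared on short inputs. The sketch below assumes an "en" locale and uses a hypothetical helper, segmentsOf, that is not part of the API; exact boundaries come from the runtime's locale data, so treat the commented results as what a typical implementation produces for these simple strings.

```javascript
// Hypothetical helper: collect the segment data objects for a granularity.
function segmentsOf(text, granularity) {
  const segmenter = new Intl.Segmenter("en", { granularity });
  return Array.from(segmenter.segment(text));
}

// "e" followed by a combining acute accent: two code units, one grapheme cluster.
const graphemes = segmentsOf("e\u0301", "grapheme");
console.log(graphemes.length); // 1

// Word granularity also yields spaces and punctuation as segments;
// isWordLike filters those out.
const words = segmentsOf("Hello world. How are you?", "word")
  .filter((s) => s.isWordLike)
  .map((s) => s.segment);
console.log(words); // ["Hello", "world", "How", "are", "you"]

// Sentence granularity splits after the sentence-ending punctuation.
const sentences = segmentsOf("Hello world. How are you?", "sentence");
console.log(sentences.length); // 2
```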

localeMatcher Optional

The locale matching algorithm to use. Possible values are:

"best fit" (default)

The runtime may choose a possibly more suited locale than the result of the lookup algorithm.

"lookup"

Use the BCP 47 Lookup algorithm to choose the locale from locales. For each locale in locales, the runtime returns the first supported locale, possibly after removing restricting subtags from the provided locale tag to find one. For example, providing "de-CH" as locales may result in using "de" if "de" is supported but "de-CH" is not.
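
One way to observe the outcome of locale matching is to read the locale property of resolvedOptions(). The sketch below requests "de-CH" with both matchers; the resolved value depends on which locales the runtime supports, so no particular output is assumed here.

```javascript
const requestedLocale = "de-CH";

const lookupSegmenter = new Intl.Segmenter(requestedLocale, {
  localeMatcher: "lookup",
});
const bestFitSegmenter = new Intl.Segmenter(requestedLocale, {
  localeMatcher: "best fit",
});

// Each call reports the locale the runtime actually selected,
// which may be "de-CH", "de", or the runtime's default locale.
const lookupLocale = lookupSegmenter.resolvedOptions().locale;
const bestFitLocale = bestFitSegmenter.resolvedOptions().locale;
console.log(lookupLocale, bestFitLocale);
```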

Return value

A new Intl.Segmenter instance.
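
A short check of the returned object, constructed with no arguments so that the default granularity applies. The variable name defaultSegmenter is illustrative only.

```javascript
const defaultSegmenter = new Intl.Segmenter();

console.log(defaultSegmenter instanceof Intl.Segmenter); // true
// With no options, the granularity resolves to the default.
console.log(defaultSegmenter.resolvedOptions().granularity); // "grapheme"
// The instance exposes segment() for splitting strings.
console.log(typeof defaultSegmenter.segment); // "function"
```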

Examples

Basic usage

The following example counts the words in a Japanese string using a word-granularity segmenter. Japanese does not separate words with spaces, so splitting the string with String methods would give an incorrect result.

const text = "吾輩は猫である。名前はたぬき。";
const japaneseSegmenter = new Intl.Segmenter("ja-JP", {granularity: "word"});
const wordCount = [...japaneseSegmenter.segment(text)].filter(
  (segment) => segment.isWordLike,
).length;
console.log(wordCount);
// 8, as the text is segmented as '吾輩'|'は'|'猫'|'で'|'ある'|'。'|'名前'|'は'|'たぬき'|'。'
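
To make the contrast with String methods concrete, the sketch below runs a naive space-based split on the same text. Because the string contains no spaces, split(" ") returns the whole string as a single element. The exact segmenter count depends on the runtime's dictionary data, so only the naive result is annotated.

```javascript
const japaneseText = "吾輩は猫である。名前はたぬき。";

// Naive approach: split on spaces. The text has none, so this yields one "word".
const naiveCount = japaneseText.split(" ").length;
console.log(naiveCount); // 1

// Segmenter approach: count only the word-like segments.
const wordSegmenter = new Intl.Segmenter("ja-JP", { granularity: "word" });
const segmenterCount = [...wordSegmenter.segment(japaneseText)].filter(
  (segment) => segment.isWordLike,
).length;
console.log(segmenterCount);
```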

Specifications

ECMAScript Internationalization API Specification, # sec-intl-segmenter-constructor
