String
The String object is used to represent and manipulate a sequence of characters.
Description
Strings are useful for holding data that can be represented in text form. Some of the most-used operations on strings are checking their length, building and concatenating them with the + and += string operators, checking for the existence or location of substrings with the indexOf() method, and extracting substrings with the substring() method.
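As a minimal sketch of the operations named above (the sample text and variable names are hypothetical, chosen only for illustration):

```javascript
// Hypothetical sample values, used only to illustrate the operations.
const greeting = "Hello";
let message = greeting + ", world"; // concatenation with +
message += "!"; // appending with +=

const messageLength = message.length; // number of UTF-16 code units
const commaIndex = message.indexOf(","); // index of the first match
const missingIndex = message.indexOf("xyz"); // -1 when there is no match
const firstWord = message.substring(0, commaIndex); // text before the comma

console.log(message); // "Hello, world!"
console.log(messageLength, commaIndex, missingIndex); // 13 5 -1
console.log(firstWord); // "Hello"
```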
Creating strings
Strings can be created as primitives, from string literals, or as objects, using the String() constructor:
const string1 = "A string primitive";
const string2 = 'Also a string primitive';
const string3 = `Yet another string primitive`;
const string4 = new String("A String object");
String primitives and string objects share many behaviors, but have other important differences and caveats. See "String primitives and String objects" below.
String literals can be specified using single or double quotes, which are treated identically, or using the backtick character `. This last form specifies a template literal: with this form you can interpolate expressions.
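A short sketch of interpolation in a template literal, next to the equivalent concatenation. The variable names and values are assumptions made for the example:

```javascript
// Hypothetical values for illustration.
const itemCount = 3;
const unitPrice = 4.5;

// Expressions inside ${...} are evaluated and converted to strings.
const summary = `${itemCount} items cost ${itemCount * unitPrice} in total`;

// The same text built with the + operator.
const concatenated =
  itemCount + " items cost " + itemCount * unitPrice + " in total";

// Single and double quotes produce identical primitives.
const quotesMatch = 'text' === "text";

console.log(summary); // "3 items cost 13.5 in total"
console.log(summary === concatenated, quotesMatch); // true true
```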
Character access
There are two ways to access an individual character in a string. The first is the charAt() method:
"cat".charAt(1); // gives value "a"
The other way is to treat the string as an array-like object, where individual characters correspond to a numerical index:
"cat"[1]; // gives value "a"
When using bracket notation for character access, attempting to delete or assign a value to these properties will not succeed. The properties involved are neither writable nor configurable. (See Object.defineProperty() for more information.)
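The property attributes can be inspected directly. The sketch below, using a hypothetical sample word, reads the descriptor of an index property and then builds a new string, which is the usual way to "change" a character:

```javascript
const word = "cat";

// Index properties of a string are read-only and cannot be reconfigured.
const descriptor = Object.getOwnPropertyDescriptor(word, 1);
console.log(descriptor.value); // "a"
console.log(descriptor.writable, descriptor.configurable); // false false

// Strings are immutable, so a modified copy has to be assembled instead.
const changed = word.substring(0, 1) + "u" + word.substring(2);
console.log(word, changed); // "cat" "cut"
```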
Comparing strings
Use the less-than and greater-than operators to compare strings:
const a = "a";
const b = "b";
if (a < b) {
// true
console.log(`${a} is less than ${b}`);
} else if (a > b) {
console.log(`${a} is greater than ${b}`);
} else {
console.log(`${a} and ${b} are equal.`);
}
Note that all comparison operators, including === and ==, compare strings case-sensitively. A common way to compare strings case-insensitively is to convert both to the same case (upper or lower) before comparing them.
function areEqualCaseInsensitive(str1, str2) {
return str1.toUpperCase() === str2.toUpperCase();
}
The choice of whether to transform by toUpperCase() or toLowerCase() is mostly arbitrary, and neither one is fully robust when extending beyond the Latin alphabet. For example, the German lowercase letter ß and ss are both transformed to SS by toUpperCase(), while the Turkish letter ı would be falsely reported as unequal to I by toLowerCase() unless specifically using toLocaleLowerCase("tr").
const areEqualInUpperCase = (str1, str2) =>
str1.toUpperCase() === str2.toUpperCase();
const areEqualInLowerCase = (str1, str2) =>
str1.toLowerCase() === str2.toLowerCase();
areEqualInUpperCase("ß", "ss"); // true; should be false
areEqualInLowerCase("ı", "I"); // false; should be true
A locale-aware and robust solution for testing case-insensitive equality is to use the Intl.Collator API or the string's localeCompare() method (they share the same interface) with the sensitivity option set to "accent" or "base".
const areEqual = (str1, str2, locale = "en-US") =>
str1.localeCompare(str2, locale, { sensitivity: "accent" }) === 0;
areEqual("ß", "ss", "de"); // false
areEqual("ı", "I", "tr"); // true
The localeCompare() method enables string comparison in a similar fashion to strcmp(): it allows sorting strings in a locale-aware manner.
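As an illustration of locale-aware sorting, the sketch below compares the default sort (by UTF-16 code unit) with one based on localeCompare(). The sample list and the "en" locale are assumptions made for the example:

```javascript
// Hypothetical sample data.
const fruitNames = ["banana", "Cherry", "apple"];

// Default sort compares UTF-16 code units, so uppercase sorts first.
const byCodeUnit = [...fruitNames].sort();

// localeCompare() orders the words the way an English reader expects.
const byLocale = [...fruitNames].sort((x, y) => x.localeCompare(y, "en"));

// Intl.Collator offers the same comparison as a reusable object.
const baseCollator = new Intl.Collator("en", { sensitivity: "base" });
const sameIgnoringCase = baseCollator.compare("apple", "APPLE") === 0;

console.log(byCodeUnit); // [ 'Cherry', 'apple', 'banana' ]
console.log(byLocale); // [ 'apple', 'banana', 'Cherry' ]
console.log(sameIgnoringCase); // true
```

When many strings are compared, creating one Intl.Collator and reusing its compare function is generally preferable to calling localeCompare() with options on every comparison.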
String primitives and String objects
Note that JavaScript distinguishes between String objects and primitive string values. (The same is true of Boolean and Number.)
String literals (denoted by double or single quotes) and strings returned from String calls in a non-constructor context (that is, called without using the new keyword) are primitive strings. In contexts where a method is to be invoked on a primitive string or a property lookup occurs, JavaScript will automatically wrap the string primitive and call the method or perform the property lookup on the wrapper object instead.
const strPrim = "foo"; // A literal is a string primitive
const strPrim2 = String(1); // Coerced into the string primitive "1"
const strPrim3 = String(true); // Coerced into the string primitive "true"
const strObj = new String(strPrim); // String with new returns a string wrapper object.
console.log(typeof strPrim); // "string"
console.log(typeof strPrim2); // "string"
console.log(typeof strPrim3); // "string"
console.log(typeof strObj); // "object"
Warning: You should rarely find yourself using String as a constructor.
String primitives and String objects also give different results when using eval(). Primitives passed to eval are treated as source code; String objects are treated as all other objects are, by returning the object. For example:
const s1 = "2 + 2"; // creates a string primitive
const s2 = new String("2 + 2"); // creates a String object
console.log(eval(s1)); // returns the number 4
console.log(eval(s2)); // returns the string "2 + 2"
For these reasons, code may break when it encounters String objects where it expects a primitive string, although generally, authors need not worry about the distinction.
A String object can always be converted to its primitive counterpart with the valueOf() method.
console.log(eval(s2.valueOf())); // returns the number 4
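Beyond eval(), the difference also shows up in equality and truthiness checks. A small sketch with hypothetical variable names:

```javascript
const primitiveFoo = "foo";
const wrapperA = new String("foo");
const wrapperB = new String("foo");

// Primitives compare by value; objects compare by identity.
const primitivesEqual = primitiveFoo === "foo"; // true
const wrappersEqual = wrapperA === wrapperB; // false: two distinct objects
const looselyEqual = wrapperA == primitiveFoo; // true: wrapper is unwrapped
const unwrappedEqual = wrapperA.valueOf() === primitiveFoo; // true

// Every object is truthy, even a wrapper around the empty string.
const emptyPrimitiveTruthy = Boolean(""); // false
const emptyWrapperTruthy = Boolean(new String("")); // true

console.log(primitivesEqual, wrappersEqual, looselyEqual, unwrappedEqual);
console.log(emptyPrimitiveTruthy, emptyWrapperTruthy);
```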
String coercion
Many built-in operations that expect strings first coerce their arguments to strings (which is largely why String objects behave similarly to string primitives). The operation can be summarized as follows:
- Strings are returned as-is.
- undefined turns into "undefined".
- null turns into "null".
- true turns into "true"; false turns into "false".
- Numbers are converted with the same algorithm as toString(10).
- BigInts are converted with the same algorithm as toString(10).
- Symbols throw a TypeError.
- Objects are first converted to a primitive by calling their [@@toPrimitive]() (with "string" as hint), toString(), and valueOf() methods, in that order. The resulting primitive is then converted to a string.
There are several ways to achieve nearly the same effect in JavaScript.
- Template literal: `${x}` does exactly the string coercion steps explained above for the embedded expression.
- The String() function: String(x) uses the same algorithm to convert x, except that Symbols don't throw a TypeError, but return "Symbol(description)", where description is the description of the Symbol.
- Using the + operator: "" + x coerces its operand to a primitive instead of a string, and, for some objects, has entirely different behaviors from normal string coercion. See its reference page for more details.
Depending on your use case, you may want to use `${x}` (to mimic built-in behavior) or String(x) (to handle symbol values without throwing an error), but you should not use "" + x.
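The differences between the three approaches can be seen with a Symbol and with an object whose toString() and valueOf() disagree. This is a sketch; the tag and amount values are hypothetical:

```javascript
const tag = Symbol("id");

// String() accepts symbols; a template literal throws a TypeError.
const viaStringFunction = String(tag);
let templateThrewTypeError = false;
try {
  `${tag}`;
} catch (err) {
  templateThrewTypeError = err instanceof TypeError;
}

// An object whose two conversion methods return different values.
const amount = {
  toString() {
    return "ten";
  },
  valueOf() {
    return 10;
  },
};

const viaTemplate = `${amount}`; // string hint: toString() is tried first
const viaString = String(amount); // same algorithm as the template literal
const viaPlus = "" + amount; // default hint: valueOf() is tried first

console.log(viaStringFunction, templateThrewTypeError); // "Symbol(id)" true
console.log(viaTemplate, viaString, viaPlus); // "ten" "ten" "10"
```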
Escape sequences
Special characters can be encoded using escape sequences:
Escape sequence | Unicode code point |
---|---|
\0 | null character (U+0000 NULL) |
\' | single quote (U+0027 APOSTROPHE) |
\" | double quote (U+0022 QUOTATION MARK) |
\\ | backslash (U+005C REVERSE SOLIDUS) |
\n | newline (U+000A LINE FEED; LF) |
\r | carriage return (U+000D CARRIAGE RETURN; CR) |
\v | vertical tab (U+000B LINE TABULATION) |
\t | tab (U+0009 CHARACTER TABULATION) |
\b | backspace (U+0008 BACKSPACE) |
\f | form feed (U+000C FORM FEED) |
\uXXXX, where XXXX is exactly 4 hex digits in the range 0000–FFFF; e.g., \u000A is the same as \n (LINE FEED); \u0021 is ! | Unicode code point between U+0000 and U+FFFF (the Unicode Basic Multilingual Plane) |
\u{X} to \u{XXXXXX}, where X to XXXXXX is 1–6 hex digits in the range 0–10FFFF; e.g., \u{A} is the same as \n (LINE FEED); \u{21} is ! | Unicode code point between U+0000 and U+10FFFF (the entirety of Unicode) |
\xXX, where XX is exactly 2 hex digits in the range 00–FF; e.g., \x0A is the same as \n (LINE FEED); \x21 is ! | Unicode code point between U+0000 and U+00FF (the Basic Latin and Latin-1 Supplement blocks; equivalent to ISO-8859-1) |
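The equivalences listed in the table can be checked directly. A brief sketch (variable names are illustrative):

```javascript
// Three spellings of LINE FEED and of the exclamation mark.
const lineFeedForms = ["\u000A", "\u{A}", "\x0A"];
const bangForms = ["\u0021", "\u{21}", "\x21"];

const allLineFeeds = lineFeedForms.every((s) => s === "\n");
const allBangs = bangForms.every((s) => s === "!");

// A code point above U+FFFF equals its surrogate pair written with \uXXXX.
const smiley = "\u{1F604}";
const sameAsSurrogatePair = smiley === "\uD83D\uDE04";

// Escaped quotes let a literal contain its own delimiter.
const quoted = 'It\'s "quoted"';

console.log(allLineFeeds, allBangs, sameAsSurrogatePair); // true true true
console.log(quoted); // It's "quoted"
```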
Long literal strings
Sometimes, your code will include strings which are very long. Rather than having lines that go on endlessly, or wrap at the whim of your editor, you may wish to specifically break the string into multiple lines in the source code without affecting the actual string contents.
You can use the + operator to append multiple strings together, like this:
const longString =
"This is a very long string which needs " +
"to wrap across multiple lines because " +
"otherwise my code is unreadable.";
Or you can use the backslash character (\) at the end of each line to indicate that the string will continue on the next line. Make sure there is no space or any other character after the backslash (except for a line break), otherwise it will not work. If the next line is indented, the extra spaces will also be present in the string's value.
const longString =
"This is a very long string which needs \
to wrap across multiple lines because \
otherwise my code is unreadable.";
Both of the above methods result in identical strings.
UTF-16 characters, Unicode codepoints, and grapheme clusters
Strings are represented fundamentally as sequences of UTF-16 code units. In UTF-16 encoding, every code unit is exactly 16 bits long. This means there are a maximum of 2^16, or 65536, possible characters representable as single UTF-16 code units. This character set is called the basic multilingual plane (BMP), and includes the most common characters like the Latin, Greek, and Cyrillic alphabets, as well as many East Asian characters. Each code unit can be written in a string with \u followed by exactly four hex digits.
However, the entire Unicode character set is much, much bigger than 65536. The extra characters are stored in UTF-16 as surrogate pairs, which are pairs of 16-bit code units that represent a single character. To avoid ambiguity, the two parts of the pair must be between 0xD800 and 0xDFFF, and these code units are not used to encode single-code-unit characters. Therefore, "lone surrogates" are often not valid values for string manipulation; for example, encodeURI() will throw a URIError for lone surrogates. Each Unicode character, composed of one or two UTF-16 code units, is also called a Unicode codepoint. Each Unicode codepoint can be written in a string with \u{xxxxxx} where xxxxxx represents 1–6 hex digits.
On top of Unicode characters, there are certain sequences of Unicode characters that should be treated as one visual unit, known as a grapheme cluster. The most common case is emojis: many emojis that have a range of variations are actually formed by multiple emojis, usually joined by the <ZWJ> (U+200D) character.
You must be careful which level of characters you are iterating on. For example, split("") will split by UTF-16 code units and will separate surrogate pairs. String indexes also refer to the index of each UTF-16 code unit. On the other hand, @@iterator() iterates by Unicode codepoints. Iterating through grapheme clusters will require some custom code.
"😄".split(""); // ['\ud83d', '\ude04']; splits into two lone surrogates
// "Backhand Index Pointing Right: Dark Skin Tone"
[..."👉🏿"]; // ['👉', '🏿']
// splits into the basic "Backhand Index Pointing Right" emoji and
// the "Dark skin tone" emoji
// "Family: Man, Boy"
[..."👨\u200D👦"]; // [ '👨', '\u200d', '👦' ]
// splits into the "Man" and "Boy" emoji, joined by a ZWJ
// The United Nations flag
[..."🇺🇳"]; // [ '🇺', '🇳' ]
// splits into two "region indicator" letters "U" and "N".
// All flag emojis are formed by joining two region indicator letters
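One possible approach to counting grapheme clusters, in environments that support it, is the Intl.Segmenter API. The sketch below contrasts the three levels for the same text; the helper name countGraphemes and the "en" locale are assumptions made for this example:

```javascript
// Hypothetical helper: counts grapheme clusters with Intl.Segmenter.
const countGraphemes = (text, locale = "en") => {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
};

// "Backhand Index Pointing Right" followed by the dark skin tone modifier.
const pointing = "\u{1F449}\u{1F3FF}";
// The letter "e" followed by a combining acute accent.
const accented = "e\u0301";

console.log(pointing.length); // 4 UTF-16 code units
console.log([...pointing].length); // 2 code points
console.log(countGraphemes(pointing)); // 1 grapheme cluster

console.log(accented.length); // 2 code units
console.log(countGraphemes(accented)); // 1 grapheme cluster
```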
Constructor
String()
-
Creates a new String object. It performs type conversion when called as a function, rather than as a constructor, which is usually more useful.
Static methods
String.fromCharCode()
-
Returns a string created by using the specified sequence of Unicode values.
String.fromCodePoint()
-
Returns a string created by using the specified sequence of code points.
String.raw()
-
Returns a string created from a raw template string.
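A brief sketch of the three static methods; the sample code values and the Windows-style path are chosen only for illustration:

```javascript
// fromCharCode() takes UTF-16 code unit values.
const fromUnits = String.fromCharCode(72, 105); // "Hi"

// fromCodePoint() takes full code points, including those above U+FFFF.
const fromPoint = String.fromCodePoint(0x1f604); // "😄"
const fromSurrogates = String.fromCharCode(0xd83d, 0xde04);

// String.raw keeps backslashes as literal characters.
const rawPath = String.raw`C:\new\table`;

console.log(fromUnits); // "Hi"
console.log(fromPoint === fromSurrogates); // true
console.log(rawPath === "C:\\new\\table"); // true
```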
Instance properties
These properties are defined on String.prototype and shared by all String instances.
String.prototype.constructor
-
The constructor function that created the instance object. For String instances, the initial value is the String constructor.
These properties are own properties of each String instance.
length
-
Reflects the length of the string. Read-only.
Instance methods
String.prototype.at()
-
Returns the character (exactly one UTF-16 code unit) at the specified index. Accepts negative integers, which count back from the last string character.
String.prototype.charAt()
-
Returns the character (exactly one UTF-16 code unit) at the specified index.
String.prototype.charCodeAt()
-
Returns a number that is the UTF-16 code unit value at the given index.
String.prototype.codePointAt()
-
Returns a nonnegative integer Number that is the code point value of the UTF-16 encoded code point starting at the specified pos.
String.prototype.concat()
-
Combines the text of two (or more) strings and returns a new string.
String.prototype.includes()
-
Determines whether the calling string contains searchString.
String.prototype.endsWith()
-
Determines whether a string ends with the characters of the string searchString.
String.prototype.indexOf()
-
Returns the index within the calling String object of the first occurrence of searchValue, or -1 if not found.
String.prototype.lastIndexOf()
-
Returns the index within the calling String object of the last occurrence of searchValue, or -1 if not found.
String.prototype.localeCompare()
-
Returns a number indicating whether the calling (reference) string comes before, after, or is equivalent to compareString in sort order.
String.prototype.match()
-
Used to match regular expression regexp against a string.
String.prototype.matchAll()
-
Returns an iterator of all regexp's matches.
String.prototype.normalize()
-
Returns the Unicode Normalization Form of the calling string value.
String.prototype.padEnd()
-
Pads the current string from the end with a given string and returns a new string of the length targetLength.
String.prototype.padStart()
-
Pads the current string from the start with a given string and returns a new string of the length targetLength.
String.prototype.repeat()
-
Returns a string consisting of the elements of the object repeated count times.
String.prototype.replace()
-
Used to replace occurrences of searchFor using replaceWith. searchFor may be a string or Regular Expression, and replaceWith may be a string or function.
String.prototype.replaceAll()
-
Used to replace all occurrences of searchFor using replaceWith. searchFor may be a string or Regular Expression, and replaceWith may be a string or function.
String.prototype.search()
-
Searches for a match between a regular expression regexp and the calling string.
String.prototype.slice()
-
Extracts a section of a string and returns a new string.
String.prototype.split()
-
Returns an array of strings populated by splitting the calling string at occurrences of the substring sep.
String.prototype.startsWith()
-
Determines whether the calling string begins with the characters of string searchString.
String.prototype.substring()
-
Returns a new string containing characters of the calling string from (or between) the specified index (or indices).
String.prototype.toLocaleLowerCase()
-
The characters within a string are converted to lowercase while respecting the current locale. For most languages, this will return the same as toLowerCase().
String.prototype.toLocaleUpperCase()
-
The characters within a string are converted to uppercase while respecting the current locale. For most languages, this will return the same as toUpperCase().
String.prototype.toLowerCase()
-
Returns the calling string value converted to lowercase.
String.prototype.toString()
-
Returns a string representing the specified object. Overrides the Object.prototype.toString() method.
String.prototype.toUpperCase()
-
Returns the calling string value converted to uppercase.
String.prototype.trim()
-
Trims whitespace from the beginning and end of the string.
String.prototype.trimStart()
-
Trims whitespace from the beginning of the string.
String.prototype.trimEnd()
-
Trims whitespace from the end of the string.
String.prototype.valueOf()
-
Returns the primitive value of the specified object. Overrides the Object.prototype.valueOf() method.
String.prototype[@@iterator]()
-
Returns a new iterator object that iterates over the code points of a String value, returning each code point as a String value.
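To give a feel for how several of these methods behave together, here is a small sketch on a hypothetical input string. Each call returns a new value and leaves the original string unchanged:

```javascript
// Hypothetical input with surrounding whitespace.
const rawInput = "  Hello, World  ";
const trimmed = rawInput.trim();

const lastChar = trimmed.at(-1); // negative index counts from the end
const hasWorld = trimmed.includes("World");
const startsHello = trimmed.startsWith("Hello");
const tail = trimmed.slice(7); // from index 7 to the end
const swapped = trimmed.replaceAll("l", "L");
const parts = trimmed.split(", ");
const padded = "7".padStart(3, "0");
const repeated = "ab".repeat(3);

console.log(trimmed); // "Hello, World"
console.log(lastChar, hasWorld, startsHello); // "d" true true
console.log(tail, swapped); // "World" "HeLLo, WorLd"
console.log(parts); // [ 'Hello', 'World' ]
console.log(padded, repeated); // "007" "ababab"
console.log(rawInput.length); // 16: the original is untouched
```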
HTML wrapper methods
Warning: Deprecated. Avoid these methods.
They are of limited use, as they are based on a very old HTML standard and provide only a subset of the currently available HTML tags and attributes. Many of them create deprecated or non-standard markup today. In addition, they do simple string concatenation without any validation or sanitization, which makes them a potential security threat when the result is inserted directly using innerHTML. Use DOM APIs such as document.createElement() instead.
String.prototype.anchor() Deprecated
-
<a name="name"> (hypertext target)
String.prototype.big() Deprecated
-
<big>
String.prototype.blink() Deprecated
-
<blink>
String.prototype.bold() Deprecated
-
<b>
String.prototype.fixed() Deprecated
-
<tt>
String.prototype.fontcolor() Deprecated
-
<font color="color">
String.prototype.fontsize() Deprecated
-
<font size="size">
String.prototype.italics() Deprecated
-
<i>
String.prototype.link() Deprecated
-
<a href="url"> (link to URL)
String.prototype.small() Deprecated
-
<small>
String.prototype.strike() Deprecated
-
<strike>
String.prototype.sub() Deprecated
-
<sub>
String.prototype.sup() Deprecated
-
<sup>
Note that these methods do not check if the string itself contains HTML tags, so it's possible to create invalid HTML:
"</b>".bold(); // <b></b></b>
The only escaping they do is to replace " in the attribute value (for anchor(), fontcolor(), fontsize(), and link()) with &quot;.
"foo".anchor('"Hello"'); // <a name="&quot;Hello&quot;">foo</a>
Examples
String conversion
It's possible to use String as a more reliable toString() alternative, as it works when used on null and undefined. For example:
const nullVar = null;
nullVar.toString(); // TypeError: nullVar is null
String(nullVar); // "null"
const undefinedVar = undefined;
undefinedVar.toString(); // TypeError: undefinedVar is undefined
String(undefinedVar); // "undefined"
Specifications
Specification |
---|
ECMAScript Language Specification # sec-string-objects |