In this article, we’ll learn about string literals and types, Unicode literals, hexadecimal literals, and enums.
They’re not complex types, so getting to know them will be simple, but they will prove very useful in our future endeavors with the Solidity programming language.
It’s part of our long-standing tradition to make this (and every other) article a faithful companion to, or a supplement of, the official Solidity documentation on its topics.
Scroll down to the end if you want to download the PDF slides for this tutorial’s data types.
String Literals and Types
We can write string literals with double or single quotes, as in "string" or 'literal'.
We can also visually split them into two or more consecutive parts that are interpreted together, such as "string" "literal", which is the same as "stringliteral".
Why would we do that, when we’re perfectly capable of writing one normal string?
The answer lies in a situation where we have to deal with long strings that stretch well over the editor margin that we generally adhere to when we’re typing our code (note the space symbol prepended to the second part in the second line):
string normalString = "This is a string that we would normally use";
string longString = "This is a very long string that we do not see"
    " frequently and this is a trick to make it more manageable";
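As a quick check, here is a minimal sketch that compares a split literal with the same text written in one piece. The contract and function names are hypothetical, chosen only for illustration.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// Hypothetical contract comparing a split literal with a single one.
contract AdjacentLiterals {
    function isSame() public pure returns (bool) {
        // The two consecutive parts are joined into one literal at compile time.
        bytes memory split = "string" "literal";
        bytes memory whole = "stringliteral";
        // bytes values cannot be compared with ==, so we compare their hashes.
        return keccak256(split) == keccak256(whole);
    }
}
```

Calling isSame() should return true, since both variables hold the same 13 bytes.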
Unlike in the C programming language, string literals do not imply a trailing zero. In C, we would type "word" (4 bytes), but it would internally get stored as "word\0" (4 bytes + 1 byte for the string-terminating symbol \0 = 5 bytes).
In Solidity, what we type is what we get: "word" takes exactly 4 bytes.
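A minimal sketch to confirm this, using a hypothetical NoTerminator contract: converting the literal to bytes and reading its length yields 4, not 5.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// Hypothetical contract showing that no terminator byte is appended.
contract NoTerminator {
    function wordLength() public pure returns (uint256) {
        // "word" is stored as exactly four bytes: 0x77 0x6f 0x72 0x64.
        return bytes("word").length;
    }
}
```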
As we previously saw with integer literals, the type of a string literal can vary: string literals are implicitly convertible to bytesN (N = [1, 32]) if they fit, to bytes, and to string.
For example, when the string literal below is assigned to a bytes32 type, it is interpreted in its raw byte form:
bytes32 samevar = "stringliteral";
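The following sketch (the LiteralConversions contract is hypothetical) shows the same literal converted to bytes32, bytes, and string. When a shorter literal is assigned to a fixed-size bytesN type, it is left-aligned and the remaining bytes on the right are zero.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// Hypothetical contract converting one literal to three types.
contract LiteralConversions {
    // "stringliteral" is 13 bytes, so it fits into bytes32;
    // the remaining 19 bytes are filled with zeros on the right.
    bytes32 public asBytes32 = "stringliteral";
    // The same literal also converts to dynamic bytes and to string.
    bytes public asBytes = "stringliteral";
    string public asString = "stringliteral";
    // bytes4 public tooShort = "stringliteral"; // would not compile: 13 bytes do not fit into 4

    function firstByte() public view returns (bytes1) {
        // 0x73 is the ASCII code of 's'.
        return asBytes32[0];
    }
}
```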
String literals may contain only printable ASCII characters (ASCII is a character encoding standard for electronic communication, https://www.asciitable.com/), ranging from 0x20 (the space symbol) up to and including 0x7E (the tilde symbol, ~).
Besides this range, string literals also support the following escape sequences:
- \<newline> (escapes an actual newline)
- \\ (backslash)
- \' (single quote)
- \" (double quote)
- \n (newline)
- \r (carriage return)
- \t (tab)
- \xMN (hex escape, described below)
- \uNOPQ (Unicode escape, described below)
The escape sequence \xMN, where MN represents two hexadecimal digits, is interpreted and the corresponding byte is inserted into the string literal.
The escape sequence \uNOPQ, where NOPQ represents four hexadecimal digits of a Unicode codepoint, is interpreted and the UTF-8 sequence of that codepoint is inserted into the string literal.
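To see the difference between the two escapes, here is a sketch with a hypothetical EscapeExamples contract: each \x escape inserts exactly one byte, while a \u escape may insert several bytes, depending on the UTF-8 encoding of the codepoint.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// Hypothetical contract contrasting hex and Unicode escapes.
contract EscapeExamples {
    function hexEscape() public pure returns (bytes memory) {
        // \x41 and \x42 each insert one byte: "A" and "B".
        return "\x41\x42";
    }

    function unicodeEscape() public pure returns (bytes memory) {
        // U+00E9 (e with an acute accent) is encoded in UTF-8 as two bytes: 0xC3 0xA9.
        return "\u00E9";
    }
}
```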
Info: “In character encoding terminology, a code point, codepoint or code position is a numerical value that maps to a specific character. Code points usually represent a single grapheme (usually a letter, digit, punctuation mark, or whitespace), but sometimes represent symbols, control characters, or formatting.”
(https://web.archive.org/web/20180919061218/https://www.unicode.org/versions/Unicode11.0.0/ch02.pdf).
Note: Since version 0.8.0, Solidity no longer supports three escape sequences: \b (backspace), \f (form feed), and \v (vertical tab). Although these are common in other languages, they’re rarely required for practical purposes. If we find ourselves needing them, we can insert them via hexadecimal escapes: \x08, \x0c, and \x0b, respectively. The same approach works for any other ASCII character.
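For instance, a hypothetical helper that returns the three control characters formerly written as \b, \f, and \v could look like this sketch:

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// Hypothetical contract producing the bytes that \b, \f and \v used to insert.
contract ControlCharacters {
    function controls() public pure returns (bytes memory) {
        // backspace (0x08), form feed (0x0c), vertical tab (0x0b)
        return "\x08\x0c\x0b";
    }
}
```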
The following string literal example might seem like a mental exercise, but we’ll analyze it one symbol at a time:
"\n\"\'\\abc\
def"
- Zeroth, we have the start of the string literal, marked with the double quote symbol ".
- First, we have a newline symbol \n (1 byte).
- Second, we have an escaped double quote symbol \" (1 byte).
- Third, we have an escaped single quote symbol \' (1 byte).
- Fourth, we have an escaped backslash symbol \\ (1 byte).
- Fifth, we have three characters abc (3 bytes).
- Sixth, we have three more characters def (3 bytes).
- Last, we have the closing of the string literal, marked with the double quote symbol ".
In total, that’s 10 bytes.
However, the fifth and sixth parts beg for an explanation. Why do they form a single character sequence, abcdef?
The answer lies in the fact that a newline terminates the string literal only if it is not escaped (preceded) by a \ (backslash) symbol. In our example, abc is followed by a \<newline>, so the string isn’t terminated, and the escaped newline itself adds no byte to the literal.
Note: As a general rule, any Unicode line terminator which is not a newline (i.e. LF, VF, FF, CR, NEL, LS, PS) unconditionally terminates the string literal. A newline also terminates the string literal, unless it is escaped with a backslash symbol \.
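We can let the compiler confirm the count of 10 bytes. In this minimal sketch (the EscapedNewline contract is hypothetical), note that def must start at the very beginning of the next line; any indentation there would become part of the string.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// Hypothetical contract measuring the example literal.
contract EscapedNewline {
    function literalLength() public pure returns (uint256) {
        // The backslash at the end of the first line escapes the newline,
        // so the literal continues with "def" and the newline adds no byte.
        bytes memory value = "\n\"\'\\abc\
def";
        return value.length;
    }
}
```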
Unicode Literals
In contrast to regular string literals that can only consist of ASCII characters, Unicode literals can contain any valid UTF-8 sequence. To declare a Unicode literal, we have to prefix a string literal with the unicode keyword, e.g.
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

contract example {
    string public a = unicode"Hello ";
}
Unicode literals use the same set of escape sequences as string literals.
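To illustrate both points, here is a small sketch with a hypothetical UnicodeLength contract. It counts bytes rather than characters, and it checks that a \u escape in a regular string literal yields the same bytes as the character written directly in a Unicode literal.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// Hypothetical contract comparing character count and byte count.
contract UnicodeLength {
    function byteLength() public pure returns (uint256) {
        // Four characters, but the accented letter takes two bytes in UTF-8.
        // Without the unicode prefix, this literal would not compile.
        bytes memory word = unicode"café";
        return word.length;
    }

    function sameAsEscape() public pure returns (bool) {
        // The \u escape produces the same UTF-8 bytes as the literal character.
        bytes memory direct = unicode"café";
        bytes memory escaped = "caf\u00E9";
        return keccak256(direct) == keccak256(escaped);
    }
}
```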
Hexadecimal Literals
Hexadecimal literals are written like ordinary string literals, enclosed in single or double quotes, but are prefixed with the keyword hex, e.g. hex"001122FF" or hex'0011_22_FF'.
The content of a hexadecimal literal is represented by hexadecimal digits, which can optionally use one underscore as a separator between byte boundaries. The value of the hexadecimal literal will be the binary representation of the hexadecimal sequence:
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

contract hexadecimal_literal {
    function get_literal() public pure returns (bytes memory) {
        return hex'0011_22_FF';
    }
}
When multiple hexadecimal literals are separated by whitespace, they are concatenated into a single literal, e.g.
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

contract hexadecimal_literal {
    function get_literal() public pure returns (bytes memory) {
        return hex"00112233" hex"44556677";
    }
}
The result of calling the get_literal() function is bytes: 0x0011223344556677.
Owing to their similarity, hexadecimal literals share the same convertibility restrictions as string literals.
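As a sketch of those restrictions (the HexConversions contract is hypothetical), a hexadecimal literal converts to a fixed-size bytesN only if it fits, and to dynamic bytes as well:

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// Hypothetical contract converting hexadecimal literals like string literals.
contract HexConversions {
    // Four bytes fit exactly into bytes4.
    bytes4 public fixedSize = hex"001122FF";
    // Underscores only separate bytes; they add nothing to the value.
    bytes public dynamicSize = hex"0011_22_FF";
    // bytes2 public tooSmall = hex"001122FF"; // would not compile: 4 bytes do not fit into 2

    function sameContent() public view returns (bool) {
        // Compare the dynamic bytes with the packed encoding of the bytes4 value.
        return keccak256(dynamicSize) == keccak256(abi.encodePacked(fixedSize));
    }
}
```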
Enums
One of the ways to create a user-defined type in Solidity is by using enums.
Enums are explicitly convertible to and from all integer types, but implicit conversion is not allowed.
An explicit conversion from an integer is checked at runtime: if the value does not fall inside the range of the enum, the conversion causes a Panic error.
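Here is a minimal sketch of both directions of the conversion; the EnumConversion contract and its Size enum are hypothetical. Converting an out-of-range integer reverts with Panic code 0x21, the code Solidity uses for invalid enum conversions.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// Hypothetical contract converting between an enum and integers.
contract EnumConversion {
    enum Size { Small, Medium, Large }

    function toSize(uint256 value) public pure returns (Size) {
        // Explicit conversion; reverts with Panic(0x21) if value > 2.
        return Size(value);
    }

    function toNumber(Size size) public pure returns (uint256) {
        // Explicit conversion in the other direction always succeeds.
        return uint256(size);
    }

    // Size s = 1; // would not compile: implicit conversion is not allowed
}
```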
An enum requires at least one member, and the default value of an enum variable is its first member. An enum cannot have more than 256 members.
Speaking of data representation, it’s the same as for enums in the C programming language (a reference for those with C experience): the members of an enum are represented by consecutive unsigned integer values, starting from 0.
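A short sketch of these rules, using a hypothetical EnumDefaults contract: a state variable that is never assigned holds the first member, and converting members to an integer exposes their zero-based positions.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// Hypothetical contract showing the default value and zero-based indexing.
contract EnumDefaults {
    enum Size { Small, Medium, Large }

    // Never assigned, so it holds the first member, Size.Small.
    Size public current;

    function indexOf(Size size) public pure returns (uint8) {
        // Members are numbered 0, 1, 2 in declaration order.
        return uint8(size);
    }
}
```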
We can get the smallest and largest value of an enum by inspecting its type properties, i.e. type(NameOfEnum).min and type(NameOfEnum).max.
The following example shows enums in action.
I’m sure these first lines need no introduction:
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

contract test {
An enum declaration with all the items:
enum ActionChoices { GoLeft, GoRight, GoStraight, SitStill }
An enum-type variable declaration:
ActionChoices choice;
An enum-type constant declaration:
ActionChoices constant defaultChoice = ActionChoices.GoStraight;
The setter function for the choice variable:
function setGoStraight() public { choice = ActionChoices.GoStraight; }
The getter function for the choice variable:
// Since enum types are not part of the ABI, the signature of "getChoice"
// will automatically be changed to "getChoice() returns (uint8)"
// for all matters external to Solidity.
function getChoice() public view returns (ActionChoices) {
    return choice;
}
Notice how a function declared as pure can read a constant from the environment, but not a variable:
function getDefaultChoice() public pure returns (uint) { return uint(defaultChoice); }
Notice how functions declared as pure (both getLargestValue() and getSmallestValue()) can read a type property from the environment:
function getLargestValue() public pure returns (ActionChoices) {
    return type(ActionChoices).max;
}

function getSmallestValue() public pure returns (ActionChoices) {
    return type(ActionChoices).min;
}
}
This simple but telling example showed us how to use and access an enum as a type, as a constant, and as a state variable.
Note: Besides declaring an enum inside a contract or a library, we can also declare it at file level, e.g.
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// An outside enum declaration.
enum OutsideChoices { Golf, Soccer, Tennis, Hammock }
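Building on that, here is a sketch of how a file-level enum can be shared by several contracts in the same file; Weekend, Planner, plan, and isRelaxing are hypothetical names used only for illustration.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.8;

// File-level enum, visible to every contract in this file.
enum OutsideChoices { Golf, Soccer, Tennis, Hammock }

// Hypothetical contract storing a value of the shared enum type.
contract Weekend {
    OutsideChoices public plan = OutsideChoices.Hammock;
}

// Hypothetical contract reading that value through the same enum type.
contract Planner {
    function isRelaxing(Weekend weekend) public view returns (bool) {
        return weekend.plan() == OutsideChoices.Hammock;
    }
}
```

Because the enum lives outside any contract, neither contract has to qualify it with a contract name.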
Slides PDF Download
Conclusion
In this article, we became very good friends with even more data types!
First, we got a lot closer with string literals and types.
Second, we also got to know Unicode literals.
Third, we even got that hackery feeling by learning about hexadecimal literals.
Fourth, since I always finish each article by enumerating its sections, it’s only fitting that the last section was all about enums.
Recommended Tutorial: Solidity Types Boolean and Integer
What’s Next?
This tutorial is part of our extended Solidity documentation with videos and more accessible examples and explanations. You can navigate the series here: