Solidity Bytes and String Arrays, Concat, Allocating Memory, and Array Literals

💡 With this article, we'll discover a new and fascinating world of bytes and strings, as well as ways to manipulate them, allocate memory arrays, and use array literals.

It's part of our long-standing tradition to make this (and other) articles a faithful companion or supplement to the official Solidity documentation, starting with these docs for this article's topics.

Types bytes and string

Besides the arrays we've already discussed, there are also some special arrays: the bytes and string types.

The bytes type is very similar to bytes1[]; the difference is that a bytes array is tightly packed in the calldata and memory data locations.

Furthermore, string is equal to bytes, but does not have a length property or support for index access.

Unlike other commonly used programming languages, Solidity doesn't have built-in string manipulation functions, but this can be worked around by including third-party string libraries.

With vanilla Solidity, we can concatenate two strings, e.g. string.concat(s1, s2), and compare two strings by using their keccak-256 hash, e.g.

keccak256(abi.encodePacked(s1)) == keccak256(abi.encodePacked(s2)).
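
The two operations above can be wrapped in a small helper contract. This is only a minimal sketch: the contract name StringCompare and the function names join and equal are made up for illustration and are not part of Solidity or any library.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.12;

// Hypothetical helper contract; the names are illustrative only.
contract StringCompare {
    // Joins two strings without adding any separator.
    function join(string memory s1, string memory s2) public pure returns (string memory) {
        return string.concat(s1, s2);
    }

    // Treats two strings as equal when the hashes of their bytes match.
    function equal(string memory s1, string memory s2) public pure returns (bool) {
        return keccak256(abi.encodePacked(s1)) == keccak256(abi.encodePacked(s2));
    }
}
```

For example, equal(join("foo", "bar"), "foobar") would return true, while equal("foo", "Foo") would return false because the comparison is byte-exact.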

Regarding the preferred use (we could consider this a design pattern), the bytes type is better than bytes1[], because bytes1[] is more expensive: 31 padding bytes are added between the elements when it is used in memory.

The padding is absent in storage because of the tight packing used (docs).

Note: A rule of thumb says that bytes should be used for arbitrary-length raw byte data and string for arbitrary-length string data in UTF-8.

💡 Note: If our data fits into at most 32 bytes, it is better to use one of the value types bytes1 ... bytes32, because they are much cheaper.

To access the byte representation of a string s, we can use bytes(s).length to read its length in bytes and a construct such as bytes(s)[7] = 'x'; to modify an individual byte, e.g.

// SPDX-License-Identifier: GPL-3.0

pragma solidity >=0.7.0 <0.9.0;

/** 
 * @title String modification
 * @dev Demonstrates how to modify a string represented as bytes.
 */
contract StringModification {
    string public s = "Some string";

    function modifyString() public {
        // Overwrite the byte at index 7 (the letter 'r') with 'Q'.
        bytes(s)[7] = 'Q';
    }
}

💡 Note: By using this approach, we're accessing bytes of the UTF-8 representation, not the individual characters.
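
To make the difference between bytes and characters concrete, here is a small sketch. The contract name Utf8Length and the function name byteLength are hypothetical; the unicode string literal prefix requires Solidity 0.7.0 or later.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.7.0 <0.9.0;

// Hypothetical example contract; the names are illustrative only.
contract Utf8Length {
    // The string below holds one character, but UTF-8 encodes it
    // as the two bytes 0xC3 0xA9, so the byte length is 2.
    function byteLength() public pure returns (uint) {
        string memory s = unicode"é";
        return bytes(s).length;
    }
}
```

Because of this, an index such as bytes(s)[7] does not necessarily point at the eighth character once the string contains non-ASCII text.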

Functions bytes.concat() and string.concat()

Concatenation is a synonym for joining or gluing together.

🌍 Recommended Tutorial: String Concatenation in Solidity

String Concatenation

The function string.concat() enables us to concatenate any number of string values.

The result of the string.concat() function is a single string memory array containing the concatenated strings without any added spacing or padding.

If we'd like to pass arguments of other types that are not implicitly convertible to the string type, we first have to convert them to the string type.

Byte Concatenation

In the same manner, the bytes.concat() function enables us to concatenate any number of bytes or bytes1 ... bytes32 values.

The function result is a single bytes memory array containing the arguments without padding.

If we'd like to use string parameters or other types not implicitly convertible to the bytes type, we first convert them to the bytes type.

Example

Let's use an example to show how a function performs both string and bytes concatenation:

// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.12;

contract C {
    string s = "Storage";
    function f(bytes calldata bc, string memory sm, bytes16 b) public view {
        string memory concatString = string.concat(s, string(bc), "Literal", sm);
        assert((bytes(s).length + bc.length + 7 + bytes(sm).length) == bytes(concatString).length);

        bytes memory concatBytes = bytes.concat(bytes(s), bc, bc[:2], "Literal", bytes(sm), b);
        assert((bytes(s).length + bc.length + 2 + 7 + bytes(sm).length + b.length) == concatBytes.length);
    }
}

Calling bytes.concat() or string.concat() without arguments returns an empty array.
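
The empty-argument case can be checked with a short sketch like the one below. The contract name EmptyConcat and the function name bothEmpty are assumptions chosen for illustration.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.12;

// Hypothetical example contract; the names are illustrative only.
contract EmptyConcat {
    // Returns true when both argument-less calls produce zero-length results.
    function bothEmpty() public pure returns (bool) {
        bytes memory b = bytes.concat();
        string memory s = string.concat();
        return b.length == 0 && bytes(s).length == 0;
    }
}
```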

Allocating Memory Arrays

We can dynamically resize the storage arrays by adding elements via the .push() member function.

In contrast, memory arrays cannot be dynamically resized and the .push() member function is not available.

However, we can create memory arrays whose length is only known at runtime by using the new operator. Because such an array cannot grow afterwards, we either have to calculate the required size in advance, or create a new, larger memory array and copy every element into it.

💡 Note: As with any other variable, the elements of freshly allocated arrays are initialized with their default values (docs).

Here we have an example showing arrays a and b, initialized by either a constant size or a parameter-given size.

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.16 <0.9.0;

contract C {
    function f(uint len) public pure {
        uint[] memory a = new uint[](7);
        bytes memory b = new bytes(len);
        assert(a.length == 7);
        assert(b.length == len);
        a[6] = 8;
    }
}
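
Since memory arrays have no .push(), "appending" to one means allocating a larger array and copying. The following is a rough sketch of that idea; the contract name MemoryAppend and the function name append are hypothetical, and the loop makes the cost grow with the array length.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.5.0 <0.9.0;

// Hypothetical example contract; the names are illustrative only.
contract MemoryAppend {
    // Returns a copy of `arr` that is one element longer, with `value` at the end.
    function append(uint[] memory arr, uint value) public pure returns (uint[] memory) {
        // Allocate the larger array; its elements start at the default value 0.
        uint[] memory result = new uint[](arr.length + 1);
        // Copy every existing element into the new array.
        for (uint i = 0; i < arr.length; i++) {
            result[i] = arr[i];
        }
        result[arr.length] = value;
        return result;
    }
}
```

When the final size is known up front, allocating once with that size avoids the repeated copying.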

Array Literals

An array literal is a comma-separated list of one or more expressions enclosed in square brackets, e.g. [1, a, f(3)].

The array literal type is determined in the following way:

  1. The array literal is always a statically-sized memory array, and its length is the number of expressions listed in the brackets;
  2. The base type of the array is the type T of the first expression in the list, provided that all other expressions are implicitly convertible to T. If this is not possible, a type error is raised;
  3. It is not enough that some type exists to which all expressions can be converted (point 2.); one of the expressions must actually be of type T.

The following example will clarify what the points above mean; the type of an array literal [1, 2, 3] is uint8[3] memory, because each of the expressions is of type uint8.

If we want to change the result to type uint[3] memory, we have to convert the first element to uint.

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.16 <0.9.0;

contract C {
    function f() public pure {
        g([uint(1), 2, 3]);
    }
    function g(uint[3] memory) public pure {
        // ...
    }
}

In contrast, the array literal [1, -1] is invalid because it doesn't comply with point 2.: the type of the first expression is the target type T for the implicit conversion of all other expressions.

Our first expression is of type uint8, while the second expression is negative and therefore of type int8, so it cannot be implicitly converted to uint8.

To avoid a type error, we can declare our array literal as [int8(1), -1], forcing the first expression to be of compatible type int8.
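
A minimal sketch of the valid form is shown below. The contract name SignedLiteral is made up for illustration.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.5.0 <0.9.0;

// Hypothetical example contract; the name is illustrative only.
contract SignedLiteral {
    function f() public pure returns (int8[2] memory) {
        // int8(1) fixes the base type to int8; -1 is implicitly convertible to it.
        int8[2] memory x = [int8(1), -1];
        return x;
    }
}
```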

Two-dimensional array literals raise a further problem: fixed-size memory arrays of different types cannot be converted into each other, even if their base types can.

We can get around this problem by explicitly specifying a common base type:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.16 <0.9.0;

contract C {
    function f() public pure returns (uint24[2][4] memory) {
        uint24[2][4] memory x = [[uint24(0x1), 1], [0xffffff, 2], [uint24(0xff), 3], [uint24(0xffff), 4]];
        // The following does not work, because some of the inner arrays are not of the right type.
        // uint[2][4] memory x = [[0x1, 1], [0xffffff, 2], [0xff, 3], [0xffff, 4]];
        return x;
    }
}

We cannot assign fixed-size memory arrays to dynamically-sized memory arrays, as shown by the example:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.0 <0.9.0;

// This will not compile.
contract C {
    function f() public {
        // The next line creates a type error because uint[3] memory
        // cannot be converted to uint[] memory.
        uint[] memory x = [uint(1), 3, 4];
    }
}

To initialize dynamically-sized arrays, we'd have to resort to assigning the elements individually, as in the example:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.16 <0.9.0;

contract C {
    function f() public pure {
        uint[] memory x = new uint[](3);
        x[0] = 1;
        x[1] = 3;
        x[2] = 4;
    }
}

Conclusion

In this article, we learned even more about reference types, in particular, bytes and string arrays and concatenation, memory array allocation, and array literals.

  1. First, we explained what makes the bytes and string array types special, and also touched on some of their similarities with related types.
  2. Second, we looked at how to do string concatenation, string comparison, and bytes concatenation.
  3. Third, we discovered the specifics of allocating memory arrays and got introduced to the new operator.
  4. Fourth, we got to know array literals and the rules for determining the array literal base type. We also became aware of invalid array literals and what can be done to make them valid.

What's Next?

This tutorial is part of our extended Solidity documentation with videos and more accessible examples and explanations. You can navigate the series here (all links open in a new tab):