Dangling References to Storage Array Elements – Solidity Reference Types

In this article, while being unable to keep the focus on anything else with all this dangling around, we’ll entertain ourselves with dangling references to storage array elements.

In keeping with our long-standing tradition, this article (like the others in the series) is meant as a faithful companion or supplement to the official Solidity documentation, starting with the section that covers this article’s topic.

Dangling References to Storage Array Elements

Unlike working with memory arrays, when we manipulate storage arrays, we should be very careful to avoid dangling references.

💡 Definition: A dangling reference is a reference that points to a non-existent object or to an object whose location has changed. It is a common source of problems when working with referenced objects, in Solidity and in other programming languages alike.

A common example of a dangling reference is a situation where we have stored a reference to an array element within a local variable and then removed the element from the array by calling .pop() on the array:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

contract C {

    // Declares a dynamic array of dynamic arrays.
    uint[][] s;

    function f() public {
        // Stores a pointer to the last array element of 's'.
        uint[] storage ptr = s[s.length - 1];
        // Removes the last array element of 's'.
        s.pop();
        // Writes to the array element that is no longer within the array.
        ptr.push(0x42);
        // Adding a new element to 's' now will not add an empty array, but
        // will result in an array of length 1 with '0x42' as an element.
        s.push();
        assert(s[s.length - 1][0] == 0x42);
    }
}

🏹 Info: Programmers sometimes use the term “pointer” as a synonym for a reference. There’s a difference between the two terms, whose etymology lies in older programming languages like C++:

“A pointer in C++ is a variable that holds the memory address of another variable. A reference is an alias for an already existing variable.” source

There are some other subtleties to the difference, but for the purpose of our subject, we can treat references and pointers as the same thing.

We have to take a closer look at what just happened in the example. Contrary to what we might expect, appending 0x42 with ptr.push(0x42) does not cause a revert, even though ptr refers to an element that is no longer part of the array s.

Because the Solidity compiler assumes that unused storage is always zeroed (a potential security issue we should otherwise take care of in our code), the following s.push() will not explicitly write zeroes to that location in storage.

The consequence of these mechanics is that the last (fresh) element of the array s after the push() will have length 1 (remember that each array element in this example is an array too) and contain 0x42 as its first element.

In short, whatever was stored in that location before the new element was allocated becomes part of the new element's content.
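One way to sidestep the problem is to finish all work through the storage reference before the element is removed, and to keep the reference in a scope that ends before the .pop(). The following is only a minimal sketch under that assumption; the contract name SaferNested and the function writeThenPop are hypothetical and do not come from the official example.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical variant of the contract above; the names 'SaferNested'
// and 'writeThenPop' are illustrative only.
contract SaferNested {
    uint[][] s;

    function writeThenPop() public {
        // Make sure there is a last element to work with.
        s.push();
        // Keep the storage reference in a narrow scope ...
        {
            uint[] storage last = s[s.length - 1];
            last.push(0x42);
        }
        // ... so that it is out of scope before the element is removed.
        // '.pop()' implicitly deletes the removed element, which also
        // clears the inner array.
        s.pop();
        // A fresh element now starts out empty.
        s.push();
        assert(s[s.length - 1].length == 0);
    }
}
```

Because the write happens before the pop, the implicit delete performed by .pop() cleans up the inner array, and no stale data is left behind for the next s.push() to pick up.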

Solidity does not allow declaring references to value types in storage, which works in our favor: the compiler rejects such a declaration outright, so this kind of mistake cannot slip through.

Explicit dangling references, like the one in our example, are therefore restricted to nested reference types. However, dangling references can also occur temporarily when we use complex expressions in a tuple assignment, as in the following example:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

contract C {
    uint[] s;
    uint[] t;

    constructor() {
        // Push some initial values to the storage arrays.
        s.push(0x07);
        t.push(0x03);
    }

    function g() internal returns (uint[] storage) {
        s.pop();
        return t;
    }

    function f() public returns (uint[] memory) {

        // To follow this example, recall how the member function
        // '.push()' works. Called without arguments, it appends one
        // element to the array, assumes the storage at that location
        // is already zeroed, and returns a reference to the new element.
        // Called with an argument, as in '.push(<some_element>)', it
        // appends that value and returns nothing. That difference is
        // the key to understanding what happens next.

        // The following will first evaluate 's.push()' 
        // to a reference to a new element at index 1. 
        // Afterwards, the call to 'g' pops this new element, resulting in
        // the left-most tuple element to become a dangling reference. 
        // The assignment still takes place and will write outside the 
        // data area of 's'.
        (s.push(), g()[0]) = (0x42, 0x17);

        // A subsequent push to 's' will reveal the value written 
        // by the previous statement, i.e. the last element of 's' at 
        // the end of this function will have the value '0x42'.
        s.push();
        return s;
    }
}
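To make the two forms of .push() easier to tell apart, here is a small sketch that uses both of them side by side. The contract name PushForms and the function demo are hypothetical and serve only as an illustration.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical contract; 'PushForms' and 'demo' are illustrative names.
contract PushForms {
    uint[] s;

    function demo() public returns (uint[] memory) {
        // Form 1: no argument. Appends a zero-initialized element and
        // returns a reference to it, so it can stand on the left of '='.
        s.push() = 0x42;
        // Form 2: with an argument. Appends the value and returns nothing.
        s.push(0x17);
        return s;
    }
}
```

Only the first form hands out a reference, which is why only the first form can end up dangling when something else modifies the array before the assignment takes place.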

The takeaway from the example above is a recommendation to assign to storage only once per statement and to avoid complex expressions on the left side of an assignment (for example, an expression that contains a function call, such as g()[0]).

Otherwise, we might find that our code behaves unexpectedly and harbors some nasty, hard-to-catch bugs.
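As a rough illustration of that recommendation, the tuple assignment can be broken up into separate statements, each touching storage once. This is a sketch, not a drop-in replacement: it assumes the intent was to remove the last element of s, append 0x42 to s, and write 0x17 to t[0]. The contract name SaferC is hypothetical.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical rewrite; the intended behavior is an assumption.
contract SaferC {
    uint[] s;
    uint[] t;

    constructor() {
        // Push some initial values to the storage arrays.
        s.push(0x07);
        t.push(0x03);
    }

    function f() public returns (uint[] memory) {
        // Shrink 's' first, as a statement of its own.
        s.pop();
        // Append by value; no reference to the new element is kept.
        s.push(0x42);
        // Write to 't' through a plain index expression.
        t[0] = 0x17;
        return s;
    }
}
```

Since no reference returned by .push() survives across a statement that modifies the array, nothing is written outside the data area of s.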

It is also recommended to take special care when working with references to elements of bytes arrays, because using a .push() on the array may change its layout in storage (docs).

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// This will report a warning
contract C {
    bytes x = "012345678901234567890123456789";

    function test() external returns(uint) {
        (x.push(), x.push()) = (0x01, 0x02);
        return x.length;
    }
}

The key action in the example above happens in the line containing (x.push(), x.push()) = (0x01, 0x02). At first, the state variable x holds 30 bytes and uses a short layout in storage.

ℹ️ Reminder: We can distinguish the short layout from the long layout by checking the lowest bit of the array's main storage slot: not set (0) means short, set (1) means long (docs).
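The lowest-bit check can be tried out with a few lines of inline assembly. The sketch below is an illustration only: the contract LayoutProbe and its functions append and isLongLayout are hypothetical names, and the code assumes a compiler version in which state variables expose .slot inside assembly blocks (0.7.0 or later).

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical helper contract; 'LayoutProbe', 'append' and
// 'isLongLayout' are illustrative names.
contract LayoutProbe {
    bytes x = "012345678901234567890123456789";

    // Appends a single byte so the layout switch can be observed.
    function append(bytes1 b) external {
        x.push(b);
    }

    // Reads the main storage slot of 'x' and tests its lowest bit.
    function isLongLayout() external view returns (bool) {
        uint256 slotValue;
        assembly {
            slotValue := sload(x.slot)
        }
        return (slotValue & 1) == 1;
    }
}
```

With the initial 30 bytes, isLongLayout() reports the short layout. After two calls to append, x holds 32 bytes and the check reports the long layout.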

When the first x.push() is evaluated, x is still stored in short layout, so x.push() appends a zero byte and returns a reference to that new element, which lives in the first storage slot of x.

At this moment, x holds 31 bytes, which is the maximum length for using short layout.

The second x.push() grows x to 32 bytes and therefore triggers a switch to long layout in storage: the original array location now holds only the length field, while the array data is moved from the original location to a new data area.

The reference returned by the first x.push() still points into the original slot, which is now part of the length field. When the tuple assignment finally writes through that stale reference, it corrupts the length of x.
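A simple way to avoid this trap is to append the values with the argument form of .push(), one statement at a time, so that no reference is held across a layout switch. The following sketch assumes the goal was merely to append the bytes 0x01 and 0x02; the contract name SaferBytes is hypothetical.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical rewrite; assumes the goal is to append two bytes.
contract SaferBytes {
    bytes x = "012345678901234567890123456789";

    function test() external returns (uint) {
        // Each push appends its value directly; no reference is kept
        // while the array switches from short to long layout.
        x.push(0x01);
        x.push(0x02);
        return x.length;
    }
}
```

On a fresh deployment, the first call returns 32, since each push extends the original 30 bytes by exactly one element and the length field stays intact.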

In the example and explanation above, we have described the behavior of dangling storage references in the current version of the compiler.

Nevertheless, we should consider any code with dangling references as dangerous, because its behavior is effectively undefined.

Solidity authors may change the behavior of code involving dangling references in the future, but until such a version of the Solidity compiler is released, let’s make sure to avoid them in our code.

Conclusion

In this article, we dug deep into dangling references to storage array elements.

What’s Next?

This tutorial is part of our extended Solidity documentation with videos and more accessible examples and explanations. You can navigate the series here (all links open in a new tab):