Introduction to Solidity Reference Types and Arrays

Through this article, we’ll learn about reference types, in particular

structs,
arrays, and
mappings,

followed by a discussion of

data locations,
memory areas, and
assignment behaviors.

Finally, we’ll conclude with a section introducing

arrays.

As is our tradition, this article (like the others in the series) is meant to be a faithful companion or supplement to the official Solidity documentation, starting with its coverage of this article’s topics.

Reference Types

In Solidity v0.8.15, reference types consist of structs, arrays, and mappings.

Reference types differ from value types in one important property: how many names can refer to a single instance of a value.

💡 With a reference type, multiple names can refer to one and the same instance of a value, while with a value type we get a new instance (an independent copy) for each new name.

This means that value types can be independently changed and have no interaction with one another.

However, each change to a reference-type value affects all names tied to that value. We therefore have to pay much closer attention to the changes we make to a reference type, i.e., to which names a change affects.

In some cases, we’ll have to make a copy of a reference type to affect only the names of interest.
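
A minimal sketch of this difference, assuming a hypothetical contract named AliasDemo (all names below are illustrative; the memory keyword is covered in the next section):

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical contract, for illustration only.
contract AliasDemo {
    // Value type: b is an independent copy of a.
    function valueCopy() public pure returns (uint, uint) {
        uint a = 1;
        uint b = a;
        b = 2; // a is unaffected
        return (a, b); // (1, 2)
    }

    // Reference type in memory: second is another name for the same array.
    function memoryAlias() public pure returns (uint, uint) {
        uint[] memory first = new uint[](1);
        uint[] memory second = first;
        second[0] = 42; // also visible through first
        return (first[0], second[0]); // (42, 42)
    }

    // An explicit element-by-element copy keeps the two arrays independent.
    function memoryCopy() public pure returns (uint, uint) {
        uint[] memory first = new uint[](1);
        uint[] memory second = new uint[](first.length);
        for (uint i = 0; i < first.length; i++) {
            second[i] = first[i];
        }
        second[0] = 42; // first stays untouched
        return (first[0], second[0]); // (0, 42)
    }
}
```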

Data Location

Whenever we use a reference type, we must explicitly define the memory area where our variable is stored.

Let’s remember there are three memory areas available for use, each one with its own properties regarding life cycle and scope: storage, memory, and calldata.

Variables defined in storage have a life cycle that is limited to the life cycle of the contract. They are widely available throughout the contract (contract scope), except to functions whose state mutability is declared as ‘pure’.
Variables defined in memory have a life cycle limited to that of the external function call and are scoped to a function (function scope).
Variables defined in calldata share the life cycle and scope with the external function call, but exist in a special memory area containing only the function parameters. We should note that, despite being similar to memory with regard to non-persistence, calldata is a non-modifiable area, which stores function arguments.

Assigning a reference type value to a name (variable) in a different memory area will always incur a content copy operation, i.e. it will produce a cloned instance of the reference type value, assigned to a name.

ℹ️ Info: Where possible, we should use calldata as the data location, because it avoids copies and protects the data from modification, due to the non-modifiability of the calldata memory area. Arrays and structs located in calldata can also be returned from functions, i.e. they can outlive the function that received them, but we cannot allocate such types in calldata.
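
As a rough sketch of the three data locations side by side, consider the hypothetical contract below. The contract and function names are assumptions made for this example and do not come from the official docs.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical contract, for illustration only.
contract LocationDemo {
    uint[] numbers; // state variable: lives in storage, persists between calls

    // calldata: a read-only view of the arguments, no copy is made.
    function sum(uint[] calldata input) external pure returns (uint total) {
        for (uint i = 0; i < input.length; i++) {
            total += input[i];
        }
        // input[0] = 1; // would not compile: calldata cannot be modified
    }

    // memory: a temporary, modifiable array that is discarded after the call.
    function doubled(uint[] calldata input) external pure returns (uint[] memory result) {
        result = new uint[](input.length);
        for (uint i = 0; i < input.length; i++) {
            result[i] = input[i] * 2;
        }
    }

    // calldata -> storage: a different data location, so the content is copied.
    function store(uint[] calldata input) external {
        numbers = input;
    }

    // A calldata array can be returned, but not allocated.
    function passThrough(uint[] calldata input) external pure returns (uint[] calldata) {
        return input;
    }

    function storedLength() external view returns (uint) {
        return numbers.length;
    }
}
```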

A few notes on Solidity versions

Before Solidity v0.6.9, the data location of reference-type arguments was limited to calldata in functions with external visibility, memory in functions with public visibility, and either memory or storage in functions with internal or private visibility.

Since Solidity v0.6.9, both memory and calldata are allowed in all functions, regardless of function visibility.

Before Solidity v0.5.0, we could omit the memory area keyword, and the default memory area was inferred from the kind of variable, function type, etc. Now all complex types must have an explicit memory area definition.
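
A small sketch of what this relaxation allows, using a hypothetical contract named VisibilityDemo (the names are assumptions made for illustration):

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.6.9 <0.9.0;

// Hypothetical contract, for illustration only.
contract VisibilityDemo {
    // calldata parameter in a public function (memory only before v0.6.9).
    function first(uint[] calldata values) public pure returns (uint) {
        return head(values);
    }

    // calldata parameter in an internal function: no copy to memory is needed.
    function head(uint[] calldata values) internal pure returns (uint) {
        return values[0];
    }

    // memory parameter in an external function (calldata only before v0.6.9).
    function last(uint[] memory values) external pure returns (uint) {
        return values[values.length - 1];
    }
}
```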

Memory Area and Assignment Behavior

The memory area in use determines both the persistence of data (the life cycle of the memory area) and the semantics of assignments. For example, assignments between storage and memory (or from calldata) always create an independent copy.

On the other hand, assignments between variables located in memory only create references (shallow copies).

The implication of only creating a reference is that modifications to one memory variable are also visible in other memory variables that refer to or point to the same data.

Another way to state the same thing (as we sometimes do): all the names (variables) are synonyms that point to the same object.
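
Both behaviors can be sketched in a few lines. The contract AssignmentDemo and its functions are hypothetical names chosen for this example:

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical contract, for illustration only.
contract AssignmentDemo {
    uint[] stored; // state variable, lives in storage

    constructor() {
        stored.push(1);
    }

    // storage -> memory: the memory array is an independent copy.
    function storageToMemory() public view returns (uint, uint) {
        uint[] memory memCopy = stored;
        memCopy[0] = 99; // changes only the copy in memory
        return (stored[0], memCopy[0]); // (1, 99)
    }

    // memory -> memory: the internal call receives a reference,
    // so the change made in bump() is visible to the caller.
    function memoryToMemory() public pure returns (uint) {
        uint[] memory data = new uint[](1);
        bump(data);
        return data[0]; // 1
    }

    function bump(uint[] memory values) internal pure {
        values[0] += 1;
    }
}
```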

Assignments from storage to a local storage variable operate by assigning only a reference. All other assignments to a storage variable will always be performed by copying the value.

Common examples of the copying case are assignments to state variables or to members of local variables of storage struct type (such as a .<someProperty> of a struct), even when the local variable itself is just a reference:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.5.0 <0.9.0;

contract C {
    // The data location of x is storage.
    // This is the only place where the
    // data location can be omitted.
    uint[] x;

    // The data location of memoryArray is memory.
    function f(uint[] memory memoryArray) public {
        // works, copies the whole array to storage
        x = memoryArray; 

        // works, assigns a pointer, data location of y is storage
        uint[] storage y = x; 
        y[7]; // fine, returns the 8th element
        y.pop(); // fine, modifies x through y
        delete x; // fine, clears the array, also modifies y

        // The following does not work; it would need to 
        // create a new temporary/unnamed array in storage, but storage 
        // is "statically" allocated:
        // y = memoryArray;
        // This does not work either, since it would "reset" the pointer,
        // but there is no sensible location it could point to:
        // delete y;

        g(x); // calls g, handing over a reference to x
        h(x); // calls h and creates an independent, temporary copy in memory
    }

    function g(uint[] storage) internal pure {}
    function h(uint[] memory) public pure {}
}
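
The same copying rule applies to struct members reached through a local storage reference. Here is a minimal sketch, assuming a hypothetical struct Account and contract StructDemo that are not part of the official example:

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical contract, for illustration only.
contract StructDemo {
    struct Account {
        uint balance;
        uint[] history;
    }

    Account account; // state variable in storage

    function deposit(uint[] memory entries) public returns (uint, uint) {
        // Local storage variable: only a reference to the state variable.
        Account storage acc = account;
        acc.balance += 1; // writes through to account.balance

        // Assigning to a member of a storage struct copies the memory array
        // into storage, even though acc itself is just a reference.
        acc.history = entries;

        return (account.balance, account.history.length);
    }
}
```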

In Solidity, as in many other programming languages, we find the same or a very similar pattern: some types of objects are copied (cloned, deep-copied), while others are only referenced (shallow-copied).

The main, high-level difference between the two is that, as we previously mentioned,

copied values (objects) take more space, but are entirely independent of the original;
referenced values point to one and the same value in memory and, accordingly, occupy space only once, but changes to the value are instantly visible to all attached variables (names), so we must exercise caution when dealing with reference types.

Arrays

Arrays come in two flavors: fixed size and dynamic size. The main difference is that the size of a fixed-size array is known at compile time, while the current size of a dynamic array is determined at runtime.

Generally, the type of an array of fixed size k and element type T is commonly written as T[k], while an array of dynamic size is written just as T[] because the size is not known upfront.
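
A short sketch of both flavors, using a hypothetical contract named ArrayKinds (the names are assumptions for illustration):

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical contract, for illustration only.
contract ArrayKinds {
    uint[3] fixedArr;  // T[k]: the size 3 is part of the type
    uint[] dynamicArr; // T[]: the size can change at runtime

    function lengths() public returns (uint, uint) {
        dynamicArr.push(10);
        dynamicArr.push(20);
        // fixedArr.push(1); // would not compile: fixed-size arrays have no push
        return (fixedArr.length, dynamicArr.length);
    }

    function inMemory(uint n) public pure returns (uint) {
        // The size of a dynamic memory array is chosen at runtime.
        uint[] memory arr = new uint[](n);
        uint[2] memory pair = [uint(1), 2];
        return arr.length + pair.length;
    }
}
```

On a freshly deployed instance, the first call to lengths() reports 3 for the fixed-size array and 2 for the dynamic one.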

In Solidity, we would declare an array of 5 dynamic arrays of uint as uint[][5].

Compared to some other languages, this notation is reversed. In Solidity, X[3] always represents an array with three elements of type X, even if X is itself an array.

In some other languages, such as C, an array of arrays is written as X[a][b], where a is the size of the outer array and b is the size of each inner array.

Array indices are zero-based, as in many other programming languages.

Element access is done in the opposite direction of the array declaration, e.g., in uint[][5] memory x, we would access the third dynamic array as x[2], and the seventh element in the third dynamic array as x[2][6].

In general, if we have an array declaration T[5] a for a type T, which can also be an array, the expression a[2] always has the type T.
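
The declaration and access order can be sketched as follows, assuming a hypothetical contract named NestedArrays:

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical contract, for illustration only.
contract NestedArrays {
    function demo() public pure returns (uint, uint, uint) {
        // An array of 5 dynamic arrays of uint.
        uint[][5] memory x;

        // Each inner dynamic array starts out empty; allocate the third one.
        x[2] = new uint[](7);

        // Seventh element of the third dynamic array.
        x[2][6] = 42;

        return (x.length, x[2].length, x[2][6]); // (5, 7, 42)
    }
}
```

Here x[2] has the type uint[] and x[2][6] has the type uint, which mirrors reading the declaration uint[][5] from right to left.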

Array elements can be of any type, including mapping or struct. However, we should remember that the general restrictions for types still apply, e.g. mappings can only be stored in the storage memory area, and functions with public visibility need parameters of ABI types.
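
As an illustrative sketch, the hypothetical contract below keeps an array of structs and an array of mappings in storage. The names Point, points, and ledgers are assumptions made for this example.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0 <0.9.0;

// Hypothetical contract, for illustration only.
contract ElementTypes {
    struct Point {
        uint x;
        uint y;
    }

    Point[] points;                      // dynamic array of structs
    mapping(address => uint)[2] ledgers; // array of mappings: storage only

    function addPoint(uint x, uint y) public returns (uint) {
        points.push(Point(x, y));
        return points.length;
    }

    function credit(uint ledger, uint amount) public returns (uint) {
        ledgers[ledger][msg.sender] += amount;
        return ledgers[ledger][msg.sender];
    }

    // Would not compile: mappings cannot be placed in memory.
    // function bad() public pure { mapping(address => uint)[2] memory m; }
}
```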

Conclusion

In this article, we learned about reference types and took a first look at arrays.

First, we investigated what reference types are and what’s all the fuss about.
Second, we learned about the role of data location, variable life cycle, and scope.
Third, we got even wiser by learning about memory areas and assignment behaviors.
Fourth, we just scratched the surface on arrays.

What’s Next?

This tutorial is part of our extended Solidity documentation with videos and more accessible examples and explanations.