🗃️Repeated / Nested Fields

What are they and how to query them

A repeated or nested field is a type of data type where multiple rows can exists for a single record in the database. They are often used when multiple sets of the information exists and belong to the same record. In the example below, for a single address, they can hold multiple tokens.

This is what the data looks like. The reason this type of data structure is used is to allow the database to efficiently store all the information for each address, without creating new rows every time an address acquire or dispose of a new token.

How to Query Repeated Fields

Selecting a Repeated Field

Repeated field are like any other field and can be selected directly. The above example was generated using the following SQL:

SELECT
    address,
    token_balance
FROM
    `nansen-query.aggregates_ethereum.address_stats`
WHERE
    address = lower('0x00000000003b3cc22af3ae1eac0440bcee416b40')

Unnesting a Repeated Field

To work with the data within a nested field, it is necessary to first unnest the field. In the following example, all of the tokens associated with this address have been "flatten" into a normal table.

SELECT
    address,
    token_address,
    token_balance_cents
FROM
    `nansen-query.aggregates_ethereum.address_stats`, unnest(token_balance)
WHERE
    address = lower('0x00000000003b3cc22af3ae1eac0440bcee416b40')

Once the the columns has been unnested, operations can be performed on it like any other field.

SELECT
    address,
    count(distinct(token_address)) AS nof_tokens,
    sum(token_balance_cents) AS sumof_tokens
FROM
    `nansen-query.aggregates_ethereum.address_stats`, unnest(token_balance)
WHERE
    address = lower('0x00000000003b3cc22af3ae1eac0440bcee416b40')
GROUP BY 1

Sample Queries

Here are some sample queries for when working with nested fields.

Example 1: Find addresses holding WETH

SELECT
    address,
    token_balance
FROM
    `nansen-query.aggregates_ethereum.address_stats`
WHERE
    LOWER("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2") IN (SELECT token_address FROM UNNEST(token_balance))
    AND EXISTS (SELECT * FROM UNNEST(token_balance) WHERE token_address =   LOWER("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2")
        AND token_balance_cents > 0)

Example 2: Filter NFT balance to only include bluechips (and only show addresses holding blue chips)

SELECT
    address,
    ARRAY(SELECT AS STRUCT * FROM UNNEST(nft_balance_per_project) WHERE is_bluechip AND nft_balance > 0) AS nft_balance_per_bluechip_project
FROM
    `nansen-query.aggregates_ethereum.address_stats`
WHERE EXISTS (SELECT * FROM UNNEST(nft_balance_per_project) WHERE is_bluechip AND nft_balance > 0)

Common Mistakes

Note that when the record has been unnested, the address field is repeated for each row. This has an important implication when calculating count and sum across columns that are repeated as a result of unnesting!

It is common to create a table that contains a mixture of normal and unnested columns.

In this case, summing the incoming_nof_transactions would provide an incorrect result as the incoming_nof_transactions columns was repeated as a result on unnesting.

Similarly, if there are two unnested fields in the same SELECT statement, the number of rows created from unnested would be a combination of everything in the first nested field and the second nested field.

For more information, please refer to the BigQuery page on working with arrays: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays

Last updated