🗃

Repeated / Nested Fields

What are they and how to query them
A repeated or nested field is a type of data type where multiple rows can exists for a single record in the database. They are often used when multiple sets of the information exists and belong to the same record. In the example below, for a single address, they can hold multiple tokens.
A repeated record data type
This is what the data looks like. The reason this type of data structure is used is to allow the database to efficiently store all the information for each address, without creating new rows every time an address acquire or dispose of a new token.
Example of an address with multiple tokens stored as a repeated record

How to Query Repeated Fields

Selecting a Repeated Field

Repeated field are like any other field and can be selected directly. The above example was generated using the following SQL:
SELECT
address,
token_balance
FROM
`nansen-query.aggregates_ethereum.address_stats`
WHERE
address = lower('0x00000000003b3cc22af3ae1eac0440bcee416b40')

Unnesting a Repeated Field

To work with the data within a nested field, it is necessary to first unnest the field. In the following example, all of the tokens associated with this address have been "flatten" into a normal table.
SELECT
address,
token_address,
token_balance_cents
FROM
`nansen-query.aggregates_ethereum.address_stats`, unnest(token_balance)
WHERE
address = lower('0x00000000003b3cc22af3ae1eac0440bcee416b40')
A table unnested
Once the the columns has been unnested, operations can be performed on it like any other field.
SELECT
address,
count(distinct(token_address)) AS nof_tokens,
sum(token_balance_cents) AS sumof_tokens
FROM
`nansen-query.aggregates_ethereum.address_stats`, unnest(token_balance)
WHERE
address = lower('0x00000000003b3cc22af3ae1eac0440bcee416b40')
GROUP BY 1
Performing count and sum once a nested field has been flattened

Sample Queries

Here are some sample queries for when working with nested fields.

Example 1: Find addresses holding WETH

SELECT
address,
token_balance
FROM
`nansen-query.aggregates_ethereum.address_stats`
WHERE
LOWER("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2") IN (SELECT token_address FROM UNNEST(token_balance))
AND EXISTS (SELECT * FROM UNNEST(token_balance) WHERE token_address = LOWER("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2")
AND token_balance_cents > 0)

Example 2: Filter NFT balance to only include bluechips (and only show addresses holding blue chips)

SELECT
address,
ARRAY(SELECT AS STRUCT * FROM UNNEST(nft_balance_per_project) WHERE is_bluechip AND nft_balance > 0) AS nft_balance_per_bluechip_project
FROM
`nansen-query.aggregates_ethereum.address_stats`
WHERE EXISTS (SELECT * FROM UNNEST(nft_balance_per_project) WHERE is_bluechip AND nft_balance > 0)

Common Mistakes

Note that when the record has been unnested, the address field is repeated for each row. This has an important implication when calculating count and sum across columns that are repeated as a result of unnesting!
It is common to create a table that contains a mixture of normal and unnested columns.
An example of a query selecting both flat and unnested columns
In this case, summing the incoming_nof_transactions would provide an incorrect result as the incoming_nof_transactions columns was repeated as a result on unnesting.
An incorrect result for total_transactions
Similarly, if there are two unnested fields in the same SELECT statement, the number of rows created from unnested would be a combination of everything in the first nested field and the second nested field.
Unnesting two fields resulting in duplication of the top level column (red) and the first unnested column (blue)
For more information, please refer to the BigQuery page on working with arrays: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays