🗃
Repeated / Nested Fields
What are they and how to query them
A repeated or nested field is a type of data type where multiple rows can exists for a single record in the database. They are often used when multiple sets of the information exists and belong to the same record. In the example below, for a single address, they can hold multiple tokens.

A repeated record data type
This is what the data looks like. The reason this type of data structure is used is to allow the database to efficiently store all the information for each address, without creating new rows every time an address acquire or dispose of a new token.

Example of an address with multiple tokens stored as a repeated record
Repeated field are like any other field and can be selected directly. The above example was generated using the following SQL:
SELECT
address,
token_balance
FROM
`nansen-query.aggregates_ethereum.address_stats`
WHERE
address = lower('0x00000000003b3cc22af3ae1eac0440bcee416b40')
To work with the data within a nested field, it is necessary to first unnest the field. In the following example, all of the tokens associated with this address have been "flatten" into a normal table.
SELECT
address,
token_address,
token_balance_cents
FROM
`nansen-query.aggregates_ethereum.address_stats`, unnest(token_balance)
WHERE
address = lower('0x00000000003b3cc22af3ae1eac0440bcee416b40')

A table unnested
Once the the columns has been unnested, operations can be performed on it like any other field.
SELECT
address,
count(distinct(token_address)) AS nof_tokens,
sum(token_balance_cents) AS sumof_tokens
FROM
`nansen-query.aggregates_ethereum.address_stats`, unnest(token_balance)
WHERE
address = lower('0x00000000003b3cc22af3ae1eac0440bcee416b40')
GROUP BY 1

Performing count and sum once a nested field has been flattened
Here are some sample queries for when working with nested fields.
SELECT
address,
token_balance
FROM
`nansen-query.aggregates_ethereum.address_stats`
WHERE
LOWER("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2") IN (SELECT token_address FROM UNNEST(token_balance))
AND EXISTS (SELECT * FROM UNNEST(token_balance) WHERE token_address = LOWER("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2")
AND token_balance_cents > 0)
SELECT
address,
ARRAY(SELECT AS STRUCT * FROM UNNEST(nft_balance_per_project) WHERE is_bluechip AND nft_balance > 0) AS nft_balance_per_bluechip_project
FROM
`nansen-query.aggregates_ethereum.address_stats`
WHERE EXISTS (SELECT * FROM UNNEST(nft_balance_per_project) WHERE is_bluechip AND nft_balance > 0)
Note that when the record has been unnested, the
address
field is repeated for each row. This has an important implication when calculating count and sum across columns that are repeated as a result of unnesting!It is common to create a table that contains a mixture of normal and unnested columns.

An example of a query selecting both flat and unnested columns
In this case, summing the
incoming_nof_transactions
would provide an incorrect result as the incoming_nof_transactions
columns was repeated as a result on unnesting.
An incorrect result for
total_transactions
Similarly, if there are two unnested fields in the same SELECT statement, the number of rows created from unnested would be a combination of everything in the first nested field and the second nested field.

Unnesting two fields resulting in duplication of the top level column (red) and the first unnested column (blue)
For more information, please refer to the BigQuery page on working with arrays: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays
Last modified 6mo ago