8

I'm trying to convert Array< struct > to multiple columns. The data structure looks like:

column name: Parameter
[
  {
    key: "Publisher_name"
    value: "Rubicon"
  }
  {
    key: "device_type"
    value: "IDFA"
  }
  {
    key: "device_id"
    value: "AAAA-BBBB-CCCC-DDDD"
  }
]

What I want to get:

publisher_name  device_type  device_id
Rubicon         IDFA         AAAA-BBBB-CCCC-DDDD

I tried the following, but it duplicates the values of the other columns (one output row per array element):

select h from table, unnest(parameter) as h

BTW, I am curious why one would use this kind of structure in BigQuery. Couldn't we just add the three columns above to the table?

4
  • I'm confused by the array - in BigQuery the schema of each struct has to be the same within one array. Other keys get overwritten, other types are not even possible. Commented Aug 10, 2018 at 18:30
  • What do you mean by confusion? The schema here looks simple and clear to me: ARRAY<STRUCT<key STRING, value STRING>> Commented Aug 10, 2018 at 18:39
  • aah .. key is a key and value is a key ... well .. that's confusing for me :D got it now, thanks Commented Aug 10, 2018 at 18:40
  • yes, at least this is how i read the question :o) Commented Aug 10, 2018 at 18:41

2 Answers

15

The following is for BigQuery Standard SQL:

#standardSQL
SELECT 
  (SELECT value FROM UNNEST(Parameter) WHERE key = 'Publisher_name') AS Publisher_name,
  (SELECT value FROM UNNEST(Parameter) WHERE key = 'device_type') AS device_type,
  (SELECT value FROM UNNEST(Parameter) WHERE key = 'device_id') AS device_id
FROM `project.dataset.table`

You can refactor the code further by using a SQL UDF, as below:

#standardSQL
CREATE TEMP FUNCTION getValue(k STRING, arr ANY TYPE) AS
((SELECT value FROM UNNEST(arr) WHERE key = k));
SELECT 
  getValue('Publisher_name', Parameter) AS Publisher_name,
  getValue('device_type', Parameter) AS device_type,
  getValue('device_id', Parameter) AS device_id
FROM `project.dataset.table`
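
To try the scalar-subquery approach without a real table, a minimal sketch with inline data might look like the following. The `sample` CTE and its single row are assumptions built from the values in the question; it only stands in for `project.dataset.table`.

```sql
#standardSQL
-- Hypothetical inline data standing in for `project.dataset.table`.
WITH sample AS (
  SELECT [
    STRUCT('Publisher_name' AS key, 'Rubicon' AS value),
    STRUCT('device_type' AS key, 'IDFA' AS value),
    STRUCT('device_id' AS key, 'AAAA-BBBB-CCCC-DDDD' AS value)
  ] AS Parameter
)
SELECT
  -- Each scalar subquery scans only the current row's array,
  -- so the outer query still returns one row per input row.
  (SELECT value FROM UNNEST(Parameter) WHERE key = 'Publisher_name') AS Publisher_name,
  (SELECT value FROM UNNEST(Parameter) WHERE key = 'device_type') AS device_type,
  (SELECT value FROM UNNEST(Parameter) WHERE key = 'device_id') AS device_id
FROM sample
```

One caveat: a scalar subquery raises an error if it returns more than one row, so if a key could repeat within a single array, adding `LIMIT 1` or an aggregate such as `MAX(value)` inside the subquery would be needed.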

4 Comments

Thanks for your quick response. It works perfectly. Btw, do you know what the benefit is of using this kind of structure instead of adding these columns to the table?
I think it really depends on your use case. If you end up pivoting anyway, why not have these in separate columns from the very beginning? Still, it totally depends on your use case.
The benefit of having an array here shows up if you have a number of other columns: they are stored just once per row. If you store your array's structs as columns instead, you will need to store redundant data for the other columns, and this will be an extra cost.
Hi @MikhailBerlyant, I found this thread and have a similar question. In my case the array of structs could have over 100 keys: key_1, ..., key_100, for example. What is an efficient way to proceed other than calling the function getValue 100 times?
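
For the many-keys case raised in the last comment, one possible direction (not part of the original answer) is to build the key list at run time and feed it to BigQuery's `PIVOT` operator through `EXECUTE IMMEDIATE`. This is only a sketch under assumptions: the temp table `sample` and the row identifier `id` are hypothetical, and keys are assumed to contain no single quotes.

```sql
-- Hypothetical sample data; `id` is an assumed row identifier.
CREATE TEMP TABLE sample AS
SELECT 1 AS id, [
  STRUCT('Publisher_name' AS key, 'Rubicon' AS value),
  STRUCT('device_type' AS key, 'IDFA' AS value),
  STRUCT('device_id' AS key, 'AAAA-BBBB-CCCC-DDDD' AS value)
] AS Parameter;

-- Collect the distinct keys as a quoted, comma-separated list.
DECLARE key_list STRING;
SET key_list = (
  SELECT STRING_AGG(FORMAT("'%s'", key), ', ' ORDER BY key)
  FROM (SELECT DISTINCT p.key FROM sample, UNNEST(Parameter) AS p)
);

-- Flatten the array, then pivot one output column per key.
EXECUTE IMMEDIATE FORMAT("""
SELECT *
FROM (
  SELECT id, p.key, p.value
  FROM sample, UNNEST(Parameter) AS p
)
PIVOT (MAX(value) FOR key IN (%s))
""", key_list);
```

Because the column list comes from the data, the output schema can change between runs, which may or may not be acceptable depending on the consumer of the result.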
1

To convert to multiple columns, you will need to aggregate, something like this:

select ?,
       max(case when pv.key = 'Publisher_name' then pv.value end) as Publisher_name,
       max(case when pv.key = 'device_type' then pv.value end) as device_type,
       max(case when pv.key = 'device_id' then pv.value end) as device_id
from t cross join
     unnest(parameter) pv
group by ?

You need to explicitly list the new columns that you want. The ? is for the columns that remain the same.
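
As a concrete illustration of the aggregation approach, here is a minimal sketch in which the `?` placeholder is filled with a hypothetical `id` column; both `id` and the inline `t` data are assumptions, not part of the original table. The cross join with `UNNEST` produces one row per array element, which is why the `GROUP BY` is needed to collapse them back to one row per `id`.

```sql
#standardSQL
-- Hypothetical data; `id` stands in for the columns marked with ? above.
WITH t AS (
  SELECT 1 AS id, [
    STRUCT('Publisher_name' AS key, 'Rubicon' AS value),
    STRUCT('device_type' AS key, 'IDFA' AS value),
    STRUCT('device_id' AS key, 'AAAA-BBBB-CCCC-DDDD' AS value)
  ] AS parameter
)
SELECT
  id,
  -- Each CASE yields the value for the matching key and NULL otherwise;
  -- MAX ignores the NULLs and keeps the single matching value.
  MAX(CASE WHEN pv.key = 'Publisher_name' THEN pv.value END) AS Publisher_name,
  MAX(CASE WHEN pv.key = 'device_type' THEN pv.value END) AS device_type,
  MAX(CASE WHEN pv.key = 'device_id' THEN pv.value END) AS device_id
FROM t
CROSS JOIN UNNEST(parameter) AS pv
GROUP BY id
```

Note that rows whose array is empty drop out of a `CROSS JOIN`; a `LEFT JOIN UNNEST(parameter) AS pv` would keep them with NULLs in the pivoted columns.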

2 Comments

Thanks Gordon. It works perfectly! Btw, do you know what the benefit is of using this kind of structure instead of adding these columns to the table?
@DavidD . . . The parameter/value pairs can differ from row to row. I would (and actually do) use JSON for this purpose. I think it is more flexible because the "value" doesn't have to be a string.
