I have a table "A" with one column "col1", where each record is an array of integers.

col1
-----
{1,2,3,4}
{1,2,6,7}
{1,2,3,8,9}

I would like to get one row as the result, containing the overlap (intersection) of all arrays in "col1".

select overlap(col1) from A;

result
-----
{1,2}
  • If the number of items in each set is variable, then this is going to be an ugly problem. A better option would be to make your data relational. Instead of having a CSV list, have one record for each value in the set in one column, and some sort of group identifier in another column. Commented Jun 9, 2016 at 3:25
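
A minimal sketch of the relational layout this comment suggests, assuming a hypothetical table set_items with columns set_id and val (neither name comes from the question). A value belongs to the overlap when it occurs in every set:

```sql
-- Hypothetical table: one row per (set, value) pair instead of one array per row
CREATE TABLE set_items (
    set_id integer NOT NULL,
    val    integer NOT NULL
);

-- The three arrays from the question, one row per element
INSERT INTO set_items (set_id, val) VALUES
    (1, 1), (1, 2), (1, 3), (1, 4),
    (2, 1), (2, 2), (2, 6), (2, 7),
    (3, 1), (3, 2), (3, 3), (3, 8), (3, 9);

-- Keep the values that occur in as many distinct sets as exist in total
SELECT val
FROM set_items
GROUP BY val
HAVING count(DISTINCT set_id) = (SELECT count(DISTINCT set_id) FROM set_items)
ORDER BY val;
```

With the sample data from the question, this should return two rows, 1 and 2.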

1 Answer

You should define a custom aggregate for this purpose:

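CREATE OR REPLACE FUNCTION public.overlap_array_aggregate(anyarray, anyarray)
 RETURNS anyarray
 LANGUAGE plpgsql STRICT
AS $function$
BEGIN
  -- Intersect the running state ($1) with the next input array ($2)
  RETURN ARRAY(SELECT unnest($1) INTERSECT SELECT unnest($2));
END;
$function$;

-- The transition function is STRICT and there is no initial condition,
-- so the first non-NULL array becomes the initial state
CREATE AGGREGATE array_overlap_agg (
   basetype = anyarray,
   sfunc =  overlap_array_aggregate,
   stype = anyarray );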

Then it works as you expect:

postgres=# SELECT * FROM foo;
┌─────────────┐
│      a      │
╞═════════════╡
│ {1,2,3,4}   │
│ {1,2,6,7}   │
│ {1,2,3,8,9} │
└─────────────┘
(3 rows)

postgres=# SELECT array_overlap_agg(a) FROM foo;
┌───────────────────┐
│ array_overlap_agg │
╞═══════════════════╡
│ {1,2}             │
└───────────────────┘
(1 row)
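
To reproduce the session above, a setup along these lines should suffice. The table name foo and column a are taken from the output shown; the integer[] column type is an assumption based on the question's description of the data:

```sql
-- Assumed definition of the demo table (the column type is a guess from the question)
CREATE TABLE foo (a integer[]);

INSERT INTO foo (a) VALUES
    ('{1,2,3,4}'),
    ('{1,2,6,7}'),
    ('{1,2,3,8,9}');

-- Requires the function and aggregate defined earlier in this answer
SELECT array_overlap_agg(a) FROM foo;
```

Note that INTERSECT removes duplicates and makes no promise about row order, so the order of elements in the resulting array is not guaranteed.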
5 Comments

Excellent answer. A possible performance improvement (since INTERSECT operates at the row level) is RETURN CASE WHEN $1 && $2 THEN ARRAY(SELECT unnest($1) INTERSECT SELECT unnest($2)) END; which performs the INTERSECT only if the && array operator finds at least one element in common.
The effect of your proposal depends on how often the intersection is empty. When the result is an empty array, it can help; when the result is a non-empty array, it can decrease performance. If you really need the best performance, you can write a C extension. It all depends on usage.
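
A hedged sketch of the guarded variant discussed in these two comments. One caveat: the CASE expression as proposed has no ELSE branch, so it yields NULL when the arrays share nothing, and with a STRICT transition function a NULL state should then carry through to the final result. The ELSE branch below is an addition, not part of the original proposal, and returns an empty array instead:

```sql
-- Replaces the transition function in place; the existing aggregate keeps using it.
CREATE OR REPLACE FUNCTION public.overlap_array_aggregate(anyarray, anyarray)
 RETURNS anyarray
 LANGUAGE plpgsql STRICT
AS $function$
BEGIN
  RETURN CASE
           -- && is true when the two arrays have at least one element in common
           WHEN $1 && $2
             THEN ARRAY(SELECT unnest($1) INTERSECT SELECT unnest($2))
           -- Added branch (assumption): no common element, so return an empty array
           ELSE '{}'
         END;
END;
$function$;
```

Whether the extra && check pays off depends on the data, as noted above: it saves the INTERSECT only for pairs with an empty intersection and adds work for all others.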
How would you do overlap_count(ary, tbl_column), i.e. between an array passed as an argument and a specific column?
@sten - can you give an example? Could you open a new question?