I have an app based on Ruby on Rails 4.0 with two models: Store and FeededProduct (the products). There are about 1.5 million products in the system, which makes queries quite slow unless I use indexes properly.

Some basic info

  • Store has_many Products
  • Store.affiliate_type_id marks affiliation: 1 = affiliated, 2 = not affiliated
  • Products have attributes like "category_connection_id" (integer) and "is_available" (boolean)

In the FeededProduct model:

scope :affiliated, -> { joins(:store).where("stores.affiliate_type_id = 1") } 

This query takes about 500 ms, which noticeably stalls the website:

FeededProduct.where(:is_available => true).affiliated.where(:category_connection_id => @feeded_product.category_connection_id)

Corresponding PostgreSQL query:

FeededProduct Load (481.4ms)  SELECT "feeded_products".* FROM "feeded_products" INNER JOIN "stores" ON "stores"."id" = "feeded_products"."store_id" WHERE "feeded_products"."is_available" = 't' AND "feeded_products"."category_connection_id" = 345 AND (stores.affiliate_type_id = 1)

Update: PostgreSQL EXPLAIN output:

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Hash Join  (cost=477.63..49176.17 rows=21240 width=1084)
   Hash Cond: (feeded_products.store_id = stores.id)
   ->  Bitmap Heap Scan on feeded_products  (cost=377.17..48983.06 rows=38580 width=1084)
         Recheck Cond: (category_connection_id = 5923)
         Filter: is_available
         ->  Bitmap Index Scan on cc_w_store_index_on_fp  (cost=0.00..375.25 rows=38580 width=0)
               Index Cond: ((category_connection_id = 5923) AND (is_available = true))
   ->  Hash  (cost=98.87..98.87 rows=452 width=4)
         ->  Seq Scan on stores  (cost=0.00..98.87 rows=452 width=4)
               Filter: (affiliate_type_id = 1)
(10 rows)

Question: How can I create an index that will take the inner join into consideration and make this faster?

  • EXPLAIN ANALYZE returns more useful details. Commented Nov 14, 2016 at 12:19
  • Thanks, but I could not find a good method to do that in RubyOnRails. Any advice here? Commented Nov 14, 2016 at 15:12
  • In PostgreSQL, use explain analyze instead of explain. Commented Nov 14, 2016 at 15:27
  • The thing is that I don't use PostgreSQL directly, but just indirectly through RubyOnRails. I am really not all that good at db administration so I use Rails-commands. In this case .explain and there doesn't seem to be any .explain_and_analyze or the likes of it. Commented Nov 14, 2016 at 15:29
  • Learning new things is fun. Isn't it fun? This is supposed to be fun, dang it. ;-) Commented Nov 14, 2016 at 18:07

1 Answer

That depends on the join algorithm that PostgreSQL chooses. Use EXPLAIN on the query to see how PostgreSQL processes the query.

These are the answers depending on the join algorithm:

  1. nested loop join

    Here you should create an index on the join condition for the inner relation (the bottom table in the EXPLAIN output). You may further improve things by adding columns that appear in the WHERE clause and significantly improve selectivity (i.e., significantly reduce the number of rows filtered out during the index scan).
    For the outer relation, an index on the columns that appear in the WHERE clause will speed up the query if these conditions filter out most of the rows in the table.

  2. hash join

    Here it helps to have indexes on both tables on those columns in the WHERE clause where the conditions filter out most of the rows in the table.

  3. merge join

    Here you need indexes on the columns in the merge condition to allow PostgreSQL to use an index scan for sorting. Additionally, you can append columns that appear in the WHERE clause.
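As a rough illustration of the hash join case for the query in the question, the WHERE-clause columns on both tables could be indexed like this. This is only a sketch: the helper name and the index names are made up, and the EXPLAIN output in the question suggests that a similar index on feeded_products already exists, so the gain may be small. The helper is meant to be called from a Rails migration as `add_where_clause_indexes(self)` inside `change`.

```ruby
# Hypothetical helper (not part of Rails): call it from a migration's
# `change` method as `add_where_clause_indexes(self)`.
def add_where_clause_indexes(migration)
  # feeded_products: both columns appear in the WHERE clause of the query.
  migration.add_index :feeded_products,
                      [:category_connection_id, :is_available],
                      name: "index_fp_on_category_and_available" # assumed name

  # stores: the filter column. On a table this small the planner may
  # still prefer a sequential scan and ignore the index.
  migration.add_index :stores, :affiliate_type_id,
                      name: "index_stores_on_affiliate_type_id" # assumed name
end
```

Whether PostgreSQL actually uses either index depends on how many rows the conditions filter out, so check the plan again afterwards.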

Always check with EXPLAIN whether your indexes get used. If not, odds are that either they cannot be used or that using them would make the query slower than a sequential scan, e.g. because they do not filter out enough rows.
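If you only work through Rails, one way to get EXPLAIN ANALYZE output is to send the generated SQL through the connection yourself. A minimal sketch, assuming the PostgreSQL adapter (where `execute` returns rows keyed by the "QUERY PLAN" column); `explain_analyze` is a made-up helper name, not a Rails method:

```ruby
# Hypothetical helper: `relation` is anything that responds to `to_sql`
# (e.g. an ActiveRecord::Relation), `connection` anything that responds
# to `execute` (e.g. ActiveRecord::Base.connection).
def explain_analyze(relation, connection)
  # Note: EXPLAIN ANALYZE really executes the statement to measure it.
  result = connection.execute("EXPLAIN ANALYZE #{relation.to_sql}")
  # Each row of the result holds one line of the plan.
  result.map { |row| row["QUERY PLAN"] }.join("\n")
end
```

In a Rails console this might be used as `puts explain_analyze(FeededProduct.where(:is_available => true).affiliated, ActiveRecord::Base.connection)`.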

5 Comments

Thanks Laurenz, I never really used explain so I didn't think about it. I added the output to my question but to be honest, it doesn't say much to me since I don't really know how to interpret it. Could you give me a hand?
You need the EXPLAIN output from the production system. On the test system, it looks like everything is fine (only 453 rows in the inner table, index used for outer table).
I re-did it for the production system. Note that FeededProducts has about 1.5M rows whereas Stores has some 1,200 (with the affiliate filter, probably 453). Loading the FeededProducts is the part that takes time.
If the plan is the same, and stores is small, the query is probably as good as can be. You cannot avoid fetching all matching rows from feeded_products, right?
I guess not. Thanks, I will investigate this a bit further. The entire code is old and there might be workarounds that do not rely on indexing.
