jueves, 18 de marzo de 2010

Substracting sets with SQL

Hey guys,

Long time no see!

I've wanted to mentioned about an sql's trick that I used on my current project and was pretty handy!

The Problem

Let's say you have a table of users and wanted to write a query to retrieve the rows that meet certain criteria and doesn't meet other... Also, we don't want to use NOT EXISTS statement (as it is too expensive in terms of performance). In short, you want something like this:

You_want = Some_Set_Users (I) - Another_Users_Set (II)

This can be accomplished by simple execute a couple of queries, one for (I) and another for (II) and just subtract them using the programming language. For instance, in Rails you just can do the following:

query_1 = User.find(..conditions for set (I)..)
query_2 = User.find(..conditions for set (II)..)
..and then just using ruby..
result = query_1 - query_2

But, this has a couple of drawbacks:

1) This would load all records into memory which it not good, as it takes time and occupies valuable space
2) And two more importantly in most cases, it's throughput is worse than doing it directly on the database.

So, which is the alternative ?

The Solution

Although, joining tables is not the cheapest operation available for a database, still it does the job pretty efficiently and it can be tunned to be pretty fast depending on your needs.

So, in this case, I propose to use the following strategy:

select some_users_set.* from users as some_users_set LEFT JOIN
(select inner_another_users_set from inner_users as another_users_set where ..conditions for another_users_set...) as another_users_set on some_users_set.id = another_users_set.id
where ..conditions for some_users_set.... AND another_users_set.id IS NULL


The Explanation

The idea behind this sql is to do a LEFT JOIN of the two sets (I and II) that we want to make the subtraction.

This way, as a LEFT JOIN's operation will always match for every row on both sets, when there is no row on set II for a given row on I, we will get all fields for set II filled with NULL (as there is no correspondent on the join). And those cases, are exactly the ones we are looking for, the ones that exists on set I but there is no correspondent on set II.

I find this trick very useful, I hope you could benefit from it too!

Let me know your thoughts, and obviously any questions if there is something unclear!

See ya!

No hay comentarios: