Insights JOIN vs APPLY

JOIN vs APPLY

There seems to be a lot of strong opinions about using APPLY vs JOIN in the SQL community.  Many agree that if you can, the APPLY clause is the one to use for performance reasons.  So I decided to put these two clauses to the test, specifically OUTER APPLY and LEFT OUTER JOIN.  Microsoft System Center Service Manager’s database uses a Relationship table rather than Foreign Keys to create table relationships.  Each row in a table with a relationship uses a unique entity ID that matches a source or target entity ID in the Relationship table.  While the answer isn’t completely clear, it seems that the reason why is because if a Relationship table wasn’t used, there could be as many as 10 or so Foreign Keys on some tables.  This is just not realistically manageable.  On the flip side, it can make fundamental joining a little cumbersome.

Let’s take a look at two examples.  To be fair, I am using a lot of SELECT * to try and keep the playing field level for the JOIN clause since we can be selective with the APPLY clause.

The first example is using LEFT OUTER JOIN to join together two tables that have a relationship to each other.

The code is relatively straight forward and clear.  Once run, the query takes about 6 seconds and returns 65,834 rows and produces the following query plan:

Now let’s take a look at a query with its execution plan that uses OUTER APPLY to produce the same results:

The code is certainly more involved.  Again, this query returned the same number of rows but execution took 11 seconds.  Hang on a second…what?!  Yep, here’s the proof.

It should be noted that this query was also run AFTER the JOIN was run so Statistics also had a chance to update.

I continued to run these queries again and again, switching orders, one after another.  Query times began to get closer together but not once did the OUTER APPLY based query ever run faster than the JOIN even with the same query plan.  There are also other variables that could have come into play in this case though such as CPU load.  Only when I replaced the SELECT * with an explicit SELECT in the APPLY clauses example did it ever run faster than the JOIN based query and consistently ran for 5 seconds with a half dozen executions.  The JOIN clause ran for approximately 6 – 8 seconds consistently for nearly every execution.

I like using the APPLY clause.  It gives me a sense of control.  However, in cases like this, it just isn’t practical.  The code is too long, it’s hard to read, and from what I can gather, isn’t any better performing.