20 November 2007

Table Joins vs. php Loops

Table joins are fun and useful, but did you know that there are circumstances that a php loop will beat out table joins?

For those of you who already know this, feel free to return to www.PlanetMySql.com. But for the rest of us, read on...

The Setup (fictional, but mimics my production environment)

I have two tables, employees and termination_info. Employees has 150,000 records and termination_info has 65,000 records.

The Request

I have been asked to remove employees that match the email address in termination_info (id matching would be my very next project on this one).

The Choices

1) DELETE FROM employees a USING employees a, termination_info b WHERE a.email = b.email;
2) php loop that loops through termination_info records and deletes the corresponding employee record.

Which One is Faster?
#2, by a wide margin.

Why?
'EXPLAIN SELECT a.email FROM employees a INNER JOIN termination_info b ON a.email = b.email' only gives part of the picture. Sure, the email index shows that it will use it, so mysql only needs to study 65,000 rows. However, it needs processing space. A lot of it.

This is a great reason why the api's are made for mysql -- the php api allows us to interface with mysql and perform jobs more efficiently. The "real" size of the tables that I used this were 230,000 records and 3.9 million records. The php script took under 10 minutes, and the mysql version is still running clearly over an hour since it began (and is still running as I write this).

Until next time,
David

No comments: