18 December 2007

How to Find Domain Names of Emails Without Using LIKE '%domain.com'

I like working at companies where there are "problems related to success". My colleague and I from a former company that I worked for in LA talked about two types of problems faced by business:
  1. Problems related to failure - the problems of having your doors open for business but few people know about it;
  2. Problems related to success -the problems of having so many people come through your doors that you are trying to figure out how to handle it
The world of web is very much like this. Rarely would one hear that a company is in the "sweet spot in the middle", and if a company is in such a position, everyone is very quiet about it as to not the jinx the atmosphere around it. Instead, you hear "We just need to bring our numbers up by promoting 1,000 more youtube videos", or the other end - "We just brought in three racks of servers and we can barely keep up".

And, as it happens, "problems of success" fall on the lap of the mysql DBA. As many DBAs should be aware, we need to prepare for the future, before people start calling you a Disaster Briefing Analyst instead of Database Administrator.

One of these moments concerns us with the LIKE statement. We all hope that we don't need to use LIKE with the wildcard % in the front. But, as it happens, with tables containing email addresses, an business analyst will request a count of all msn.com-based email addresses.

How to do this without LIKE '%msn.com'?

REVERSE to the Rescue

Just reverse the email address in the table and then reverse the domain you are searching for. Here's how to create a table for the testing:
CREATE TABLE rv_emails (rv_email_id INT UNSIGNED NOT NULL AUTO_INCREMENT, email CHAR(50) NOT NULL, rv_email CHAR(50) NOT NULL, PRIMARY KEY (rv_email_id), INDEX idx_email (email), INDEX idx_rv_email (rv_email));

When I SELECT COUNT(1) FROM rv_emails WHERE email LIKE '%msn.com' when this table has a million records, it takes 1.28 sec to retrieve a count 17,594 records. Now try this on for size:
SELECT COUNT(1) FROM rv_emails WHERE rv_email LIKE REVERSE('%msn.com');
How long did it take for you? A little better, eh? (That's my Canadian vernacular coming through) For me, it was .03 sec to achieve the identical result. An improvement of about 40 times.

Just for fun, here are the results when the table has 5 million records:
  • Old method: 6.69 sec
  • New method: 0.19 sec
Fun stuff!

Until next time,

No comments: