SELECT COUNT(1) FROM rv_emails WHERE rv_email LIKE REVERSE('%msn.com');I was asked to do a count of the top 50 domains in our table of email addresses.
So, I simply changed my previous route by performing the following steps:
1) Add a 'domain' column to my table:
ALTER emails ADD COLUMN domain CHAR(30) NOT NULL;2) Populate the domain column:
UPDATE emails SET DOMAIN = RIGHT(email, LENGTH(email) - INSTR(email,'@'));3) Select the top 50 domains by doing the following:
SELECT domain, COUNT(1) FROM emails GROUP BY domain ORDER BY COUNT(1) DESC LIMIT 50;There are a couple of items to note here:
- No need to use the word DISTINCT as this is implied by GROUP BY;
- ORDER BY COUNT(1) DESC does add "Using temporary; Using filesort" to the EXPLAIN statement, but on a 5 million record table, this took 7.78 sec. Another way to do this is to add "HAVING COUNT(1) > 5000" or some other arbitrary number and write these to a temp table for sorting the final results
Until next time,
David
No comments:
Post a Comment