03 June 2011

How to Parse the Most Selected Tables From MySQL General Query Log in One Line

Problem: Pin down redundant queries by finding out which tables are being selected from the most.

I like general query log parsing tools, such as mk-query-digest or mysqlsla.

However, there are times when I am on a server as a DBA with limited access and can only read files in /tmp. Other times I just need to get something done quickly.

So what did I do to find out the most selected tables from a database?

1. Make sure /tmp has enough free space to hold the log.
2. Turn on the general query log with the file at /tmp/mysql_general.log. The log usually lives at /var/log/mysqllog/mysql_general or similar, but I need to read it without waking up the sysadmin.
3. Turn off the general query log before /tmp fills up (on a busy server this might take just minutes).
4. Parse using this nifty set of commands:
grep -i "SELECT " /tmp/mysql_general.log | grep -io "SELECT .*" | sed 's|\(FROM [^ ]*\) .*|\1|' | sort | uniq -c | sort -nr | head -100
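
To see what the one-liner produces, here is a minimal sketch run against a made-up log. The sample lines, the temp file, and the variable names are assumptions for illustration, not output from a real server. The commented mysql commands for steps 1 to 3 assume MySQL 5.1 or later and an account allowed to set global variables. Two things to keep in mind: the one-liner groups by everything up to the table name, so different column lists on the same table are counted separately, and the sed expression only matches an uppercase FROM. The second pipeline is a variation that counts the FROM targets alone.

```shell
#!/bin/sh
# Steps 1-3 on a real server might look like this (assumptions, adjust to taste):
#   df -h /tmp
#   mysql -e "SET GLOBAL general_log_file = '/tmp/mysql_general.log'; SET GLOBAL general_log = 'ON';"
#   mysql -e "SET GLOBAL general_log = 'OFF';"

# Hypothetical sample log so the pipeline can be tried without a server.
log=$(mktemp)
printf '%s\n' \
  '110603 10:00:01    1 Query    SELECT id, name FROM users WHERE id = 1' \
  '                   1 Query    SELECT id, name FROM users WHERE id = 2' \
  '                   2 Query    SELECT * FROM orders WHERE user_id = 1' \
  '                   2 Query    UPDATE users SET name = 1 WHERE id = 1' \
  '                   1 Query    SELECT id, name FROM users WHERE id = 3' \
  > "$log"

# The one-liner from step 4, pointed at the sample file.
counts=$(grep -i "SELECT " "$log" | grep -io "SELECT .*" \
  | sed 's|\(FROM [^ ]*\) .*|\1|' | sort | uniq -c | sort -nr | head -100)
echo "$counts"
# Lists "SELECT id, name FROM users" with a count of 3,
# then "SELECT * FROM orders" with a count of 1.

# Variation: keep only the "FROM <table>" part, so column lists do not split the counts.
tables=$(grep -i "SELECT " "$log" | grep -io "FROM [^ ]*" \
  | sort | uniq -c | sort -nr | head -100)
echo "$tables"

rm -f "$log"
```

The variation trades detail for a cleaner per-table total. The original form is still the better one when the goal is to spot the exact repeated query.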

That's all!

As my mother would say, "Try it... You'll like it."


shantanu said...

I tried and liked it!

Joel Hanger said...

You can also use a table for the general log. I typically keep both, rotating the log file and purging the general log table after doing some processing on it.

Note: I also re-created the log table as a MyISAM table and added keys (you must turn off logging first) so that queries run much quicker. You also cannot lock log tables.

mysql> show variables like '%log%';
| log_output | FILE,TABLE |

mysql> describe general_log;
| Field        | Type             | Null | Key | Default           | Extra                       |
| event_time   | timestamp        | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| user_host    | mediumtext       | NO   | MUL | NULL              |                             |
| thread_id    | int(11)          | NO   |     | NULL              |                             |
| server_id    | int(10) unsigned | NO   |     | NULL              |                             |
| command_type | varchar(64)      | NO   | MUL | NULL              |                             |
| argument     | mediumtext       | NO   |     | NULL              |                             |

I don't have time right now, but a query could easily be written to parse out the tables and count them, and with an index on the table it could easily be much faster than using grep...
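
One way such a query might look is sketched below, wrapped in a small shell function so it can be piped to the mysql client. The function name top_tables_sql is hypothetical, and the query rests on several assumptions: log_output includes TABLE, the logged statements use an uppercase "FROM ", and the table of interest is the word after the last FROM in the statement. On servers where the argument column is a binary type, wrapping it in CONVERT(argument USING utf8) may be needed.

```shell
#!/bin/sh
# Hypothetical helper: prints a query that counts SELECTs per table
# using the mysql.general_log table instead of the log file.
top_tables_sql() {
  cat <<'SQL'
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(argument, 'FROM ', -1), ' ', 1) AS table_name,
       COUNT(*) AS selects
FROM mysql.general_log
WHERE command_type = 'Query'
  AND argument LIKE 'SELECT %FROM %'
GROUP BY table_name
ORDER BY selects DESC
LIMIT 100;
SQL
}

# Print the query. To run it, pipe it to the client: top_tables_sql | mysql
top_tables_sql
```

The inner SUBSTRING_INDEX keeps the text after the last "FROM ", and the outer one cuts it at the first space, which mirrors what the sed expression does on the log file. An index on command_type, like the one shown in the table description above, would help the WHERE clause narrow the rows before the string functions run.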