03 June 2011

How to Parse the Most Selected Tables From MySQL General Query Log in One Line

Problem: Pin down redundant queries by finding out which tables are being selected from the most.

I like general query log parsing tools, such as mk-query-digest or mysqlsla.

However, there are times when I am on a server as a DBA with limited access and can only read files in /tmp. Other times I just need to get something done quickly.

So what did I do to find out the most selected tables from a database?

1. Make sure /tmp has enough space for what I'm about to do.
2. Turn on the general query log, writing to /tmp/mysql_general.log. The log usually lives at /var/log/mysqllog/mysql_general or similar, but I need access to it without waking up the sysadmin.
3. Turn off the general query log before /tmp fills up (on some servers this might take just minutes).
4. Parse using this nifty set of commands:
grep -i "SELECT " /tmp/mysql_general.log | grep -io "SELECT .*" | sed 's|\(FROM [^ ]*\) .*|\1|' | sort | uniq -c | sort -nr | head -100
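
For anyone who wants the four steps in one place, here is a minimal sketch as shell functions. It is not part of the original recipe: the function names and the LOG variable are made up for illustration, and the two SET GLOBAL statements assume MySQL 5.1 or later, log_output including FILE, and a mysql client that can already connect with an account allowed to change global variables.

```shell
#!/bin/sh
# Illustrative sketch of steps 1-4; names here are hypothetical, not from the post.
LOG=/tmp/mysql_general.log

# Step 1: check free space in /tmp before logging starts.
check_space() {
  df -h /tmp
}

# Steps 2 and 3: switch the general query log on and off at runtime.
# Assumes MySQL 5.1+ and credentials picked up from the client config.
log_on() {
  mysql -e "SET GLOBAL general_log_file = '$LOG'; SET GLOBAL general_log = 'ON';"
}
log_off() {
  mysql -e "SET GLOBAL general_log = 'OFF';"
}

# Step 4: the one-liner from the post, reading the log file named in $1.
top_selects() {
  grep -i "SELECT " "$1" | grep -io "SELECT .*" \
    | sed 's|\(FROM [^ ]*\) .*|\1|' | sort | uniq -c | sort -nr | head -100
}
```

A typical run would be check_space, log_on, wait a while, log_off, then top_selects "$LOG". Two things to keep in mind when reading the output: the counts are per "SELECT ... FROM table" prefix, so the same table selected with different column lists shows up on separate lines, and the sed pattern is case-sensitive, so queries written with a lowercase "from" are not trimmed.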

That's all!

As my mother would say, "Try it... You'll like it."


shantanu said...

I tried and liked it!

Joel Hanger said...

You can also use a table for the general log. I typically keep both: I rotate the log file and purge the general log after doing some processing on it.

Note that I also re-created the log table as a MyISAM table and added keys (you must turn off logging first) so that queries run much quicker. You also cannot perform locking on log tables.

mysql> show variables like '%log%';
| log_output | FILE,TABLE |

mysql> describe general_log;
| Field        | Type             | Null | Key | Default           | Extra                       |
| event_time   | timestamp        | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| user_host    | mediumtext       | NO   | MUL | NULL              |                             |
| thread_id    | int(11)          | NO   |     | NULL              |                             |
| server_id    | int(10) unsigned | NO   |     | NULL              |                             |
| command_type | varchar(64)      | NO   | MUL | NULL              |                             |
| argument     | mediumtext       | NO   |     | NULL              |                             |

I don't have time right now, but a query could easily be written to parse out the tables and count them, and with an index on the table it could easily be much faster than using grep...

Unknown said...

Thank you so much! This saved me an immeasurable amount of time.