04 September 2008

Case Sensitive Fields

Yes, it has been too long since I last posted on this blog (3 months) - that's my apology; let's move forward.

There's too many interesting MySQL bits and pieces that I still come across my desk that I cannot help but post these fantastic learning opportunities to my blog. Obviously, you can read about all things interesting from PlanetMySql.org, but I hope that my blog will help sift people through specific issues that pertain directly to them while working with MySQL.

Oh yes... I am now located in Denver and I work for a big startup that uses MySQL for data mining. Most excellent!

Querying Case Sensitive Fields

Issue: Users to be able to query for case sensitive field values. For example, 700g is not the same as 700G.

Answer: COLLATE latin1_general_cs

Example in action:
mysql> CREATE TABLE test (non_case_sensitive char(1) NOT NULL, case_sensitive char(1) COLLATE latin1_general_cs) ENGINE=MyISAM charset=latin1;

mysql> insert into test values ('x','x');
mysql> insert into test values ('X','X');
mysql> insert into test values ('y','Y');
mysql> insert into test values ('Y','y');
mysql> select non_case_sensitive, count(*) from test group by non_case_sensitive;
+--------------------+----------+
| non_case_sensitive | count(*) |
+--------------------+----------+
| x | 2 |
| y | 2 |
+--------------------+----------+
2 rows in set (0.00 sec)

mysql> select case_sensitive, count(*) from test group by case_sensitive;
+----------------+----------+
| case_sensitive | count(*) |
+----------------+----------+
| X | 1 |
| x | 1 |
| Y | 1 |
| y | 1 |
+----------------+----------+
4 rows in set (0.00 sec)

Voila!

Performance Considerations

There does not appear to be much of a performance hit as a result of this change. On a 4.5 million record table, there was no difference. In fact many of our comparison queries (with same number of results) on the case sensitive table were a bit faster than on the case insensitive table.

Have fun! I hope to post more frequently in the future as I continue to find out interesting tidbits along the way.

Until next time,
David

1 comment:

Justin Swanhart said...

Note that you don't have to change the table collation at the column level in order to take advantage of case insensitive search.

For instance you could do something like the following:

mysql> select * from t1 where c2='A';
+----+------+
| c1 | c2 |
+----+------+
| 0 | a |
+----+------+
1 row in set (0.32 sec)

mysql> select * from t1 where c2 collate latin1_bin = 'A';
Empty set (0.27 sec)

mysql> select * from t1 where c2 collate latin1_bin = 'a';
+----+------+
| c1 | c2 |
+----+------+
| 0 | a |
+----+------+
1 row in set (0.15 sec)