02 November 2007

How to Select Unique Records from a table that does not have a Unique Index

Many times DBAs work with data sets that are not the most optimal (in their opinion), and I recently had the pleasure of working with a table that did not have a unique identifier. And I needed to get unique results from this table, as I would need to do this query in different forms in the future, without selecting records from the past.

Caveat: I could not alter the table structure yet; I needed to make do with the way it was (there was a LOAD INFILE script or two without qualified field names). The first thing I would have done is simply add an auto_increment column at the start.

I also could not add a unique index; there was no time for that -- it would have easily taken over an hour.


The Challenge

Obtain exactly 10,000 records with a unique identifier x nine different queries.


The Half-Solution

Create table_2 with a unique constraint based on the id.

INSERT IGNORE INTO table_2 (id, a,b,c) SELECT id, a,b,c FROM table_1 WHERE column_condition = 'match this' LIMIT 10000;


The Problem

Affected rows are less than 10000, thus falling short of the 10,000 goal.

I needed to do nine of these, and there was no way I was going to do this by hand in sql.


The Solution


Use a php loop and the use of mysql_affected_rows to repeat and rinse until the desired solution is achieved.

$finished = false;
$limit = 10000;

while (!($finished)) {

$sql = "INSERT IGNORE INTO table_2 (id, a,b,c) SELECT id, a,b,c FROM table_1 WHERE column_condition = 'match this' LIMIT $limit;";
mysql_query($sql,$link) or die(mysql_error());
$affected_rows = mysql_affected_rows();
if ($limit == $affected_rows) {
$finished = true;
} else {
$limit = $limit - $affected_rows # rinse and repeat!
}
}


This saved me a lot of grief until I have the window of opportunity to put in a unique index on the table.

Until next time,
David


No comments: