I need to update every row of a table that has more than 60k rows. Currently I'm doing it like this:
public void updateRank(Map<Integer, Double> map) {
    Iterator<Map.Entry<Integer, Double>> it = map.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<Integer, Double> pairs = (Map.Entry<Integer, Double>) it.next();
        String query = "update profile set rank = " + pairs.getValue()
                + " where profileId = " + pairs.getKey();
        DBUtil.update(query);
        it.remove();
    }
}
This method alone takes around 20+ minutes to complete. Hitting the database once per row (60k times) is, I think, the reason (even though I'm using DBCP for connection pooling, with 50 maximum active connections).
It'd be great if I could update the rows with a single database hit. Is that possible? How?
Or is there any other way to improve the timing here?
Column values on multiple rows can be updated in a single UPDATE statement if the condition in the WHERE clause matches multiple rows; the SET clause is then applied to every matched row.
First, specify the table whose data you want to change in the UPDATE clause. Second, assign a new value to the column you want to update; to update several columns, separate each column = value pair with a comma (,). Third, specify which rows to update in the WHERE clause. The UPDATE statement is always followed by the SET clause, which names the columns to be updated.
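As a minimal sketch of that single-statement form, reusing the DBUtil helper from the question (the WHERE condition here is made up purely for illustration):

// One database hit: every row matched by the WHERE clause gets the same SET values.
// DBUtil.update(...) is the helper from the question; the condition is hypothetical.
String query = "update profile set rank = 0.0 where rank is null";
DBUtil.update(query);

This only helps when all matched rows should receive the same values, which is not the case here.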
If every row should get a different value that cannot be derived from the existing data in the database, there is not much you can do to reduce the overall complexity, so do not expect miracles.
That said, you should start using prepared statements and batching:
public void updateRank(Map<Integer, Double> map) throws SQLException {
    Connection connection = getConnection(); // get the DB connection from somewhere
    PreparedStatement stmt = connection.prepareStatement(
            "update profile set rank = ? where profileId = ?");
    Iterator<Map.Entry<Integer, Double>> it = map.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<Integer, Double> pairs = it.next();
        stmt.setDouble(1, pairs.getValue()); // rank is the first placeholder
        stmt.setInt(2, pairs.getKey());      // profileId is the second placeholder
        stmt.addBatch();                     // this will just collect the data values
        it.remove();
    }
    stmt.executeBatch(); // this will actually execute all the updates in one go
    stmt.close();
}
What this does: the UPDATE statement is prepared only once, and the parameter values collected with addBatch() are sent to the database and executed together when executeBatch() is called, instead of one statement being issued per row.
In addition, make sure that profileId is backed by an index so that looking up the respective row is fast enough.
Right now you execute each query independently, which causes a huge connection overhead (even when using connection pooling). Instead, use a batch mechanism to execute several queries together.
Using JDBC (which DBCP apparently uses) and prepared statements, you can do this very easily with addBatch() and executeBatch(). I recently had to do this myself, and a batch size of around 1000 queries was the fastest, though this may be entirely different in your case.
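A minimal sketch of what flushing the batch in chunks of roughly 1000 could look like (getConnection() and the chunk size of 1000 are assumptions, not part of the original code):

public void updateRankInChunks(Map<Integer, Double> map) throws SQLException {
    Connection connection = getConnection(); // assumed helper, e.g. borrowing from the DBCP pool
    PreparedStatement stmt = connection.prepareStatement(
            "update profile set rank = ? where profileId = ?");
    int count = 0;
    for (Map.Entry<Integer, Double> pairs : map.entrySet()) {
        stmt.setDouble(1, pairs.getValue());
        stmt.setInt(2, pairs.getKey());
        stmt.addBatch();             // collect the parameter values
        if (++count % 1000 == 0) {
            stmt.executeBatch();     // send the collected updates in one batch
        }
    }
    stmt.executeBatch();             // flush whatever is left over from the last partial chunk
    stmt.close();
}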
You could also concatenate your queries (separated by a ;) and send them in batches of 100, for example:
public void updateRank(Map<Integer, Double> map) {
    Iterator<Map.Entry<Integer, Double>> it = map.entrySet().iterator();
    String queries = "";
    int i = 0;
    while (it.hasNext()) {
        Map.Entry<Integer, Double> pairs = it.next();
        queries += "update profile set rank = " + pairs.getValue()
                + " where profileId = " + pairs.getKey() + ";";
        it.remove();
        if (i++ % 100 == 99) {
            DBUtil.update(queries);
            queries = "";
        }
    }
    if (!queries.isEmpty()) {
        DBUtil.update(queries); // flush the queries left over from the last partial batch
    }
}