 

Optimizing MySQL requests over JDBC with multiple threads

I'm building a web crawler and I'm looking for the best way to handle requests and connections between my threads and the database (MySQL).

I have two types of threads:

  1. Fetchers: They crawl websites. They produce URLs and add them to two tables: table_url and table_file. They select from table_url to continue the crawl, and update table_url to set visited=1 once they have read a URL, or visited=-1 while they are reading it. They can also delete rows.
  2. Downloaders: They download files. They select from table_file and update it to change the Downloaded column. They never insert anything.

Right now I'm working with this: I have a connection pool based on c3p0. Every target (website) has these variables:

private Connection connection_downloader;
private Connection connection_fetcher;

I create both connections only once, when I instantiate a website. Every thread then uses these connections based on its target.

Every thread has these variables:

private Statement statement;
private ResultSet resultSet;

Before every query I open a Statement:

public static Statement openSqlStatement(Connection connection){
    try {
        return connection.createStatement();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    return null;
}

After every query I close the Statement and the ResultSet with:

public static  void closeSqlStatement(ResultSet resultSet, Statement statement){
    if (resultSet != null) try { resultSet.close(); } catch (SQLException e) {e.printStackTrace();}
    if (statement != null) try { statement.close(); } catch (SQLException e) {e.printStackTrace();}
}

Right now my select queries only return a single value (I never need more than one for now, but this will change soon) and are defined like this:

public static String sqlSelect(String query, Connection connection, Statement statement, ResultSet resultSet){
    String result = null;
    try {
        resultSet = statement.executeQuery(query);
        // Read the first column of the first row, if there is one
        // (resultSet.toString() would only return the object's description).
        if (resultSet.next()) {
            result = resultSet.getString(1);
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
    closeSqlStatement(resultSet, statement);
    return result;
}

Insert, delete and update queries use this function:

public static int sqlExec(String query, Connection connection, Statement statement){
    int affectedRows = -1; // stays -1 if the query fails
    try {
        affectedRows = statement.executeUpdate(query);
    } catch (SQLException e) {
        e.printStackTrace();
    }
    // There is no ResultSet for an update, so only the statement is closed.
    closeSqlStatement(null, statement);
    return affectedRows;
}

My question is simple: can this be improved to be faster? I'm also concerned about mutual exclusion, to prevent a thread from updating a link while another thread is doing the same.

naurel asked Jan 27 '26 00:01


1 Answer

I believe your design is flawed. Dedicating one connection full-time to each website will severely limit your overall throughput.

Since you have already set up a connection pool, it's perfectly fine to fetch a connection from it right before you use it, and return it afterwards.

Likewise, using try-with-resources to close all your ResultSets and Statements will make the code more readable, and using PreparedStatement instead of Statement would not hurt either.

One example (using a static dataSource() call to access your pool):

public static String sqlSelect(String id) throws SQLException {
    // The connection is borrowed from the pool and returned when the block exits.
    try(Connection con = dataSource().getConnection();
        PreparedStatement ps = con.prepareStatement("SELECT row FROM table WHERE key = ?")) {
          ps.setString(1, id);
          try(ResultSet rs = ps.executeQuery()) {
            if(rs.next()) {
              return rs.getString(1);
            } else {
              throw new SQLException("Nothing found");
            }
          }
    } catch (SQLException e) {
        e.printStackTrace();
        throw e;
    }
}

Following the same pattern, I suggest you create methods for all the different inserts, updates and selects your application uses, each holding the connection only for the short time spent inside the DB logic.
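As a rough sketch of what one such update method might look like, here is a variant for the fetchers that marks a URL as "being read". The table_url table and the visited column come from the question; the url column name, the UrlDao class, the claimUrl method and the convention that unvisited rows have visited = 0 are assumptions made for illustration. Because the WHERE clause checks the current state, only one thread should see an update count of 1 for a given row, which may also help with the mutual exclusion concern.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

final class UrlDao {
    // Assumption: table_url has a `url` column and unvisited rows have visited = 0.
    private static final String CLAIM_SQL =
        "UPDATE table_url SET visited = -1 WHERE url = ? AND visited = 0";

    private UrlDao() {}

    /**
     * Tries to mark the given URL as "being read" (visited = -1).
     * Returns true only if this call changed exactly one row, meaning
     * no other thread had already claimed or visited the URL.
     */
    static boolean claimUrl(DataSource dataSource, String url) throws SQLException {
        // Borrow a pooled connection just for this statement; try-with-resources
        // closes the statement and hands the connection back to the pool.
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(CLAIM_SQL)) {
            ps.setString(1, url);
            return ps.executeUpdate() == 1;
        }
    }
}
```

A fetcher thread could then call UrlDao.claimUrl(dataSource(), url) and crawl the page only when it returns true, setting visited = 1 with a similar method once it is done.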

Jan answered Jan 28 '26 13:01