
Does a loop take longer to execute each time?

I'm working on a J2EE project which involves storing postal codes, cities and countries together. We have developed a Java class which handles the import of each country file (containing every postal code and every city). The problem is that for some countries (Great Britain, Netherlands...), the file is pretty big (400,000 to 800,000 lines).

I've got a while() loop which reads the next line, extracts the information and stores it in my database. For the first 1,000 or 10,000 lines the process is really fast, then it seems to slow down with every pass through the loop, and it eventually throws a HeapSpaceOverflowException after 150,000 lines.

At first I thought that some objects weren't being garbage collected and were slowing down my algorithm, but I can't figure out which ones. Besides, when I run this algorithm on my PC, JConsole tells me that the heap space is regularly cleaned (it seems to be garbage collected), but the process still gets slower and slower.

Below is the code of the method:

FileReader fr = new FileReader(nomFichier);
BufferedReader br = new BufferedReader(fr);
    
int index = 0; String ligne; String codePostal; String nomVille; 
String codePays; PPays pays; String[] colonnes;
    
while ((ligne = br.readLine()) != null)
{
    System.out.println("line "+ ++index);
        
    colonnes = ligne.split(Pattern.quote(";"));
        
    codePostal = colonnes[9];
    nomVille   = colonnes[8];
    codePays   = colonnes[0];
        
    pays = this.pc.getByCodePays(codePays);
        
    this.pc.getByCodePostalAndVilleAndINSEE(codePostal, nomVille, pays.getNomPays(), "");
}

The variable this.pc is injected through the @Inject annotation.

Can someone help me figure out why this code gets slower and slower?

For completeness' sake, I've added the code of the get...() method:

public Codepostalville getByCodePostalAndVilleAndINSEE(String codePostal, String ville, 
                                                       String pays, String codeINSEE) throws DatabaseException
{
    Codepostal cp = null; Ville v = null; PPays p = null; Codepostalville cpv = null;
    
    try
    {
        // First, look up the Codepostal object
        cp = (Codepostal) this.em
                        .createNamedQuery("Codepostal.findByCodePostal")
                        .setParameter("codePostal", codePostal)
                        .getSingleResult();
    }
    catch (NoResultException nre1)
    {
        // If it was not found, create it
        if (cp == null)
        {
            cp = new Codepostal();
            cp.setCodePostal(codePostal);
            cpc.getFacade().create(cp);
        } 
    }
    
    // Look up the city...
    try
    {
        // The city name supplied by the user must be sanitized (remove
        // any hyphens, special characters...)
        // So we create a new Ville object and assign it the name to sanitize,
        // run the cleanup, and read back the sanitized name
        Ville purge = new Ville();
        purge.setNomVille(ville);
        purge.purgerNomVille();
        ville = purge.getNomVille();
        
        v = (Ville) this.em
                        .createNamedQuery("Ville.findByNomVille")
                        .setParameter("nomVille", ville)
                        .getSingleResult();
    }
    catch (NoResultException nre2)
    {
        // ... or create it if it does not exist
        if (v == null)
        {
            v = new Ville();
            v.setNomVille(ville);
            vc.getFacade().create(v);
        }
    }
    
    // Look up the country
    try
    {
        p = (PPays) this.em
                        .createNamedQuery("PPays.findByNomPays")
                        .setParameter("nomPays", pays)
                        .getSingleResult();
    }
    catch (NoResultException nre2)
    {
        // ... or create it if it does not exist
        if (p == null)
        {
            p = new PPays();
            p.setNomPays(pays);
            pc.getFacade().create(p);
        }
    }
        
    // Finally, look up the Codepostalville object
    try
    {
        cpv = (Codepostalville) this.em
                .createNamedQuery("Codepostalville.findByIdVilleAndIdCodePostalAndIdPays")
                .setParameter("idVille", v)
                .setParameter("idCodePostal", cp)
                .setParameter("idPays", p)
                .getSingleResult();
        
        // If the Codepostalville object was found, update its INSEE code
        cpv.setCodeINSEE(codeINSEE);
        this.getFacade().edit(cpv);
    }
    catch (NoResultException nre3)
    {         
        if (cpv == null)
        {
            cpv = new Codepostalville();
            cpv.setIdCodePostal(cp);
            cpv.setIdVille(v);
            cpv.setCodeINSEE(codeINSEE);
            cpv.setIdPays(p);
            this.getFacade().create(cpv);
        }
    }
    
    return cpv;
}

So, I have some more information. The getByCodePostal...() method needs around 15ms to execute at the very beginning of the loop, and after 10,000 lines it needs more than 100ms (almost 10 times more!). In this new version I have disabled the commit/rollback code, so each query is committed on the fly.

I can't figure out why it needs more and more time.

I've tried to find some information about JPA's cache. My current configuration (in persistence.xml) is this:

<property name="eclipselink.jdbc.bind-parameters" value="true"/>
<property name="eclipselink.jdbc.cache-statements" value="true"/>
<property name="eclipselink.cache.size.default" value="10000"/>
<property name="eclipselink.query-results-cache" value="true"/>

I don't know if this is the most efficient configuration, and I would appreciate some help and some explanation of JPA's cache.

Adrien Dos Reis asked Jul 24 '14 12:07


1 Answer

You might want to read up on JPA concepts. In brief, an EntityManager is associated with a persistence context, which keeps a reference to every persistent object manipulated through it, so that it can write any changes made to these objects back to the database.

Since you never close the persistence context, that's the likely cause of your memory leak. Moreover, a persistence provider must write changes to persistent objects to the database before issuing a query, if these changes might alter the result of the query. Detecting these changes requires iterating over all objects associated with the current persistence context. In your code, that can grow to nearly a million objects for every query you issue, which would explain why each iteration is slower than the last.

Therefore, at the very least, you should clear the persistence context at regular intervals (say every 1000 rows).
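
As a minimal sketch of that advice (not the asker's actual code), the helper below runs a caller-supplied action once every N processed rows. The class name `PeriodicClearer` and its methods are hypothetical; the action is passed as a plain `Runnable` so the sketch stays independent of any JPA provider.

```java
// Hypothetical helper: runs an action once every `batchSize` processed rows.
final class PeriodicClearer {
    private final int batchSize;
    private final Runnable flushAndClear; // assumed to flush and clear the persistence context
    private int pending = 0;

    PeriodicClearer(int batchSize, Runnable flushAndClear) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flushAndClear = flushAndClear;
    }

    /** Call once per imported line; triggers the action when a batch is full. */
    void rowProcessed() {
        if (++pending == batchSize) {
            flushAndClear.run();
            pending = 0;
        }
    }

    /** Call after the loop so the last, partial batch is handled too. */
    void finish() {
        if (pending > 0) {
            flushAndClear.run();
            pending = 0;
        }
    }
}
```

With a real EntityManager `em`, the action could be `() -> { em.flush(); em.clear(); }`, called inside an active transaction. Note that `clear()` detaches every managed entity, so objects fetched before the clear (such as `pays` in the loop) would have to be looked up again or merged before being used in later queries.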

It's also worth noting that unless your database is on the same server, every query you issue must travel over the network to the database, and the result must travel back to the application server, before your program can continue. Depending on network latency, this can easily take a millisecond each time, and you are doing this several million times. If it needs to be truly efficient, loading the entire table into memory and performing the existence checks there might be substantially faster.
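
A rough sketch of the in-memory approach, assuming the existing rows fit in the heap: the map is filled once from the table, and each line of the file is then checked against it instead of issuing a query. `InMemoryIndex` and `getOrCreate` are hypothetical names; `creator` stands in for whatever code inserts a missing row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory index: preloaded once, then consulted instead of the database.
final class InMemoryIndex<K, V> {
    private final Map<K, V> byKey = new HashMap<>();
    private final Function<K, V> creator; // assumed to create and persist a missing entry

    InMemoryIndex(Map<K, V> existing, Function<K, V> creator) {
        this.byKey.putAll(existing);
        this.creator = creator;
    }

    /** Returns the known value for the key, creating it only on first sight. */
    V getOrCreate(K key) {
        return byKey.computeIfAbsent(key, creator);
    }

    int size() {
        return byKey.size();
    }
}
```

For the import in the question, one such index per lookup (postal code, city name, country name) would replace the corresponding `getSingleResult()` round trip, so the database is only contacted when a row actually has to be created.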

meriton answered Oct 06 '22 11:10