Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I speed up batch processing job in Coldfusion?

Every once in awhile I am fed a large data file that my client uploads and that needs to be processed through CMFL. The problem is that if I put the processing on a CF page, then it runs into a timeout issue after 120 seconds. I was able to move the processing code to a CFC where it seems to not have the timeout issue. However, sometime during the processing, it causes ColdFusion to crash and has to restarted. There are a number of database queries (5 or more, mixture of updates and selects) required for each line (8,000+) of the file I go through as well as other logic provided by me in the form of CFML.

My question is what would be the best way to go through this file. One caveat, I am not able to move the file to the database server and process it entirely with the DB. However, would it be more efficient to pass each line to a stored procedure that took care of everything? It would still be a lot of calls to the database, but nothing compared to what I have now. Also, what would be the best way to provide feedback to the user about how much of the file has been processed?

Edit: I'm running CF 6.1

like image 247
Jason Avatar asked Dec 22 '22 10:12

Jason


2 Answers

I just did a similar thing and use CF often for data parsing.

1) Maintain a file upload table (Parent table). For every file you upload you should be able to keep a list of each file and what status it is in (uploaded, processed, unprocessed)

2) Temp table to store all the rows of the data file. (child table) Import the entire data file into a temporary table. Attempting to do it all in memory will inevitably lead to some errors. Each row in this table will link to a file upload table entry above.

3) Maintain a processing status - For each row of the datafile you bring in, set a "process/unprocessed" tag. This way if it breaks, you can start from where you left off. As you run through each line, set it to be "processed".

4) Transaction - use cftransaction if possible to commit all of it at once, or at least one line at a time (with your 5 queries). That way if something goes boom, you don't have one row of data that is half computed/processed/updated/tested.

5) Once you're done processing, set the file name entry in the table in step 1 to be "processed"

By using the approach above, if something fails, you can set it to start where it left off, or at least have a clearer path of where to start investigating, or worst case clean up in your data. You will have a clear way of displaying to the user the status of the current upload processing, where it's at, and where it left off if there was an error.

If you have any questions, let me know.

Other thoughts:

  1. You can increase timeouts, give the VM more memory, put it in 64 bit but all of those will only increase the capacity of your system so much. It's a good idea to do these per call and do it in conjunction with the above.

  2. Java has some neat file processing libraries that are available as CFCS. if you run into a lot of issues with speed, you can use one of those to read it into a variable and then into the database

  3. If you are playing with XML, do not use coldfusion's xml parsing. It works well for smaller files and has fits when things get bigger. There are several cfc's written out there (check riaforge, etc) that wrap some excellent java libraries for parsing xml data. You can then create a cfquery manually if need be with this data.

like image 146
Jas Panesar Avatar answered Jan 15 '23 12:01

Jas Panesar


It's hard to tell without more info, but from what you have said I shoot out three ideas.

The first thing, is with so many database operations, it's possible that you are generating too much debugging. Make sure that under Debug Output settings in the administrator that the following settings are turned off.

  • Enable Robust Exception Information
  • Enable AJAX Debug Log Window
  • Request Debugging Output

The second thing I would do is look at those DB queries and make sure they are optimized. Make sure selects are happening with indicies, etc.

The third thing I would suspect is that the file hanging out in memory is probably suboptimal.

I would try looping through the file using file looping:

<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
    <!--- Code to go here --->
</cfloop>
like image 38
Terry Ryan Avatar answered Jan 15 '23 10:01

Terry Ryan