
Serious, intermittent errors with CF Web Service

We've got an incredibly frustrating situation with a CF Web Services-based API that we wrote and maintain. We had an API in place for years that was stable and working happily with Ruby, PHP, and ColdFusion clients. Then this year a .NET client came along, and we found that our web service was not interoperable with statically-typed languages due to our extensive use of structs.

We eventually realized we had to rewrite the API without structs, and we've done so. It now uses scalar values, arrays, and CFCs (which get translated to SOAP complexTypes). The .NET client is happy, and we wrote proof-of-concept clients in about 6 different languages to ensure that we'd be interoperable this time around.

To our great dismay, it appears that our ColdFusion 7 servers can't serve the new API reliably. It works for about a day or so after restarting, then the clients start getting errors like:

Error: coldfusion.xml.rpc.CFCInvocationException [java.lang.ClassNotFoundException : tafkan.remote_api.pfapi.v.trunk.rsp_pf_survey_status_array]

and

java.lang.NoClassDefFoundError: tafkan/remote_api/pfapi/v/trunk/pf_unit

Restarting the CF instances is the only way to make the problem go away. A lot of time and money was put into rebuilding the API, so everyone is really at wit's end about this.

We've noticed that the WEB-INF/cfc-skeletons directories of our CF instances eventually seem to have two copies of the classes for each of the CFCs used by the API. For example:

-rw-r--r--  Feb 17 09:15 remote_api.pfapi.v.trunk.pf_datum.class
-rw-r--r--  Feb  3 12:20 tafkan.remote_api.pfapi.v.trunk.pf_datum.class

It seems like the errors are coming from a namespace or class search path problem, so we tried switching all CFC references to be fully-qualified (dot notation starting with a mapping) instead of just simple references to CFCs in the current directory. This seemed promising, but the problem came back within 24 hours.
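To spot the duplicated skeletons described above without eyeballing a directory listing, a small script can pair up class files whose dotted names differ only by a leading prefix. This is a rough sketch in Python, not anything ColdFusion provides; the function name is made up for illustration, and it only compares file names, not class contents.

```python
from collections import defaultdict


def find_duplicate_skeletons(filenames):
    """Pair skeleton class files that name the same CFC with and without a prefix.

    Returns {shorter_name: [longer names ending in '.' + shorter_name]}.
    Names are compared without the '.class' extension.
    """
    suffix = ".class"
    # Keep only .class files and strip the extension.
    names = sorted({f[:-len(suffix)] for f in filenames if f.endswith(suffix)})
    dupes = defaultdict(list)
    for short in names:
        for longer in names:
            # 'tafkan.a.b' duplicates 'a.b' if it ends with '.a.b'.
            if longer != short and longer.endswith("." + short):
                dupes[short].append(longer)
    return dict(dupes)
```

Feeding it `os.listdir()` of an instance's WEB-INF/cfc-skeletons directory would flag pairs like the two `pf_datum` files listed above.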

Environment:

  • ColdFusion 7,0,2,142559 with hf702-70523, 2-instance cluster
  • Sun Java 1.4.2_13
  • Apache 2.0.52
  • Centos 4.5 32-bit

Maybe upgrading one of these venerable pieces of software would help? Maybe upgrading just AXIS?

Adobe support doesn't seem to be an option, as CF7 is EOL'ed and in extended-extended support (and that just for a few more days).

Update:

Thanks to all who've joined this discussion! Here's an update on where things stand at the moment.

The service just crapped out for the first time today. One of the cluster instances was still able to generate the WSDL, while the other instance said:

AXIS error
Sorry, something seems to have gone wrong... here are the details:
Exception - java.lang.NoClassDefFoundError: tafkan/remote_api/pfapi/v/trunk/rsp_pf_numeric_array

Both cfc-skeletons directories contain a file called tafkan.remote_api.pfapi.v.trunk.rsp_pf_numeric_array.class, and neither appears to contain the differently named files we've sometimes seen (remote_api.pfapi.v.trunk.rsp_pf_numeric_array.class). The files in cfc-skeletons do not appear to have been modified since the servers were started yesterday.

The uptime on both instances was about 21.5 hours. I was running without JIT (-Xint).

I've now restarted both instances. They're now running on Sun Java 1.4.2_19 (instead of _13), and JIT has been re-enabled, as it clearly wasn't causing this error and things were dramatically slower without it. I've also cleared the "save class files" check boxes.

And now, we wait again...

Update 2: The problem persists. I'm not sure what else to try at this point. Arg!

FYI, this is cross-posted at http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:60922

sbleon asked Feb 20 '10 02:02


1 Answer

I've read this thread, and the CFTalk thread. My initial thoughts about workarounds appear to have been already suggested by Mark Kruger and Dave Watts. The only other workaround idea I had was to catch the error and refresh the webservice stub using the Service Factory methods. (In CF8-9 there is an Admin API method to do this; I'm not sure about CF7.)
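The shape of that catch-and-refresh workaround can be sketched as follows. This is Python for illustration only, not CFML; `invoke` and `refresh` are hypothetical callables standing in for the real web service call and for whatever refresh method the Service Factory or Admin API exposes on your version. The error check simply looks for the two exception names from the question.

```python
def looks_like_stale_class_error(exc):
    """True if the error text matches the class-loading failures in the question."""
    text = str(exc)
    return "ClassNotFoundException" in text or "NoClassDefFoundError" in text


def call_with_stub_refresh(invoke, refresh, is_stale=looks_like_stale_class_error,
                           max_refreshes=1):
    """Call invoke(); on a stale-class error, call refresh() and try again.

    Any other error, or a stale-class error after max_refreshes refreshes,
    is re-raised unchanged.
    """
    refreshes = 0
    while True:
        try:
            return invoke()
        except Exception as exc:
            if refreshes >= max_refreshes or not is_stale(exc):
                raise
            refresh()  # hypothetical hook: refresh the web service stub
            refreshes += 1
```

Capping the number of refreshes matters: if the refresh does not actually clear the problem, the caller should see the original error rather than loop forever.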

Researching the error I narrowed down possible matches to these:

http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:144821 This was a match but unresolved

http://blog.coldfusionpowered.com/?p=28 This was a very similar error, resolved by "fixing case issues" on all CFCs & invocations.

"ColdFusion Google Adwords Business Component Error": Resolved by rewriting code and removing cfcomments (I suspect that other factors were actually responsible for solving it here).

http://forums.crystaltech.com/index.php?topic=22364.0 We're getting closer now. Resolution involved mistakenly having two document roots

http://qaix.com/coldfusion/313-410-web-service-on-cfmx-6-1-jrun-suddenly-not-working-read.shtml Exact match for error message. Exact match for having CFC mapping to doc root. Resolution was to have only 1 mapping pointing to docroot, just "/". This could be the solution. In MX 6/6.1 and maybe 7, there was a default mapping for "/" pointing to docroot. If you have another mapping pointing to docroot, then I can see how this problem might arise. Check the physical paths for mappings and try the solution here, to use only the "/" mapping.
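Checking the physical paths by hand is easy to get wrong when one has a trailing slash and another doesn't. A minimal sketch of the check, assuming you have copied the mappings out of the CF Administrator into a dict (logical path to physical path); the function name and the sample paths in the usage note are invented for illustration:

```python
import posixpath
from collections import defaultdict


def mappings_sharing_a_path(mappings):
    """Return {physical_path: [logical names]} for paths mapped more than once.

    Paths are normalized textually (trailing slashes, '.', '..'), so symlinks
    pointing at the same directory are NOT detected.
    """
    by_path = defaultdict(list)
    for logical, physical in mappings.items():
        by_path[posixpath.normpath(physical)].append(logical)
    return {path: sorted(names) for path, names in by_path.items() if len(names) > 1}
```

For example, `{"/": "/var/www/html", "/tafkan": "/var/www/html/"}` would report both logical names under `/var/www/html`, which is the situation the linked thread resolved by keeping only "/".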

Steven Erat answered Sep 30 '22 06:09