I am trying to set up a simple application using Spring Integration. The goal is to use a file inbound channel adapter to monitor a directory for new files and process them as they are added. For simplicity, the processing at the moment is just logging some output (the name of the file being processed). I do, however, want to process files in a multithreaded fashion: say 10 files are picked up and processed in parallel, and only once these are completed do we move on to the next 10 files.
For that I tried two different approaches. Both seem to work similarly, and I wanted to understand the differences between using a poller and a dispatcher for something like this.
Approach #1 - Using poller
<int-file:inbound-channel-adapter id="filesIn" directory="in">
    <int:poller fixed-rate="1" task-executor="executor" />
</int-file:inbound-channel-adapter>
<int:service-activator ref="moveToStage" method="move" input-channel="filesIn" />
<task:executor id="executor" pool-size="5" queue-capacity="0" rejection-policy="DISCARD" />
So here the idea, as I understand it, is that we are constantly polling the directory, and as soon as a file is received it is sent to the filesIn channel until the pool limit is reached. Then, while the pool is fully occupied, no additional files are sent, even though I'm assuming the polling still continues in the background. This seems to work, but I am not sure whether setting max-messages-per-poll close to the pool size would help here by letting me decrease the polling frequency.
Approach #2 - Using dispatcher
<int-file:inbound-channel-adapter id="filesIn" directory="in">
    <int:poller fixed-rate="5000" max-messages-per-poll="3" />
</int-file:inbound-channel-adapter>
<int:bridge input-channel="filesIn" output-channel="filesReady" />
<int:channel id="filesReady">
    <int:dispatcher task-executor="executor"/>
</int:channel>
<int:service-activator ref="moveToStage" method="move" input-channel="filesReady" />
<task:executor id="executor" pool-size="5" queue-capacity="0" rejection-policy="CALLER_RUNS" />
Okay, so here the poller is not using the executor, so I am assuming it polls sequentially. On every poll, up to 3 files should be picked up and sent to the filesReady channel, which then uses the dispatcher to pass the files on to the service activator. Because the dispatcher uses the executor, it immediately returns control and allows the filesIn channel to send more files.
I guess my question is: am I understanding both approaches correctly, and is one better than the other?
Thanks
Yes, your understanding is correct.
Generally, I would say that polling every millisecond (and discarding the poll when the queue is full) is a waste of resources (CPU and I/O).
Also, increasing max-messages-per-poll (mmpp) in the first case won't help, because the poll is done on the executor thread (the scheduler hands off the poll to the executor, and that one thread handles all of the mmpp messages).
In the second case, since the scheduler thread hands off during the poll (rather than before it), the mmpp will work as expected.
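To make the first case concrete, here is a sketch of Approach #1 with max-messages-per-poll added. The value 5 is an assumption, chosen only to match the pool size in the question. Since the poller's task-executor runs the whole poll, one pool thread would receive and process up to five files one after another, so the larger mmpp adds no parallelism:

```xml
<int-file:inbound-channel-adapter id="filesIn" directory="in">
    <!-- The scheduler hands the entire poll to a single executor thread.
         That thread then receives and processes up to 5 files in sequence. -->
    <int:poller fixed-rate="1" max-messages-per-poll="5" task-executor="executor" />
</int-file:inbound-channel-adapter>
```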
So, in general, your second implementation is best (as long as you can live with an average 2.5 second delay when new files arrive, which is half of the 5 second poll interval).
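For reference, a minimal sketch of the second approach as a complete context file. Several things here are assumptions rather than values from the question: the class name com.example.MoveToStage is hypothetical, the 1 second interval and max-messages-per-poll="5" are illustrative (a shorter interval reduces the average pickup delay at the cost of more frequent directory scans), and the adapter sends straight to the executor channel through its channel attribute instead of going through the bridge.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:int="http://www.springframework.org/schema/integration"
       xmlns:int-file="http://www.springframework.org/schema/integration/file"
       xmlns:task="http://www.springframework.org/schema/task"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans https://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/integration https://www.springframework.org/schema/integration/spring-integration.xsd
           http://www.springframework.org/schema/integration/file https://www.springframework.org/schema/integration/file/spring-integration-file.xsd
           http://www.springframework.org/schema/task https://www.springframework.org/schema/task/spring-task.xsd">

    <!-- Hypothetical handler bean; substitute the real class that has move(). -->
    <bean id="moveToStage" class="com.example.MoveToStage" />

    <!-- The poll runs on the scheduler thread: up to 5 files per poll
         (assumed value, matched to the pool size), once per second (assumed). -->
    <int-file:inbound-channel-adapter id="filesIn" directory="in" channel="filesReady">
        <int:poller fixed-rate="1000" max-messages-per-poll="5" />
    </int-file:inbound-channel-adapter>

    <!-- Executor channel: each message is handed off to a pool thread. -->
    <int:channel id="filesReady">
        <int:dispatcher task-executor="executor" />
    </int:channel>

    <int:service-activator ref="moveToStage" method="move" input-channel="filesReady" />

    <!-- With no queue, CALLER_RUNS makes the polling thread process a file
         itself when all pool threads are busy, which throttles the poller. -->
    <task:executor id="executor" pool-size="5" queue-capacity="0"
                   rejection-policy="CALLER_RUNS" />
</beans>
```

Keeping CALLER_RUNS rather than DISCARD means no file message is dropped when the pool is saturated; the poll simply slows down until a thread frees up.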