How to Reference the External Jar in Flink

Tags:

apache-flink

Hi everyone. I tried to reference my company's jar in Flink by copying it to $FLINK/lib on all TaskManagers, but it failed. I don't want to package a fat jar, which is too heavy and a waste of time. I also don't think the first method is a good idea, because I would have to manage jars across the whole cluster. Does anyone know how to resolve this problem? Any suggestions would be appreciated.

zhangshengxiong asked Aug 03 '15 09:08




2 Answers

In general, building a fat jar is the best way to go. I'm not sure how big your fat jar gets that you consider it "too heavy".

Copying jars to $FLINK/lib should work. However, you need to restart Flink so that the jars are added to Flink's classpath. This approach therefore does not allow adding jars dynamically, but it should work for a set of stable jars.

To manage jars across the whole cluster, it might help to use an NFS folder as $FLINK/lib to keep all TaskManagers in sync. Alternatively, you can simply write a bash script to distribute your jars.
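A distribution script along these lines might look like the following. It is only a minimal sketch: the hosts file (one hostname per line), the install path /opt/flink, and passwordless scp access to every node are assumptions, not something stated above. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: copy a set of jars into $FLINK_HOME/lib on every node.
# The hosts file and all paths are hypothetical placeholders.

FLINK_HOME="${FLINK_HOME:-/opt/flink}"   # assumed install location, identical on all nodes
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# distribute_jars <hosts-file> <jar>...
# Copies each jar to $FLINK_HOME/lib on every host listed in the hosts file.
distribute_jars() {
  local hosts_file="$1"; shift
  local host jar
  # Read hosts on fd 3 so that scp/ssh cannot swallow the host list from stdin.
  while IFS= read -r host <&3; do
    [ -z "$host" ] && continue
    for jar in "$@"; do
      run scp "$jar" "$host:$FLINK_HOME/lib/"
    done
  done 3< "$hosts_file"
}
```

For example, `distribute_jars hosts.txt company-lib.jar` prints one scp command per host; with `DRY_RUN=0` it performs the copies. Since the jars in lib are only picked up at startup, a standalone cluster still has to be restarted afterwards, for instance with bin/stop-cluster.sh followed by bin/start-cluster.sh.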

Matthias J. Sax answered Oct 01 '22 15:10



Flink's Command Line Interface (CLI) allows passing additional jar location paths using the -C option. We use it to pass dependencies to each job.

Our problem: Our jobs usually evolve over the project's lifetime, their external dependencies change versions, and we run several processes in the same cluster, so we wanted to select the exact jar versions to load for each run. The $FLINK/lib directory alone was therefore not enough for us.

Details: We distribute the jars to a fixed directory (different from $FLINK/lib) on every node. We then start the job through the CLI, not by typing the call directly, since it is quite long, but through a bash script that abbreviates it.
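Such a wrapper could be sketched roughly as follows. This is not our actual script: the directory /opt/flink-deps and the jar names are hypothetical, and the function only prints the resulting call unless DRY_RUN=0 is set. Each dependency is passed as its own -C option, written as a URL with a protocol (file://) that has to resolve on every node.

```shell
#!/usr/bin/env bash
# Sketch: wrap "flink run" so each job selects its own dependency jars via -C.
# DEPS_DIR and the jar names used with it are hypothetical.

DEPS_DIR="${DEPS_DIR:-/opt/flink-deps}"  # assumed to exist at the same path on every node

# build_classpath_args <dep-jar>...
# Fills the global array CP_ARGS with one "-C file://<path>" pair per jar.
build_classpath_args() {
  CP_ARGS=()
  local jar
  for jar in "$@"; do
    CP_ARGS+=(-C "file://$DEPS_DIR/$jar")
  done
}

# run_job <job-jar> <dep-jar>...
# Prints the flink command when DRY_RUN=1 (the default), otherwise submits the job.
run_job() {
  local job_jar="$1"; shift
  build_classpath_args "$@"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo flink run "${CP_ARGS[@]}" "$job_jar"
  else
    flink run "${CP_ARGS[@]}" "$job_jar"
  fi
}
```

Calling `run_job my-job.jar company-core-1.4.jar` then expands to a single flink run call with one -C entry, and a different run can name a different version of the same dependency without touching $FLINK/lib.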

user2108278 answered Oct 01 '22 13:10
