
Want to run non-threadsafe library in parallel - can it be done using multiple classloaders?

I work on a project that uses a library which is not guaranteed to be thread-safe (and isn't). We currently call it single-threaded from a Java 8 streams pipeline, which works as expected.

We would like to use parallel streams to get the low hanging scalability fruit.

Unfortunately this causes the library to fail, most likely because one instance interferes with variables shared with another instance, so we need isolation.

I was considering using a separate classloader for each instance (possibly thread-local). To my knowledge this should, for all practical purposes, give me the isolation needed, but I am unfamiliar with deliberately constructing classloaders for this purpose.

Is this the right approach? How should I do this to get proper production quality?


Edit: I was asked for additional information about the situation triggering the question, in order to understand it better. The question is still about the general situation, not fixing the library.

I have full control over the object created by the library (which is https://github.com/veraPDF/) as pulled in by

    <dependency>
        <groupId>org.verapdf</groupId>
        <artifactId>validation-model</artifactId>
        <version>1.1.6</version>
    </dependency>

using the project maven repository for artifacts.

    <repositories>
        <repository>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
            <id>vera-dev</id>
            <name>Vera development</name>
            <url>http://artifactory.openpreservation.org/artifactory/vera-dev</url>
        </repository>
    </repositories>

For now it is unfeasible to harden the library.


EDIT: I was asked to show code. Our core adapter is roughly:

    public class VeraPDFValidator implements Function<InputStream, byte[]> {
        private String flavorId;
        private Boolean prettyXml;

        public VeraPDFValidator(String flavorId, Boolean prettyXml) {
            this.flavorId = flavorId;
            this.prettyXml = prettyXml;
            VeraGreenfieldFoundryProvider.initialise();
        }

        @Override
        public byte[] apply(InputStream inputStream) {
            try {
                return apply0(inputStream);
            } catch (RuntimeException e) {
                throw e;
            } catch (ModelParsingException | ValidationException | JAXBException | EncryptedPdfException e) {
                throw new RuntimeException("invoking VeraPDF validation", e);
            }
        }

        private byte[] apply0(InputStream inputStream) throws ModelParsingException, ValidationException, JAXBException, EncryptedPdfException {
            PDFAFlavour flavour = PDFAFlavour.byFlavourId(flavorId);
            PDFAValidator validator = Foundries.defaultInstance().createValidator(flavour, false);
            PDFAParser loader = Foundries.defaultInstance().createParser(inputStream, flavour);
            ValidationResult result = validator.validate(loader);

            // do in-memory generation of XML byte array - as we need to pass it to Fedora we need it to fit in memory anyway.

            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            XmlSerialiser.toXml(result, baos, prettyXml, false);
            final byte[] byteArray = baos.toByteArray();
            return byteArray;
        }
    }

which is a function that maps from an InputStream (providing a PDF-file) to a byte array (representing the XML report output).

(Seeing the code, I've noticed that there is a call to the initializer in the constructor, which may be the culprit in my particular case. I'd still like a solution to the generic problem.)

Thorbjørn Ravn Andersen asked Jan 30 '17 13:01



1 Answer

We have faced similar challenges. Issues usually came from static properties which unintentionally became "shared" between the various threads.

Using different classloaders worked for us as long as we could guarantee that the static properties were actually set on classes loaded by our own class loader. Java itself may have a few classes whose properties or methods are not isolated among threads or are not thread-safe (System.setProperties() and Security.addProvider() are OK - any canonical documentation on this matter is welcome, by the way).
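To make the "one class loader per thread" idea concrete, here is a minimal sketch, not production code. It assumes the library and the adapter class live in jars that are not on the application class path, that the adapter has a public no-argument constructor, and that it implements a type the caller can also see (for the adapter in the question that would be java.util.function.Function). The names PerThreadIsolation, perThread, libraryJars, implClassName and sharedType are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class PerThreadIsolation {

    /**
     * Returns a ThreadLocal whose value is created once per thread, each time
     * through a brand-new class loader. Classes found under libraryJars, and
     * their static fields, are therefore separate copies for every thread.
     */
    static <T> ThreadLocal<T> perThread(URL[] libraryJars, String implClassName, Class<T> sharedType) {
        // Parent of the system loader: sees the JDK but not the application class path,
        // so library classes can only come from libraryJars.
        ClassLoader jdkOnly = ClassLoader.getSystemClassLoader().getParent();
        return ThreadLocal.withInitial(() -> {
            try {
                ClassLoader isolated = new URLClassLoader(libraryJars, jdkOnly);
                // Assumption: the implementation has a public no-argument constructor.
                Object instance = Class.forName(implClassName, true, isolated)
                        .getDeclaredConstructor()
                        .newInstance();
                // sharedType must come from a loader both sides can see (e.g. a JDK interface).
                return sharedType.cast(instance);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot create isolated " + implClassName, e);
            }
        });
    }
}
```

In a parallel stream each worker thread would call get() on the returned ThreadLocal and use its own instance. This only isolates state held in classes loaded from the given jars; JVM-wide state in JDK classes stays shared, and every loader keeps its own copy of the library classes in memory.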

A potentially workable and fast solution - that at least can give you a chance to test this theory for your library - is to use a servlet engine such as Jetty or Tomcat.

Build a few wars that contain your library and start processes in parallel (1 per war).

When running code inside a servlet thread, the WebappClassLoaders of these engines attempt to load a class from the parent class loader first (the same as the engine) and, if the class is not found there, attempt to load it from the jars/classes packaged with the war.

With Jetty you can programmatically hot deploy wars to the context of your choice and then theoretically scale the number of processors (wars) as required.

We have implemented our own class loader by extending URLClassLoader and have taken inspiration from the Jetty Webapp ClassLoader. It is not as hard a job as it seems.

Our classloader does the exact opposite: it attempts to load a class from the jars local to the 'package' first, then tries to get it from the parent class loader. This guarantees that a library accidentally loaded by the parent classloader is never considered (first). Our 'package' is actually a jar that contains other jars/libraries with a customized manifest file.

Posting this class loader code "as is" would not make a lot of sense (and create a few copyright issues). If you want to explore that route further, I can try coming up with a skeleton.
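For readers who want a starting point anyway, a generic child-first loader along the lines described above might look like the sketch below. It is not the loader used in our project, and the class name ChildFirstClassLoader is made up for this example. It only covers class lookup; resources (getResource and friends) would need the same treatment, and it does not handle jars nested inside a jar.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Child-first loader sketch: looks in its own URLs before asking the parent. */
final class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // Core classes always come from the parent chain.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // 2. Child first: look in our own jars/directories.
                        c = findClass(name);
                    } catch (ClassNotFoundException notLocal) {
                        // 3. Fall back to the normal parent-first delegation.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no URLs of its own, the loader falls through to the parent for everything.
        try (ChildFirstClassLoader loader =
                     new ChildFirstClassLoader(new URL[0], ClassLoader.getSystemClassLoader())) {
            System.out.println(loader.loadClass("java.lang.String") == String.class);
        }
    }
}
```

Each instance created with the library jars as its URLs defines its own copy of the library classes, so static fields set through one instance are not visible through another.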

Source of the Jetty WebappClassLoader

Bruno Grieder answered Oct 04 '22 10:10