I am looking for useful documentation or examples for the Apache Arrow API. Can anyone point me to some good resources? I was only able to find a few blog posts and the Java documentation (which doesn't say much).
From what I have read, it is a standard in-memory columnar database for fast analytics. Is it possible to load data into Arrow memory and manipulate it?
You should use Arrow as an intermediary between two applications that need to communicate by passing objects to each other.
Arrow isn’t a standalone piece of software but rather a component used to accelerate analytics within a particular system and to allow Arrow-enabled systems to exchange data with low overhead.
For example, Arrow improves the performance of data movement within a cluster.
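To make the "columnar" part concrete, here is a minimal plain-Java sketch of the difference between a row-oriented and a column-oriented in-memory layout. It deliberately does not use Arrow's classes: `ColumnarSketch`, `Row`, `Columns`, `toColumns` and `sumPrices` are hypothetical names made up for illustration, and ordinary Java arrays stand in for Arrow's buffers.

```java
public class ColumnarSketch {

    // Row-oriented layout: all fields of one record live together in one object.
    static final class Row {
        final int id;
        final double price;

        Row(int id, double price) {
            this.id = id;
            this.price = price;
        }
    }

    // Column-oriented layout: one contiguous array per field.
    // (Illustration only; Arrow uses its own vector and buffer types.)
    static final class Columns {
        final int[] ids;
        final double[] prices;

        Columns(int[] ids, double[] prices) {
            this.ids = ids;
            this.prices = prices;
        }
    }

    // Pivot an array of rows into per-field columns.
    static Columns toColumns(Row[] rows) {
        int[] ids = new int[rows.length];
        double[] prices = new double[rows.length];
        for (int i = 0; i < rows.length; i++) {
            ids[i] = rows[i].id;
            prices[i] = rows[i].price;
        }
        return new Columns(ids, prices);
    }

    // An analytic scan touches only the column it needs.
    static double sumPrices(Columns columns) {
        double total = 0.0;
        for (double price : columns.prices) {
            total += price;
        }
        return total;
    }

    public static void main(String[] args) {
        Row[] rows = { new Row(1, 2.5), new Row(2, 4.0), new Row(3, 1.5) };
        Columns columns = toColumns(rows);
        System.out.println(sumPrices(columns)); // prints 8.0
    }
}
```

The aggregate reads a single contiguous array instead of walking every record object, which is the kind of access pattern a columnar layout is meant to speed up.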
See the tests in the Arrow Java source tree for examples, such as this file round-trip test:
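    @Test
    public void test() throws Exception {
        // Allocator through which the test obtains Arrow memory
        BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
        File testInFile = testFolder.newFile("testIn.arrow");
        File testOutFile = testFolder.newFile("testOut.arrow");
        // Helper defined elsewhere in the test class: writes the input file
        writeInput(testInFile, allocator);
        // Run the round-trip tool with the input (-i) and output (-o) paths
        String[] args = {"-i", testInFile.getAbsolutePath(), "-o", testOutFile.getAbsolutePath()};
        int result = new FileRoundtrip(System.out, System.err).run(args);
        assertEquals(0, result);
        // Helper defined elsewhere in the test class: checks the output file
        validateOutput(testOutFile, allocator);
    }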
Apache Parquet also uses it. There are examples of converting schemas to and from Arrow objects:
    MessageType parquet = converter.fromArrow(allTypesArrowSchema).getParquetSchema();
    Schema arrow = converter.fromParquet(supportedTypesParquetSchema).getArrowSchema();