Apache Apex is an open source, enterprise-grade, unified stream and batch processing platform. It is used in the GE Predix platform for IoT. What are the key differences between these two platforms (Apache Apex and Apache Spark)? Questions: <ol> <li>From a data science perspective, how is it different from Spark?</li> <li>Does Apache Apex provide functionality like Spark MLlib? If we have to build scalable ML models on Apache Apex, how do we do it and which language should we use?</li> <li>Will data scientists have to learn Java to build scalable ML models? Does it have a Python API like PySpark?</li> <li>Can Apache Apex be integrated with Spark, and can we use Spark MLlib on top of Apex to build ML models?</li> </ol>

<ol> <li>Apache Apex is an engine for processing streaming data. Other engines with the same goal include Apache Storm and Apache Flink. The differentiating factor for Apache Apex is that it comes with built-in support for fault tolerance and scalability, and a focus on operability, all of which are key considerations in production use cases.</li> </ol> Comparing it with Spark: Apache Spark is fundamentally a batch processing engine. Spark Streaming (which uses Spark underneath) does micro-batch processing. In contrast, Apache Apex does true stream processing, in the sense that an incoming record does NOT have to wait for the next record before it is processed. Each record is processed and sent to the next stage of processing as soon as it arrives. <ol start="2"> <li>Currently, work is in progress to integrate Apache Apex with machine learning libraries such as Apache SAMOA and H2O. Refer to https://issues.apache.org/jira/browse/SAMOA-49</li> <li>Currently, it supports Java and Scala: https://www.datatorrent.com/blog/blog-writing-apache-apex-application-in-scala/ For Python, you may try it using Jython, but I haven't tried it myself, so I am not sure how well it works.</li> <li>Integration with Spark may not be a good idea, considering they are two different processing engines. However, Apache Apex integration with machine learning libraries is in progress.</li> </ol> If you have any other questions or feature requests, you can post them on the mailing list for Apache Apex users: https://mail-archives.apache.org/mod_mbox/incubator-apex-users/
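To make the micro-batch versus record-at-a-time distinction in the answer concrete, here is a minimal sketch in plain Python. It is only a simulation: it does not use any Apache Apex or Spark API, all function names are hypothetical, and it assumes processing itself takes zero time so that the only latency shown is the time a record spends waiting for its batch window to close.

```python
import math


def per_record_emit_times(arrival_times):
    """Record-at-a-time model: each record is emitted the moment it arrives."""
    return list(arrival_times)


def micro_batch_emit_times(arrival_times, batch_interval):
    """Micro-batch model: a record is emitted when its batch window closes."""
    emit_times = []
    for t in arrival_times:
        # Window k covers the half-open range [k * interval, (k + 1) * interval).
        window_index = math.floor(t / batch_interval)
        emit_times.append((window_index + 1) * batch_interval)
    return emit_times


def latencies(arrival_times, emit_times):
    """Waiting time per record: emit time minus arrival time."""
    return [emit - arrive for arrive, emit in zip(arrival_times, emit_times)]


# Hypothetical arrival timestamps, in seconds.
arrivals = [0.25, 0.5, 1.75, 2.0]

per_record = latencies(arrivals, per_record_emit_times(arrivals))
micro_batch = latencies(
    arrivals, micro_batch_emit_times(arrivals, batch_interval=1.0)
)

print("per-record latencies: ", per_record)
print("micro-batch latencies:", micro_batch)
```

In this toy model the per-record latencies are all zero, while each micro-batched record waits for the remainder of its one-second window. Real engines add processing, scheduling, and checkpointing costs on top of this, so the numbers illustrate the shape of the difference rather than actual performance.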

What are the differences between Apache Spark and Apache Apex?

1 Answer


127

answered Sep 20 '22 14:09

Yogi Devendra


Tags:

machine-learning

apache-spark

pyspark

stream-processing

apache-apex

GeorgeOfTheRF
