
Reading data from Azure Blob with Spark

I am having an issue reading data from Azure Blob storage via Spark Streaming.

JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/directory");

Code like the above works for HDFS, but it is unable to read a file from an Azure Blob:

https://blobstorage.blob.core.windows.net/containerid/folder1/

The above is the path shown in the Azure UI, but it doesn't work. Am I missing something? How can we access it?

I know Event Hubs are the ideal choice for streaming data, but my current situation demands using storage rather than queues.

asked Jun 11 '16 by duck


2 Answers

In order to read data from Blob storage, two things need to be done. First, you need to tell Spark which native file system to use in the underlying Hadoop configuration. This also means you need the hadoop-azure JAR to be available on your classpath (note that there may be runtime requirements for additional JARs related to the Hadoop family):

// Requires hadoop-azure (and its azure-storage dependency) on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext ct = new JavaSparkContext();
Configuration config = ct.hadoopConfiguration();
// Register the native Azure file system and the storage account key.
config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
config.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");
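Alternatively, the same two properties can be set cluster-wide in Hadoop's core-site.xml instead of programmatically (a sketch; the account name and key are placeholders):

```xml
<property>
  <name>fs.azure</name>
  <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
</property>
<property>
  <name>fs.azure.account.key.youraccount.blob.core.windows.net</name>
  <value>yourkey</value>
</property>
```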

Now, reference the file using the wasb:// prefix (note that the [s] denotes an optional secure connection):

ssc.textFileStream("wasb[s]://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>");
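The HTTPS URL shown in the portal maps onto the wasb(s) scheme mechanically: the first path segment becomes the container, placed before an @ sign, and the rest stays as the path. A small sketch of that translation (the helper name is hypothetical, not part of any Azure SDK):

```java
import java.net.URI;

public class WasbUri {
    // Convert an HTTPS Blob endpoint URL (as shown in the Azure portal)
    // into the wasbs:// URI form expected by the Hadoop Azure connector.
    static String toWasbs(String httpsUrl) {
        URI u = URI.create(httpsUrl);
        String account = u.getHost();                // e.g. blobstorage.blob.core.windows.net
        String[] parts = u.getPath().split("/", 3);  // ["", container, rest-of-path]
        String path = parts.length > 2 ? parts[2] : "";
        return "wasbs://" + parts[1] + "@" + account + "/" + path;
    }

    public static void main(String[] args) {
        // The URL from the question becomes:
        System.out.println(toWasbs("https://blobstorage.blob.core.windows.net/containerid/folder1/"));
        // wasbs://containerid@blobstorage.blob.core.windows.net/folder1/
    }
}
```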

It goes without saying that you'll need the proper permissions set on the location you are querying in Blob storage.
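As a practical note on the classpath requirement above, the connector JARs can also be pulled in at submit time via Maven coordinates; a sketch (the versions and class name are assumptions and should match your Hadoop build and application):

```shell
# Pull the Azure connector and its storage SDK dependency from Maven Central.
# Versions here are illustrative -- match them to your Hadoop distribution.
spark-submit \
  --packages org.apache.hadoop:hadoop-azure:2.7.3,com.microsoft.azure:azure-storage:2.0.0 \
  --class com.example.BlobStreamingJob \
  my-streaming-job.jar
```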

answered Sep 30 '22 by Yuval Itzchakov


As a supplement, there is a very helpful tutorial about HDFS-compatible Azure Blob storage with Hadoop; please see https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage.

Meanwhile, there is an official sample on GitHub for Spark Streaming on Azure. Unfortunately, the sample is written in Scala, but I think it's still helpful for you.

answered Sep 30 '22 by Peter Pan