I am trying to stream Twitter feeds to HDFS and then query them with Hive. But the first part, streaming the data and loading it into HDFS, is not working and throws a NullPointerException.
This is what I have tried.
1. Downloaded apache-flume-1.4.0-bin.tar, extracted it, and copied all the contents to /usr/lib/flume/. In /usr/lib/ I changed the owner of the flume directory to my user. When I run ls in /usr/lib/flume/, it shows
bin CHANGELOG conf DEVNOTES docs lib LICENSE logs NOTICE README RELEASE-NOTES tools
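Roughly, the commands for this step were the following (the <user>:<group> below is a placeholder for whatever account will run the agent):
tar -xf apache-flume-1.4.0-bin.tar            # unpack the binary distribution
sudo mkdir -p /usr/lib/flume
sudo cp -r apache-flume-1.4.0-bin/* /usr/lib/flume/
sudo chown -R <user>:<group> /usr/lib/flume   # hand the directory to the agent user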
2. Moved to the conf/ directory. I copied flume-env.sh.template to flume-env.sh and edited JAVA_HOME to point to my Java path, /usr/lib/jvm/java-7-oracle.
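Concretely, the relevant line in conf/flume-env.sh ends up looking like this (the path is just where my Oracle JDK 7 happens to live):
export JAVA_HOME=/usr/lib/jvm/java-7-oracle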
3. Next I created a file called flume.conf in the same conf directory and added the following contents:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <Twitter Application API key>
TwitterAgent.sources.Twitter.consumerSecret = <Twitter Application API secret>
TwitterAgent.sources.Twitter.accessToken = <Twitter Application Access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <Twitter Application Access token secret>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, datawarehouse, data ware housing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 600
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
I created an app on Twitter, generated the access token, and added all the keys to the file above; the API key went in as the consumer key.
I downloaded the flume-sources jar from the Cloudera files, as mentioned here.
4. I added flume-sources-1.0-SNAPSHOT.jar to /usr/lib/flume/lib.
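As an alternative to copying the jar into lib/, I believe it can also be put on the agent's classpath through FLUME_CLASSPATH in conf/flume-env.sh (the path below assumes the jar sits under /usr/lib/flume/lib):
FLUME_CLASSPATH="/usr/lib/flume/lib/flume-sources-1.0-SNAPSHOT.jar"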
5. Started Hadoop and ran the following:
hadoop fs -mkdir /user/flume/tweets
hadoop fs -chown -R flume:flume /user/flume
hadoop fs -chmod -R 770 /user/flume
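As a quick sanity check of the ownership and permissions, I also listed the directory:
hadoop fs -ls /user/flume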
6. I ran the following in /usr/lib/flume:
/usr/lib/flume/conf$ bin/flume-ng agent -n TwitterAgent -c conf -f conf/flume-conf
It prints the JARs it is putting on the classpath and then exits.
When I checked HDFS, there were no files there:
hadoop fs -ls /user/flume/tweets
shows nothing.
In Hadoop, the core-site.xml file has the following configuration:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
<final>true</final>
</property>
</configuration>
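To confirm the NameNode is actually reachable at that address, the filesystem root can be listed against the full URI (just a sanity check):
hadoop fs -ls hdfs://localhost:8020/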
Thanks
Using Flume, we can ingest data from multiple servers into Hadoop. It provides a reliable, distributed way of collecting, aggregating, and moving large amounts of data from sources such as Facebook, Twitter, and e-commerce websites.
I ran the following command and it worked (note that -f now points to conf/flume.conf, the file that was actually created earlier, and DEBUG logging to the console makes any errors visible):
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent