I want to issue a query for a keyword or hashtag and retrieve all the images from the tweets that contain it. I can use Twitter4J with Java to easily issue a query and retrieve the resulting tweets. I know that I can visit the http://t.co/xxxx links in my browser and see the associated image, which is hosted at https://pbs.twimg.com/xxxxx. So it seems like all I have to do is repeat that process in my code!
I can parse the http://t.co/xxxx link out of each tweet easily enough. However, when I retrieve all the HTML from that link, I don't see any https://pbs.twimg.com/xxxx images :(. I think what's happening is that Twitter is loading those images through JavaScript.
Is there any way I can easily retrieve the images on each tweet?
This is what I have so far:
package com.company;

import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) throws Exception {
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
                .setOAuthConsumerKey("xxxxxxxxxx")
                .setOAuthConsumerSecret("xxxxxxxxxxxx")
                .setOAuthAccessToken("xxxxxxxxx-xxx-xxxxxxxx")
                .setOAuthAccessTokenSecret("xxxxxxxxxxxxxxxxxxx");
        TwitterFactory tf = new TwitterFactory(cb.build());
        Twitter twitter = tf.getInstance();

        Query query = new Query("#hashtag");
        QueryResult result = twitter.search(query);

        // Matches shortened t.co links (http or https) in the tweet text.
        Pattern pattern = Pattern.compile("https?://t\\.co/\\w+");
        // No spaces inside the alternation: a space in a Java regex is matched literally.
        Pattern imagePattern = Pattern.compile("https://pbs\\.twimg\\.com/media/[\\w-]+\\.(png|jpg|gif)(:large)?");

        for (Status status : result.getTweets()) {
            if (status.isRetweet())
                continue;
            System.out.println("@" + status.getUser().getScreenName() + ":" + status.getText());
            Matcher matcher = pattern.matcher(status.getText());
            if (matcher.find()) {
                System.out.println("found a t.co url");
                URL link = new URL(matcher.group());
                // Read the linked page line by line and look for image URLs.
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(link.openStream()))) {
                    String inputLine;
                    while ((inputLine = in.readLine()) != null) {
                        Matcher imageMatcher = imagePattern.matcher(inputLine);
                        if (imageMatcher.find())
                            System.out.println("YAYAAYAYAYYAYAYAYAYAYAYAYAYAAYAYYAYAAYYAYAYAYA: " + imageMatcher.group());
                    }
                }
            }
        }
    }
}
There is a simpler way to retrieve the images in tweets.
If a tweet has an image attached, you can call getMediaEntities() on the Status to get its media data, and then retrieve each image's URL with getMediaURL().
You could do something like this:
MediaEntity[] media = status.getMediaEntities(); // get the media entities from the status
for (MediaEntity m : media) { // loop through the entities
    System.out.println(m.getMediaURL()); // print each media URL
}
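If you want to save the images rather than just print their URLs, here is a minimal sketch using only the standard library. It assumes the URL strings come from getMediaURL() (or getMediaURLHttps() for the https variant) and may carry a size suffix such as :large, as in the regex from the question. The class and method names (MediaDownloader, fileNameFor, download) are made up for illustration and are not part of Twitter4J.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of Twitter4J.
final class MediaDownloader {

    // Derive a local file name from a media URL: take the last path segment
    // and drop any query string or size suffix such as ":large".
    static String fileNameFor(String mediaUrl) {
        String name = mediaUrl.substring(mediaUrl.lastIndexOf('/') + 1);
        int query = name.indexOf('?');
        if (query >= 0) {
            name = name.substring(0, query);
        }
        int suffix = name.indexOf(':');
        if (suffix >= 0) {
            name = name.substring(0, suffix);
        }
        return name;
    }

    // Download the image at mediaUrl into the given directory and return the
    // path of the saved file. An existing file with the same name is overwritten.
    static Path download(String mediaUrl, Path directory) throws IOException {
        Files.createDirectories(directory);
        Path target = directory.resolve(fileNameFor(mediaUrl));
        try (InputStream in = URI.create(mediaUrl).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Inside the loop over the media entities you could then call something like MediaDownloader.download(m.getMediaURL(), Paths.get("images")), where the "images" directory is just an example. Overwriting existing files keeps the sketch short; you may prefer to skip files that are already present.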