
Get JSON elements from a web with Apache Flink

After reading several documentation pages for Apache Flink (official documentation, dataartisans) as well as the examples provided in the official repository, I keep seeing examples where the streaming data source is a file that has already been downloaded, always connecting to localhost.

I am trying to use Apache Flink to download JSON files that contain dynamic data. My intention is to set the URL where the JSON file can be accessed as the input source for Apache Flink, instead of downloading the file with another system and then processing the downloaded file with Apache Flink.

Is it possible to establish this network connection with Apache Flink?

Alvaro Gomez asked Feb 28 '16 15:02



1 Answer

You can define the URLs you want to download as your input DataStream and then download the documents from within a MapFunction. The following code demonstrates this:

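import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Each element of the input stream is a URL to download.
DataStream<String> inputURLs = env.fromElements("http://www.json.org/index.html");

inputURLs.map(new MapFunction<String, String>() {
    @Override
    public String map(String s) throws Exception {
        URL url = new URL(s);
        StringBuilder builder = new StringBuilder();

        // try-with-resources closes the reader and the underlying stream even
        // if reading fails. An IOException now propagates instead of being
        // swallowed, so a failed download is not emitted as a partial document.
        // UTF-8 is assumed here; adjust if the server uses another encoding.
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                builder.append(line).append('\n');
            }
        }

        // The whole document is returned as a single stream element.
        return builder.toString();
    }
}).print();

env.execute("URL download job");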
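
If the download logic grows (timeouts, status checks), it may be easier to move it out of the MapFunction into a small plain-Java helper that map() calls. The following is only a sketch using the JDK's HttpURLConnection: the class name JsonFetcher, the methods readAll and fetch, and the timeout parameter are hypothetical names chosen for illustration, not part of Flink. It assumes the URL is http or https and that the body is UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a Flink class): downloads a document as a String.
public class JsonFetcher {

    // Reads the whole stream as UTF-8 text, ending every line with '\n'.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line).append('\n');
            }
        }
        return builder.toString();
    }

    // Fetches an http(s) URL with connect/read timeouts and fails on any
    // status other than 200, so a broken download is not passed downstream.
    public static String fetch(String address, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        conn.setRequestProperty("Accept", "application/json");
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + address);
            }
            return readAll(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }

    // Offline demo: exercises readAll on an in-memory stream, no network needed.
    public static void main(String[] args) throws IOException {
        byte[] sample = "{\"a\": 1}\n{\"b\": 2}".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(sample)));
    }
}
```

Inside the job, map() could then simply return JsonFetcher.fetch(s, 5000), where 5000 ms is an arbitrary example timeout. Keeping the helper free of Flink types means it can be tested without starting a job.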
Till Rohrmann answered Nov 09 '22 20:11
