How to support multiple KeyBy in Flink

Tags:

java

apache-kafka

apache-flink

In the code sample below, I am trying to read a stream of employee records { Country, Employer, Name, Salary, Age } and print the highest-paid employee in every country. Unfortunately, chaining multiple keyBy calls doesn't work.

Only keyBy(Employer) takes effect, so I don't get the correct result. What am I missing?

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<Employee> streamEmployee = env.addSource(
        new FlinkKafkaConsumer010<ObjectNode>("flink-demo", new JSONDeserializationSchema(), properties))
        .map(new MapFunction<ObjectNode, Employee>() {

            private static final long serialVersionUID = 6111226274068863916L;

            @Override
            public Employee map(ObjectNode value) throws Exception {
                final Gson gson = new GsonBuilder().create();
                Employee uMsg = gson.fromJson(value.toString(), Employee.class);
                return uMsg;
            }
        });

KeyedStream<Employee, String> employeesKeyedByCountryndEmployer = streamEmployee
        .keyBy(new KeySelector<Employee, String>() {
            private static final long serialVersionUID = -6867736771747690202L;

            @Override
            public String getKey(Employee value) throws Exception {
                // TODO Auto-generated method stub
                return value.getCountry();
            }
        }).keyBy(new KeySelector<Employee, String>() {
            private static final long serialVersionUID = -6867736771747690202L;

            @Override
            public String getKey(Employee value) throws Exception {
                // TODO Auto-generated method stub
                return value.getEmployer();
            }
        });
// This should emit the highest-paid employee for a given country and a
// given employer
DataStream<Employee> uHighlyPaidEmployee = employeesKeyedByCountryndEmployer.timeWindow(Time.seconds(5))
        .maxBy("salary");

// Assume toString() is overridden, so print() produces readable output.
uHighlyPaidEmployee.print();

env.execute("Employee-employer log processor");

asked Sep 19 '17 08:09

Abhijit Pathak

2 Answers

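Chained keyBy calls do not combine their keys: each call re-partitions the stream, so only the last key (here, the employer) takes effect. Instead, you can define a KeySelector that returns a composite key: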

KeyedStream<Employee, Tuple2<String, String>> employeesKeyedByCountryndEmployer = 
  streamEmployee.keyBy(
    new KeySelector<Employee, Tuple2<String, String>>() {

      @Override
      public Tuple2<String, String> getKey(Employee value) throws Exception {
        return Tuple2.of(value.getCountry(), value.getEmployer());
      }
    }
  );
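To see why the composite key matters, here is a minimal plain-Java sketch of the grouping logic only. It does not use the Flink API: `CompositeKeyDemo`, its nested `Employee` class, `highestPaid`, `compositeKey` and the sample data are all hypothetical stand-ins, and a `List<String>` plays the role of `Tuple2<String, String>` because both compare by value. Grouping by employer alone merges employees from different countries, while the composite key keeps them apart.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java analogy of keyBy(...) followed by maxBy("salary") within one window.
// Nothing here is Flink API; all names are illustrative assumptions.
final class CompositeKeyDemo {

    // Hypothetical stand-in for the Employee class from the question.
    static final class Employee {
        final String country;
        final String employer;
        final String name;
        final double salary;

        Employee(String country, String employer, String name, double salary) {
            this.country = country;
            this.employer = employer;
            this.name = name;
            this.salary = salary;
        }
    }

    // Composite key: a List compares by value, much like Tuple2 does.
    static List<String> compositeKey(Employee e) {
        return Arrays.asList(e.country, e.employer);
    }

    // Keeps the highest-paid employee per key; on ties the earlier record wins.
    static <K> Map<K, Employee> highestPaid(List<Employee> employees,
                                            Function<Employee, K> keySelector) {
        Map<K, Employee> best = new HashMap<>();
        for (Employee e : employees) {
            best.merge(keySelector.apply(e), e, (a, b) -> a.salary >= b.salary ? a : b);
        }
        return best;
    }

    // Made-up sample records, for illustration only.
    static List<Employee> sample() {
        return Arrays.asList(
                new Employee("US", "Acme", "Alice", 100),
                new Employee("US", "Acme", "Bob", 150),
                new Employee("US", "Globex", "Carol", 120),
                new Employee("DE", "Acme", "Dieter", 90));
    }

    public static void main(String[] args) {
        // Keyed by employer only: the DE and US employees of Acme share one group.
        System.out.println(highestPaid(sample(), e -> e.employer).size());
        // Keyed by (country, employer): every pair gets its own group.
        highestPaid(sample(), CompositeKeyDemo::compositeKey)
                .forEach((key, e) -> System.out.println(key + " -> " + e.name));
    }
}
```

With the sample data, keying by employer alone yields two groups, and the Acme employee in DE is hidden behind the better-paid Acme employee in the US. The composite key yields three groups, one per (country, employer) pair.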

answered Oct 17 '22 15:10

Fabian Hueske

If you try to replace the anonymous class with a lambda expression, you will run into the type-erasure problems described here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/java_lambdas.html
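The underlying Java behaviour can be reproduced without Flink. The sketch below is an analogy and not Flink's actual type extraction, which is more involved. `Selector` is a hypothetical stand-in for `KeySelector`, and `declaredKeyType` is an illustrative helper. An anonymous class records its generic type arguments in the class file, while the class generated for a lambda does not, so reflection cannot recover the key type.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

// Demonstrates that generic type arguments survive on anonymous classes
// but are not recorded on lambda-generated classes. Names are illustrative.
final class LambdaErasureDemo {

    // Hypothetical stand-in for Flink's KeySelector<IN, KEY>.
    interface Selector<IN, KEY> {
        KEY getKey(IN value);
    }

    // Returns the KEY type argument if the class recorded it, else "unknown".
    static String declaredKeyType(Selector<?, ?> selector) {
        for (Type t : selector.getClass().getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                return ((ParameterizedType) t).getActualTypeArguments()[1].getTypeName();
            }
        }
        return "unknown";
    }

    // Anonymous class: the compiler stores Selector<String, List<String>> in its signature.
    static Selector<String, List<String>> anonymous() {
        return new Selector<String, List<String>>() {
            @Override
            public List<String> getKey(String value) {
                return Arrays.asList(value.split(","));
            }
        };
    }

    // Lambda: the runtime-generated class implements the raw Selector interface only.
    static Selector<String, List<String>> lambda() {
        return value -> Arrays.asList(value.split(","));
    }

    public static void main(String[] args) {
        System.out.println("anonymous class: " + declaredKeyType(anonymous()));
        System.out.println("lambda:          " + declaredKeyType(lambda()));
    }
}
```

This is why the anonymous KeySelector in the answer above works as written, whereas a lambda returning a generic type such as Tuple2 typically needs its type information supplied explicitly, as the linked page explains.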

answered Oct 17 '22 15:10

Horatiu
