keyby的分区方式

来源:3-8 指定key之key选择器函数

小禹o0

2021-02-20

老师,请教一个问题,我的程序默认并行度是4,对图中的单词进行keyby之后打印输出,为什么输出结果是hadoop/but/and/is在线程4输出,其他单词在线程1输出呢?这个分配的原理是什么呢?

hadoop
spark
i
but
is
and
package test;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSink;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class KeyByTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> stream = env.readTextFile("/Users/guanyu/IdeaProjects/quickstart/src/main/java/data/wc2.txt");
        DataStreamSink<Tuple2<String, Integer>> tuple2StringKeyedStream = stream.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] tokens = value.toLowerCase().split(" ");
                for (String token : tokens) {
                    if (token.length() > 0) {
                        out.collect(new Tuple2<>(token, 1));
                    }
                }
            }
        }).keyBy(x -> x.f0).print();
        env.execute("s");
    }
}
1> (i,1)
4> (and,1)
1> (spark,1)
4> (hadoop,1)
4> (but,1)
4> (is,1)
写回答

1回答

Michael_PK

2021-02-21

默认是按照数据的hashcode和分区数进行取模计算的,算出来再哪个就是哪个分区的

0
0

新一代大数据计算引擎 Flink从入门到实战

入行或转型大数据新姿势,多语言系统化讲解,极速入门Flink

969 学习 · 296 问题

查看课程