keyby的分区方式
来源:3-8 指定key之key选择器函数
小禹o0
2021-02-20
老师,请教一个问题,我的程序默认并行度是4,对图中的单词进行keyby之后打印输出,为什么输出结果是hadoop/but/and/is在线程4输出,其他单词在线程1输出呢?这个分配的原理是什么呢?
hadoop
spark
i
but
is
and
package test;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSink;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
public class KeyByTest {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<String> stream = env.readTextFile("/Users/guanyu/IdeaProjects/quickstart/src/main/java/data/wc2.txt");
DataStreamSink<Tuple2<String, Integer>> tuple2StringKeyedStream = stream.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
String[] tokens = value.toLowerCase().split(" ");
for (String token : tokens) {
if (token.length() > 0) {
out.collect(new Tuple2<>(token, 1));
}
}
}
}).keyBy(x -> x.f0).print();
env.execute("s");
}
}
1> (i,1)
4> (and,1)
1> (spark,1)
4> (hadoop,1)
4> (but,1)
4> (is,1)
写回答
1回答
-
Michael_PK
2021-02-21
默认是按照数据的hashcode和分区数进行取模计算的,算出来再哪个就是哪个分区的
00
相似问题