Found 7 posts matching "Flink"
2022-12-03
Flink Task Execution Flow: Source Code Analysis
When a user submits a Flink job, the call sequence transform() -> doTransform() -> addOperator() adds the map, flatMap, filter, process, and other operators to the List<Transformation<?>> transformations collection. When execute() is called, StreamGraphGenerator's generate() method builds the stream topology, the StreamGraph (i.e. the Pipeline), whose underlying data structure is a directed acyclic graph. Within the StreamGraph, a StreamNode records operator information, while a StreamEdge records the data exchange mode between operators, expressed as a Partitioner.

The Partitioner classes are all subclasses of StreamPartitioner and declare their own kind by implementing isPointwise(): one kind is ALL_TO_ALL, the other is POINTWISE.

```java
/**
 * A distribution pattern determines, which sub tasks of a producing task are connected to which
 * consuming sub tasks.
 *
 * <p>It affects how {@link ExecutionVertex} and {@link IntermediateResultPartition} are connected
 * in {@link EdgeManagerBuildUtil}
 */
public enum DistributionPattern {

    /** Each producing sub task is connected to each sub task of the consuming task. */
    ALL_TO_ALL,

    /** Each producing sub task is connected to one or more subtask(s) of the consuming task. */
    POINTWISE
}
```

ALL_TO_ALL means that every upstream subtask is connected to every downstream subtask, while POINTWISE means that each upstream subtask is connected to one or more downstream subtasks.
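To illustrate how the partitioner on an edge determines its distribution pattern, here is a minimal, hypothetical DataStream pipeline. It is not taken from this post; the socket source, host, port, and operator bodies are placeholders. The forward and rescale edges are POINTWISE, while the keyBy edge is ALL_TO_ALL.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PartitionerDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);

        env.socketTextStream("localhost", 9999)   // placeholder source, parallelism 1
                .map(String::trim)
                // same parallelism and no explicit repartitioning between the two maps:
                // ForwardPartitioner, isPointwise() == true -> POINTWISE edge
                .map(String::toUpperCase)
                // rescale(): RescalePartitioner, each upstream subtask feeds only a
                // subset of downstream subtasks -> still a POINTWISE edge
                .rescale()
                .map(String::intern)
                // keyBy(): hash/key-group partitioning, isPointwise() == false, so every
                // upstream subtask may send to every downstream subtask -> ALL_TO_ALL edge
                .keyBy(value -> value)
                .print();

        env.execute("partitioner-demo");
    }
}
```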
Once the StreamGraph is built, the JobGraph is constructed via PipelineExecutorUtils.getJobGraph(). The call chain is: PipelineExecutorUtils.getJobGraph() -> FlinkPipelineTranslationUtil.getJobGraph() -> StreamGraphTranslator.translateToJobGraph() -> StreamGraph.getJobGraph() -> StreamingJobGraphGenerator.createJobGraph().

The JobGraph is an optimized StreamGraph: adjacent operators that support chaining are merged into a single JobVertex. Chaining is implemented in StreamingJobGraphGenerator's setChaining() method:

```java
/**
 * Sets up task chains from the source {@link StreamNode} instances.
 *
 * <p>This will recursively create all {@link JobVertex} instances.
 */
private void setChaining(Map<Integer, byte[]> hashes, List<Map<Integer, byte[]>> legacyHashes) {
    // we separate out the sources that run as inputs to another operator (chained inputs)
    // from the sources that needs to run as the main (head) operator.
    final Map<Integer, OperatorChainInfo> chainEntryPoints =
            buildChainedInputsAndGetHeadInputs(hashes, legacyHashes);
    final Collection<OperatorChainInfo> initialEntryPoints =
            chainEntryPoints.entrySet().stream()
                    .sorted(Comparator.comparing(Map.Entry::getKey))
                    .map(Map.Entry::getValue)
                    .collect(Collectors.toList());

    // iterate over a copy of the values, because this map gets concurrently modified
    for (OperatorChainInfo info : initialEntryPoints) {
        createChain(
                info.getStartNodeId(),
                1, // operators start at position 1 because 0 is for chained source inputs
                info,
                chainEntryPoints);
    }
}
```

Operators are merged into one chain only when all of the following conditions hold (a small sketch of the DataStream API switches that influence these conditions follows at the end of this post):

1. The downstream node has exactly one input edge.
2. The upstream and downstream nodes belong to the same SlotSharingGroup.
3. The data distribution strategy on the edge is Forward (ForwardPartitioner).
4. The stream exchange mode is not BATCH.
5. The upstream and downstream parallelism are equal.
6. Chaining is enabled on the StreamGraph (isChainingEnabled() is true).
7. The two operators themselves are chainable (areOperatorsChainable()).

The corresponding isChainable() code:

```java
public static boolean isChainable(StreamEdge edge, StreamGraph streamGraph) {
    StreamNode downStreamVertex = streamGraph.getTargetVertex(edge);

    return downStreamVertex.getInEdges().size() == 1 && isChainableInput(edge, streamGraph);
}

private static boolean isChainableInput(StreamEdge edge, StreamGraph streamGraph) {
    StreamNode upStreamVertex = streamGraph.getSourceVertex(edge);
    StreamNode downStreamVertex = streamGraph.getTargetVertex(edge);

    if (!(upStreamVertex.isSameSlotSharingGroup(downStreamVertex)
            && areOperatorsChainable(upStreamVertex, downStreamVertex, streamGraph)
            && (edge.getPartitioner() instanceof ForwardPartitioner)
            && edge.getExchangeMode() != StreamExchangeMode.BATCH
            && upStreamVertex.getParallelism() == downStreamVertex.getParallelism()
            && streamGraph.isChainingEnabled())) {
        return false;
    }
    ... ...
```

Starting from the Source nodes, all StreamNode vertices of the directed acyclic graph are visited recursively with a depth-first search (DFS).

To be continued ...
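As the aside referenced from the chaining conditions above: a minimal, hypothetical sketch of the DataStream API calls that influence those conditions. It is not part of the original post; the socket source, host, port, operator bodies, and the "isolated" group name are placeholders.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingControlDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Enabling this breaks condition 6 for the whole job: no operators are chained.
        // env.disableOperatorChaining();

        env.socketTextStream("localhost", 9999)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                // Force this operator to start a new chain even if it could chain to its input.
                .startNewChain()
                .map(String::toUpperCase)
                // Exclude this operator from any chain.
                .disableChaining()
                // A different slot sharing group breaks condition 2 on the edges to and from
                // this operator.
                .slotSharingGroup("isolated")
                .print();

        env.execute("chaining-control-demo");
    }
}
```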
1,365 views · 31 likes
2021-11-20
Flink Real-Time Computing: Problem Notes and Solutions
When a Flink job's parallelism is greater than the number of Kafka partitions, some subtasks stay idle, which stalls watermark generation. One workaround is to set withIdleness on the watermark strategy (a combined sketch appears at the end of this post):

```java
WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(2))
        .withIdleness(Duration.ofSeconds(60))
```

Do not set the downstream job's withIdleness too small: if an upstream job stops because of a failure and takes longer to recover than the downstream job's withIdleness value, the downstream job marks the timed-out partitions as idle and stops consuming them, which loses data. To avoid this, it is safer to keep the number of partitions no smaller than the job parallelism and not set withIdleness at all.

When a KafkaSource starts consuming from a timestamp, the timestamp must be in milliseconds. The Flink 1.14 documentation says seconds, which is wrong: a timestamp in seconds simply does not take effect.

```java
setStartingOffsets(OffsetsInitializer.timestamp(1654703973000L))
```

End-to-end exactly-once between Flink and Kafka requires Kafka 2.5 or later. Note that the kafka-clients bundled with the flink-connector in Flink 1.14.2 is version 2.4.x. The related error message reads: "because of a bug in the Kafka broker (KAFKA-9310). Please upgrade to Kafka 2.5+. If you are running with concurrent checkpoints, you also may want to try without them." End-to-end exactly-once also requires setting TRANSACTIONAL_ID_CONFIG ("transactional.id"); without it, restarting from a checkpoint fails with: OutOfOrderSequenceException: The broker received an out of order sequence number.

When Flink CDC syncs MySQL, the binlog must be configured in ROW mode. How to check and configure it:

```
show variables like 'binlog_format%';

vi /etc/my.cnf
binlog_format=row

systemctl restart mariadb.service
```

In any other mode the connector fails with: Caused by: org.apache.flink.table.api.ValidationException: The MySQL server is configured with binlog_format MIXED rather than ROW, which is required for this connector to work properly. Change the MySQL configuration to use a binlog_format=ROW and restart the connector.

To use Flink CDC 2.2 with Flink 1.14.2, the CDC sources have to be recompiled for version compatibility:

1. In the pom file, change the Flink version to 1.14.2 and the Scala version to 2.12.7.
2. Replace flink-table-planner-blink with flink-table-planner, and flink-table-runtime-blink with flink-table-runtime.
3. Change the flink-shaded-guava version from 30.1.1-jre-14.0 to 18.0-13.0.

After these changes some imports no longer resolve; they must be adjusted to the package paths and classes of the new dependency versions. For example, the code that creates a TimestampFormat becomes: TimestampFormat timestampOption = JsonFormatOptionsUtil.getTimestampFormat(formatOptions).

During the build, first run install on the parent module so the local Maven repository contains the JARs of all child modules. When packaging a child module such as Flink MySQL CDC, the packaging configuration in its POM can be changed to bundle all dependencies into one JAR, so that the project only needs to add a single <dependency>. Otherwise, missing dependencies cause errors such as: Could not initialize class io.debezium.connector.mysql.MySqlConnectorConfig
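Tying the first two notes together, here is a minimal sketch of a Flink 1.14-style KafkaSource that starts from a millisecond timestamp and attaches the watermark strategy shown above. The broker address, topic, group id, and job name are placeholders, not values from the original post.

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaSourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")   // placeholder
                .setTopics("events")                  // placeholder
                .setGroupId("demo-group")             // placeholder
                // must be milliseconds; a value in seconds does not take effect
                .setStartingOffsets(OffsetsInitializer.timestamp(1654703973000L))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> stream = env.fromSource(
                source,
                WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(2))
                        // see the caveat above: an upstream outage longer than this
                        // idleness timeout can lead to lost data downstream
                        .withIdleness(Duration.ofSeconds(60)),
                "kafka-source");

        stream.print();
        env.execute("kafka-source-demo");
    }
}
```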
1,754 views · 10 likes