
Add Flink 2.0 and Spark 4.1 upgrades to PR #4 (#6)

Draft
adrianodyn wants to merge 3 commits into dynatrace-research:upgrade-java-kstreams-hzcast from adrianodyn:upgrade-java-kstreams-hzcast

Conversation

@adrianodyn

This builds on top of #4 and adds:

  1. shuffle-flink: upgrade to Flink 2.0 on Java 21
  2. shuffle-spark: upgrade to Spark 4.1 on Java 21
  3. Kubernetes: small fixes needed to run on modern EKS / K8s
    (EBS CSI provisioner; Strimzi CRD/JVM-options compat)

- Bump Flink to 2.0.0 (flink-connector-kafka 4.0.1-2.0), Shadow 8.1.1, Jib 3.4.5.
- Switch toolchain to Java 21.
- Migrate sources to the Flink 2.x DataStream/Sink V2 APIs and the new
  KafkaRecordSerializationSchema (replaces the deprecated KafkaSerializationSchema).
- Update Kryo serializers and SerDes for the Flink 2.x type-info changes.
- Kubernetes manifests:
  * Rename ConfigMap key flink-conf.yaml -> config.yaml (Flink 2.x).
  * Update volume items in JobManager/TaskManager deployments accordingly.
  * Add the standard Flink 2.x JVM --add-opens / --add-exports list to
    env.java.opts.all (the official image sets these in its bundled
    config.yaml, but our ConfigMap mount overrides /opt/flink/conf).
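
  A minimal sketch of the resulting ConfigMap (the metadata name and the exact module-access list are illustrative; the full `--add-opens`/`--add-exports` list should be copied from the official Flink 2.x image's bundled config.yaml):

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: flink-config   # hypothetical name
  data:
    # Flink 2.x reads config.yaml; the flink-conf.yaml key is gone.
    config.yaml: |
      jobmanager.rpc.address: flink-jobmanager
      # Required because mounting this ConfigMap over /opt/flink/conf
      # hides the image's own config.yaml (abbreviated list):
      env.java.opts.all: >-
        --add-opens=java.base/java.lang=ALL-UNNAMED
        --add-opens=java.base/java.util=ALL-UNNAMED
  ```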
- Bump Spark to 4.1.1 / Hadoop 3.4.2; switch toolchain and runtime to Java 21.
- Rewrite Dockerfile on ubuntu:22.04 + openjdk-21-jre-headless; ship the
  shadow fat jar (shuffle-spark-1.0-SNAPSHOT-all.jar) and add an
  IMAGE_BUILD_ID build marker for deploy verification.
- Application: refactor SparkStructuredStreamingShuffle to support both
  MicroProfile Config (USE_MICROPROFILE_CONFIG=true) and a pure env-var
  fallback. Default trigger mode is the standard Spark micro-batch
  (spark.trigger.mode=default); 'realtime', 'processing' and 'continuous'
  modes are configurable via spark.trigger.mode + spark.trigger.interval.
- Update microprofile-config.properties defaults and add a short README
  documenting the new trigger-mode configuration.
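
  For illustration, the trigger configuration in microprofile-config.properties might look like this (values are examples, not necessarily the shipped defaults):

  ```properties
  # Standard Spark micro-batch trigger (the default)
  spark.trigger.mode=default
  # Alternatives: realtime | processing | continuous
  #spark.trigger.mode=processing
  #spark.trigger.interval=5 seconds
  ```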
- Tweak Kryo registrator and spark-master/spark-defaults.conf for the new
  driver/executor JVM module-access requirements.
- Kubernetes manifest:
  * Use the -all (shadow) jar path.
  * Pass the standard Spark-on-Java-21 --add-opens / --add-exports list
    via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions.
  * Set the env vars now required by the env-var fallback
    (KAFKA_TOPIC_INPUT/OUTPUT, CONSUMER_OUTPUT_RATE,
    CONSUMER_STATE_SIZE_BYTES, CONSUMER_INIT_COUNT_SEED).
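
  Sketched as a container `env` block in the deployment manifest (all values are placeholders):

  ```yaml
  env:
    - name: KAFKA_TOPIC_INPUT
      value: "input"
    - name: KAFKA_TOPIC_OUTPUT
      value: "output"
    - name: CONSUMER_OUTPUT_RATE
      value: "100"
    - name: CONSUMER_STATE_SIZE_BYTES
      value: "1024"
    - name: CONSUMER_INIT_COUNT_SEED
      value: "0"
  ```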
- aws/kafka-storage-class.yaml: switch from the in-tree
  kubernetes.io/aws-ebs provisioner (removed in K8s 1.23+) to the AWS EBS
  CSI driver (ebs.csi.aws.com), with csi.storage.k8s.io/fstype: xfs and
  allowVolumeExpansion: true.
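
  The resulting StorageClass would look roughly like this (the metadata name is assumed from the file path):

  ```yaml
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: kafka-storage   # assumed name
  provisioner: ebs.csi.aws.com   # AWS EBS CSI driver, replaces kubernetes.io/aws-ebs
  parameters:
    csi.storage.k8s.io/fstype: xfs
  allowVolumeExpansion: true
  ```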
- values.yaml: drop the now-invalid 'jvmOptions: -Xms/-Xmx: null' override
  for Strimzi Kafka (newer Strimzi CRDs reject null for these string
  fields). Also override STRIMZI_KUBERNETES_VERSION on the cluster
  operator so Strimzi 0.38 stops trying to parse the Kubernetes 1.34+
  version reply (which it doesn't understand).
Comment on lines +7 to +8
# In-tree "kubernetes.io/aws-ebs" provisioner was removed in K8s 1.23+ and
# is fully gone by 1.35. Use the AWS EBS CSI driver instead.
Member


Could be removed.

Comment thread: kubernetes/values.yaml
Comment on lines +20 to +26
# Note: Theodolite chart used to set "-Xms"/"-Xmx" to null here to let
# Strimzi auto-size the JVM heap, but newer Strimzi CRDs reject null
# for these fields (they must be strings). Leave jvmOptions unset so
# the chart's default string values apply. To override, use e.g.:
#   jvmOptions:
#     "-Xms": "4G"
#     "-Xmx": "4G"
Member


Could be removed.

