
Add Flink 2.0 and Spark 4.1 upgrades to PR #4 (#6)

Draft
adrianodyn wants to merge 3 commits into dynatrace-research:upgrade-java-kstreams-hzcast from adrianodyn:upgrade-java-kstreams-hzcast

Conversation

@adrianodyn

This builds on top of #4 and adds:

  1. shuffle-flink: upgrade to Flink 2.0 on Java 21
  2. shuffle-spark: upgrade to Spark 4.1 on Java 21
  3. Kubernetes: small fixes needed to run on modern EKS / K8s
    (EBS CSI provisioner; Strimzi CRD/JVM-options compat)

- Bump Flink to 2.0.0 (flink-connector-kafka 4.0.1-2.0), Shadow 8.1.1, Jib 3.4.5.
- Switch toolchain to Java 21.
- Migrate sources to the Flink 2.x DataStream/Sink V2 APIs and the new
  KafkaRecordSerializationSchema (replaces the deprecated KafkaSerializationSchema).
- Update Kryo serializers and SerDes for the Flink 2.x type-info changes.
- Kubernetes manifests:
  * Rename ConfigMap key flink-conf.yaml -> config.yaml (Flink 2.x).
  * Update volume items in JobManager/TaskManager deployments accordingly.
  * Add the standard Flink 2.x JVM --add-opens / --add-exports list to
    env.java.opts.all (the official image sets these in its bundled
    config.yaml, but our ConfigMap mount overrides /opt/flink/conf).
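
  A minimal sketch of the resulting ConfigMap (the metadata name and the exact module-access list are illustrative; the full `--add-opens`/`--add-exports` list should be copied from the official Flink 2.x image's bundled config.yaml):

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: flink-config   # hypothetical name
  data:
    # Flink 2.x reads config.yaml; the flink-conf.yaml key is gone.
    config.yaml: |
      jobmanager.rpc.address: flink-jobmanager
      # Required because mounting this ConfigMap over /opt/flink/conf
      # hides the image's own config.yaml (abbreviated list):
      env.java.opts.all: >-
        --add-opens=java.base/java.lang=ALL-UNNAMED
        --add-opens=java.base/java.util=ALL-UNNAMED
  ```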
- Bump Spark to 4.1.1 / Hadoop 3.4.2; switch toolchain and runtime to Java 21.
- Rewrite Dockerfile on ubuntu:22.04 + openjdk-21-jre-headless; ship the
  shadow fat jar (shuffle-spark-1.0-SNAPSHOT-all.jar) and add an
  IMAGE_BUILD_ID build marker for deploy verification.
- Application: refactor SparkStructuredStreamingShuffle to support both
  MicroProfile Config (USE_MICROPROFILE_CONFIG=true) and a pure env-var
  fallback. Default trigger mode is the standard Spark micro-batch
  (spark.trigger.mode=default); 'realtime', 'processing' and 'continuous'
  modes are configurable via spark.trigger.mode + spark.trigger.interval.
- Update microprofile-config.properties defaults and add a short README
  documenting the new trigger-mode configuration.
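
  For illustration, the trigger configuration in microprofile-config.properties might look like this (values are examples, not necessarily the shipped defaults):

  ```properties
  # Standard Spark micro-batch trigger (the default)
  spark.trigger.mode=default
  # Alternatives: realtime | processing | continuous
  #spark.trigger.mode=processing
  #spark.trigger.interval=5 seconds
  ```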
- Tweak Kryo registrator and spark-master/spark-defaults.conf for the new
  driver/executor JVM module-access requirements.
- Kubernetes manifest:
  * Use the -all (shadow) jar path.
  * Pass the standard Spark-on-Java-21 --add-opens / --add-exports list
    via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions.
  * Set the env vars now required by the env-var fallback
    (KAFKA_TOPIC_INPUT/OUTPUT, CONSUMER_OUTPUT_RATE,
    CONSUMER_STATE_SIZE_BYTES, CONSUMER_INIT_COUNT_SEED).
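
  Sketched as a container `env` block in the deployment manifest (all values are placeholders):

  ```yaml
  env:
    - name: KAFKA_TOPIC_INPUT
      value: "input"
    - name: KAFKA_TOPIC_OUTPUT
      value: "output"
    - name: CONSUMER_OUTPUT_RATE
      value: "100"
    - name: CONSUMER_STATE_SIZE_BYTES
      value: "1024"
    - name: CONSUMER_INIT_COUNT_SEED
      value: "0"
  ```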
- aws/kafka-storage-class.yaml: switch from the in-tree
  kubernetes.io/aws-ebs provisioner (removed in K8s 1.23+) to the AWS EBS
  CSI driver (ebs.csi.aws.com), with csi.storage.k8s.io/fstype: xfs and
  allowVolumeExpansion: true.
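
  The resulting StorageClass would look roughly like this (the metadata name is assumed from the file path):

  ```yaml
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: kafka-storage   # assumed name
  provisioner: ebs.csi.aws.com   # AWS EBS CSI driver, replaces kubernetes.io/aws-ebs
  parameters:
    csi.storage.k8s.io/fstype: xfs
  allowVolumeExpansion: true
  ```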
- values.yaml: drop the now-invalid 'jvmOptions: -Xms/-Xmx: null' override
  for Strimzi Kafka (newer Strimzi CRDs reject null for these string
  fields). Also override STRIMZI_KUBERNETES_VERSION on the cluster
  operator so Strimzi 0.38 stops trying to parse the Kubernetes 1.34+
  version reply (which it doesn't understand).
Comment on lines +7 to +8
# In-tree "kubernetes.io/aws-ebs" provisioner was removed in K8s 1.23+ and
# is fully gone by 1.35. Use the AWS EBS CSI driver instead.
Member


Could be removed.

Comment thread: kubernetes/values.yaml
Comment on lines +20 to +26
# Note: Theodolite chart used to set "-Xms"/"-Xmx" to null here to let
# Strimzi auto-size the JVM heap, but newer Strimzi CRDs reject null
# for these fields (they must be strings). Leave jvmOptions unset so
# the chart's default string values apply. To override, use e.g.:
#   jvmOptions:
#     "-Xms": "4G"
#     "-Xmx": "4G"
Member


Could be removed.

