Migrating Data Flow Applications from Using Spark 2.4 to Using Spark 3.0.2
Learn about migrating Data Flow to use Spark 3.0.2 rather than Spark 2.4.4.
Data Flow now supports both Spark 2.4.4 and Spark 3.0.2.
This chapter describes what you need to do if you're migrating existing Data Flow applications from Spark 2.4.4 to Spark 3.0.2.
If you change the Spark version for an application that already has a Run, and then rerun that earlier Run, the rerun uses the new version of Spark, not the version the Run originally used.
Errors when Parsing Timestamp or Date Strings
Learn how to overcome errors in parsing or formatting date or timestamp strings in Data Flow after migrating from Spark 2.4.4 to Spark 3.0.2.
After migrating your Data Flow applications to Spark 3.0.2, you might get the following error when running an application:
org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'MMMMM' pattern in the DateTimeFormatter.
To fix this error, set spark.sql.legacy.timeParserPolicy to LEGACY, and then rerun the application.
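One way to apply this setting is in the application code itself, when the Spark session is created. The following is a minimal PySpark sketch, not the only way to set the property. The `build_session` helper, the `LEGACY_TIME_PARSER_CONF` name, and the `migrated-app` application name are hypothetical and used only for illustration. The property key and value come from the error message above.

```python
# Spark property named in the error message. LEGACY restores the
# date/time parsing behavior used before Spark 3.0.
LEGACY_TIME_PARSER_CONF = {"spark.sql.legacy.timeParserPolicy": "LEGACY"}


def build_session(app_name="migrated-app"):  # hypothetical application name
    """Create (or reuse) a SparkSession with the legacy time parser policy set."""
    # Imported inside the function so this module still loads on machines
    # where PySpark is not installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in LEGACY_TIME_PARSER_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

If a session already exists, calling `spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")` before the failing parse or format operation should have the same effect, because this is a runtime SQL setting.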