Using Apache Spark

Apache Spark is a data processing engine for big data workloads.

The Thrift JDBC/ODBC server corresponds to the HiveServer2 in built-in Hive. You can test the JDBC server with the beeline script that comes with either Spark or Hive. To connect to the Spark Thrift Server from any machine in a Big Data Service cluster, use the spark-beeline command.
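
Because the Thrift server speaks the HiveServer2 protocol, any HiveServer2-compatible JDBC client can also connect to it. The following minimal Java sketch illustrates this; the host name, port, and credentials are placeholder assumptions, and the Hive JDBC driver (hive-jdbc) must be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SparkThriftServerExample {
        public static void main(String[] args) throws Exception {
            // The Spark Thrift Server implements the HiveServer2 protocol,
            // so the standard Hive JDBC driver and a jdbc:hive2 URL are used.
            // Host, port, and credentials are placeholders; check your
            // cluster configuration for the actual endpoint.
            String url = "jdbc:hive2://thrift-server-host:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "myuser", "mypassword");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }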

Group Permission to Download Policies

You can grant users permission to download Ranger policies, which is required to run SQL queries through Spark jobs, by using a user group instead of listing individual users.

In a Big Data Service HA cluster with the Ranger-Spark plugin enabled, you must have permission to download Ranger policies to run any SQL queries through Spark jobs. To grant a user this permission, include the user in the policy.download.auth.users and tag.download.auth.users lists. For more information, see Spark Job Might Fail With a 401 Error While Trying to Download the Ranger-Spark Policies.
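
For example, both properties take a comma-separated list of users (the user names here are illustrative, not required values):

    policy.download.auth.users = spark,hive
    tag.download.auth.users = spark,hive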

Instead of specifying many users individually, you can set the policy.download.auth.groups parameter to a user group in the Spark-Ranger repository in the Ranger UI. All users in that group can then download Ranger policies. This feature is supported in ODH version 2.0.10 and later.

Example:

  1. Access the Ranger UI.
  2. Click Edit on the Spark repository.
  3. Navigate to the Add New Configurations section.
  4. Add or update policy.download.auth.groups with the user group.

    Example:

    policy.download.auth.groups = spark,testgroup

  5. Click Save.
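
After saving, you can confirm that the change took effect programmatically. The Ranger Admin public REST API returns a service (repository) definition, including its configuration map, by name. The sketch below is illustrative only: the Ranger endpoint, credentials, and service name are assumptions for your cluster.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Base64;

    public class CheckRangerServiceConfig {
        public static void main(String[] args) throws Exception {
            // Ranger Admin endpoint, credentials, and service name are
            // placeholders; substitute the values for your cluster.
            String rangerUrl = "https://ranger-host:6182";
            String serviceName = "bds_spark"; // hypothetical Spark repository name
            String auth = Base64.getEncoder()
                    .encodeToString("admin:admin-password".getBytes());

            // GET /service/public/v2/api/service/name/{name} returns the
            // service definition; its "configs" map should now include
            // policy.download.auth.groups with the group you added.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(rangerUrl
                            + "/service/public/v2/api/service/name/" + serviceName))
                    .header("Authorization", "Basic " + auth)
                    .header("Accept", "application/json")
                    .GET()
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }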

Spark-Ranger Plugin Extension

The Spark-Ranger plugin extension can't be overridden at runtime in ODH version 2.0.10 or later.

Note

Fine-grained access control can't be fully enforced in non-Spark Thrift Server use cases through the Spark Ranger plugin. The Ranger admin is expected to grant the required file access permissions to data in HDFS through HDFS Ranger policies.
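
For example, a path-level HDFS Ranger policy that grants a group read and execute access to a warehouse directory could be created through the Ranger public REST API. This is a sketch under assumptions: the service name, path, group, endpoint, and credentials are illustrative placeholders, and such policies are more commonly created in the Ranger UI.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Base64;

    public class CreateHdfsPolicySketch {
        public static void main(String[] args) throws Exception {
            // All values below (service, policy name, path, group, endpoint,
            // credentials) are illustrative placeholders. Requires Java 15+
            // for text blocks.
            String policyJson = """
                {
                  "service": "bds_hadoop",
                  "name": "spark-warehouse-read",
                  "resources": {
                    "path": { "values": ["/warehouse/tablespace"], "isRecursive": true }
                  },
                  "policyItems": [{
                    "groups": ["spark"],
                    "accesses": [
                      { "type": "read",    "isAllowed": true },
                      { "type": "execute", "isAllowed": true }
                    ]
                  }]
                }
                """;

            String auth = Base64.getEncoder()
                    .encodeToString("admin:admin-password".getBytes());
            // POST /service/public/v2/api/policy creates the policy.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://ranger-host:6182/service/public/v2/api/policy"))
                    .header("Authorization", "Basic " + auth)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(policyJson))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }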