Querying HBase using Spark

For more information and examples, see HBase Example Using HBase Spark Connector.
  1. Grant the Spark user permission to perform CRUD operations in HBase, acting as the "hbase" superuser (note that kinit uses the hbase headless keytab, so the principal must be the hbase principal, not the Spark user):
    sudo -u hbase bash
    kinit -kt /etc/security/keytabs/hbase.headless.keytab <hbase-principal>
    hbase shell
    grant 'spark', 'RWXCA'
    exit
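To confirm the grant took effect before moving on, you can list the Spark user's permissions from the HBase shell with the standard `user_permission` command (a quick sanity-check sketch; run it in the same superuser session):

```
hbase shell
user_permission 'spark'
```

The output should list the 'spark' user with the RWXCA actions granted above.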
  2. Sign in to Ranger.
  3. Click the HBase service.
  4. Add or update a policy that grants the Spark user "create, read, write, execute" access.
  5. Sign in with the Spark user account and create a table in HBase with two column families, 'p' and 'c':
    sudo su spark
    (run kinit as the spark user if required)
    hbase shell
    hbase(main):001:0> create 'person', 'p', 'c'
  6. Start spark-shell:
    spark-shell --jars /usr/lib/hbase/hbase-spark.jar,/usr/lib/hbase/hbase-spark-protocol-shaded.jar,/usr/lib/hbase/* \
      --files /etc/hbase/conf/hbase-site.xml \
      --conf spark.driver.extraClassPath=/etc/hbase/conf
  7. Insert and read data using spark-shell:
    • Inserting data:
      val sql = spark.sqlContext

      import java.sql.Date

      case class Person(name: String,
                        email: String,
                        birthDate: Date,
                        height: Float)

      val personDS = Seq(
        Person("alice", "alice@alice.com", Date.valueOf("2000-01-01"), 4.5f),
        Person("bob", "bob@bob.com", Date.valueOf("2001-10-17"), 5.1f)
      ).toDS

      personDS.write.format("org.apache.hadoop.hbase.spark")
        .option("hbase.columns.mapping",
          "name STRING :key, email STRING c:email, " +
          "birthDate DATE p:birthDate, height FLOAT p:height")
        .option("hbase.table", "person")
        .option("hbase.spark.use.hbasecontext", false)
        .save()

      Results:

      shell> scan 'person'
      ROW       COLUMN+CELL
       alice    column=c:email, timestamp=1568723598292, value=alice@alice.com
       alice    column=p:birthDate, timestamp=1568723598292, value=\x00\x00\x00\xDCl\x87 \x00
       alice    column=p:height, timestamp=1568723598292, value=@\x90\x00\x00
       bob      column=c:email, timestamp=1568723598521, value=bob@bob.com
       bob      column=p:birthDate, timestamp=1568723598521, value=\x00\x00\x00\xE9\x99u\x95\x80
       bob      column=p:height, timestamp=1568723598521, value=@\xA333
      2 row(s)
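The raw bytes in the scan output are expected: the connector stores FLOAT columns as 4-byte big-endian IEEE-754 values and DATE columns as 8-byte epoch timestamps, so `scan` shows binary cell values rather than readable numbers. For example, '@' is byte 0x40, so alice's height cell `@\x90\x00\x00` is the bit pattern 0x40900000, which decodes back to 4.5. A minimal check in spark-shell:

```scala
// Decode the raw FLOAT bytes shown by `scan 'person'` for alice's height.
// The cell value @\x90\x00\x00 is the big-endian bit pattern 0x40900000.
val heightBits = 0x40900000
println(java.lang.Float.intBitsToFloat(heightBits))  // prints 4.5
```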
    • Reading data back:
      val sql = spark.sqlContext

      val df = sql.read.format("org.apache.hadoop.hbase.spark")
        .option("hbase.columns.mapping",
          "name STRING :key, email STRING c:email, " +
          "birthDate DATE p:birthDate, height FLOAT p:height")
        .option("hbase.table", "person")
        .option("hbase.spark.use.hbasecontext", false)
        .load()
      df.createOrReplaceTempView("personView")
      
      val results = sql.sql("SELECT * FROM personView WHERE name = 'alice'")
      results.show()
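The same DataFrame can also be queried directly with the DataFrame API instead of SQL, and the connector can push simple comparison filters down to HBase rather than scanning everything back into Spark. A sketch using the column names mapped above:

```scala
// Filter with the DataFrame API; simple predicates on mapped columns
// can be pushed down to HBase by the connector.
val tall = df.filter(df("height") > 5.0f).select("name", "email")
tall.show()
```

With the sample data above, this returns only bob's row.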