Migrate from Big Data Cloud Compute Edition

Find out how to migrate from Oracle Big Data Cloud Compute Edition (BDCE or BDC) to Big Data Service

Migration is done in several steps. You can migrate your artifacts to OCI Big Data Service from BDC on Oracle Cloud Infrastructure Classic or from BDC on Oracle Cloud Infrastructure. At a high level, you do the following:

  • Export your existing cloud resources from BDC to Object Storage.

  • Import the exported cloud resources from Object Storage into Big Data Service.
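
For example, small exported artifacts such as configuration tarballs or DDL files can be moved through Object Storage with the OCI CLI. This is only a sketch; the bucket, object, and file names below are placeholders.

  # On the BDC side: upload an exported artifact to Object Storage (placeholder names).
  oci os object put --bucket-name myBucket --file /tmp/bdc-config.tar.gz --name exported/bdc-config.tar.gz

  # On the BDS side: download the artifact from Object Storage.
  oci os object get --bucket-name myBucket --name exported/bdc-config.tar.gz --file /tmp/bdc-config.tar.gz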

Prerequisites

Before you begin, ensure the following:
  • You are a valid user with access to a compartment in Big Data Service.
  • You have permissions to do the following:
    • Access the OCI console using your credentials
    • Create a bucket in Oracle Object Storage so that you can copy the HDFS data. For information about Oracle Object Storage, see Overview of Object Storage.
    • Inspect the OCI Object Store configuration

    For more information, see Getting Started with Big Data Service.

  • You have the following OCI parameter values with you:
    • Tenancy ID: The OCID of the tenancy. For example, ocid1.tenancy.oc1..aaaaaaaa5syd62crbj5xpfajpmopoqasxy7jwxk6ihopm5vk6bxkncyp56kc. For more information, see Where to Get the Tenancy's OCID and User's OCID.
    • User ID: The OCID of the user. For example, ocid1.user.oc1..aaaaaaaa3pnl7qz4c2x2mpq4v4g2mp3wktxoyahwysmjrapgzjoyd3edxltp. For more information, see Where to Get the Tenancy's OCID and User's OCID.
    • API signing key: Required for an application user. For example, 03:8c:ef:51:c8:fe:6b:22:0c:5d:3c:43:a8:ff:58:d9. For information, see the topics on generating and uploading the API signing key.
    • Passphrase for the signing key: (Optional) Required if you generated the key pair with a passphrase.
    • Fingerprint for the signing key: The fingerprint and passphrase of the signing key are created while generating and uploading the API signing key. For more information, see How to Get the Key's Fingerprint.
    • Bucket and tenancy name: For example, oci://myBucket@myTenancy/. For information about buckets, see Putting Data into Object Storage.
    • OCI Cloud Storage URL: The host name. For example, https://objectstorage.us-phoenix-1.oraclecloud.com. For more information, see Create a Cluster.
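
If you want to confirm these values before migrating, one option (assuming the OCI HDFS Connector is installed on the BDC cluster) is to pass them to a Hadoop command as fs.oci.client.* properties and list the target bucket. The OCIDs, fingerprint, key path, and endpoint below are placeholders for your own values.

  # Placeholder OCIDs, fingerprint, key path, and region endpoint; substitute your own values.
  hadoop fs \
    -D fs.oci.client.hostname=https://objectstorage.us-phoenix-1.oraclecloud.com \
    -D fs.oci.client.auth.tenantId=<tenancy-ocid> \
    -D fs.oci.client.auth.userId=<user-ocid> \
    -D fs.oci.client.auth.fingerprint=<key-fingerprint> \
    -D fs.oci.client.auth.pemfilepath=/home/opc/.oci/oci_api_key.pem \
    -ls oci://myBucket@myTenancy/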

Exporting Resources

The resources that you can export from Big Data Cloud Compute Edition (BDC) are as follows:

  • Data in HDFS
    • Exported artifact: Copied into the OCI Object Store at oci://<bucket>@<tenancy>/<exportedHdfsDir>. For example: oci://myStorageBucket@myTenancy/exportedHdfsDir
    • In OCI Big Data Service (BDS): Copy the exported data from the OCI Object Store to the target BDS HDFS directories.
  • Data in OCI-Classic Object Store
    Note: This artifact doesn't apply to Oracle Big Data Cloud on Oracle Cloud Infrastructure.
    • Exported artifact: Copied into the OCI Object Store at oci://<bucket>@<tenancy>/<exportedObjDir>. For example: oci://myStorageBucket@myTenancy/exportedObjDir
  • Hive Metadata
    • Exported artifact: Generate the Hive DDL statements on the BDC cluster.
    • In BDS: Copy the Hive DDL statements from the BDC cluster into the BDS cluster, and execute them.
  • Zeppelin Notebooks
    • Exported artifact: Export the Zeppelin notebook definitions as a .tar.gz file from /user/zeppelin/notebook in HDFS. This is done using a script provided by Oracle.
    • In BDS: Currently, importing Zeppelin Notebooks is not supported in BDS.
  • HDFS, YARN, Spark Configuration Files
    • Exported artifact: Export the configuration files as a .tar.gz file using a utility script provided by Oracle.
    • In BDS: Because BDS has optimized configuration settings for HDFS, YARN, and Spark, you don't need to import the configuration files and versions from BDC.
  • Versions of various Open Source components
    • Exported artifact: Export the service version details using the Ambari REST API. You can also get version details from Ambari (Admin -> Stack and Versions).
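
For the version details listed above, a minimal sketch using the Ambari REST API follows; the Ambari host, admin credentials, and cluster name are placeholders.

  # Placeholder Ambari host, admin credentials, and cluster name.
  curl -u admin:<password> "http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/stack_versions"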

Migrating Resources Using WANdisco LiveData Migrator

Ensure that port 8020 is open on the destination cluster.

For more information, see the WANdisco LiveData Migrator documentation.

To migrate resources using WANdisco LiveData Migrator, follow these steps:

  1. Install LiveData Migrator on any edge node of the source cluster by running the following commands:
    # Download and run the LiveData Migrator installer.
    wget https://wandisco.com/downloads/livedata-migrator.sh
    chmod +x livedata-migrator.sh && ./livedata-migrator.sh

    # Verify that the LiveData Migrator, Hive Migrator, and UI services are running.
    service livedata-migrator status
    service hivemigrator status
    service livedata-ui status
  2. After the installation and setup of LiveData Migrator are complete, access the UI and create your user account. The URL of the UI is as follows:
    http://<LDM-Installation-Host.com>:8081
  3. Do the following to migrate data:
    1. Configure the source filesystem.
      To add a source filesystem, do the following on your LiveData Migrator dashboard:
      1. From the Products panel, select the relevant instance.
      2. On the Filesystem Configuration page, click Add source filesystem.
    2. Configure the target filesystem.
      To add a target filesystem, do the following on your LiveData Migrator dashboard:
      1. From the Products panel, select the relevant instance.
      2. On the Filesystem Configuration page, click Add target filesystem.
      3. Select Apache Hadoop as the target (the BDS cluster) and provide the default filesystem path. Make sure that the source and target can reach the destination on port 8020.
    3. Create a path mapping.
      Path mapping enables migrated data to be stored at an equivalent default location on the target. To create path mappings using the UI, follow these steps:
      1. From the Products list on the dashboard, select the LiveData Migrator instance for which you want to create a path mapping.
      2. From the Migrations menu, select Path Mappings.
      3. At the top right of the interface, click the Add New Path button.
    4. Create a migration.
      Migrations transfer existing data from the defined source to a target. To create a new migration from the UI, follow these steps:
      1. Provide a name for the migration.
      2. From your filesystems, select a source and target.
      3. Select the path on your source filesystem that you want to migrate. Use the folder browser to choose the path name; select the grey folder next to a path name to view its subdirectories.
  4. Migrate the metadata.
    To migrate the metadata, follow these steps:
    1. Export Hive metadata from the source BDC cluster. For more information, see Exporting Hive Metadata.
    2. Import the metadata to the destination BDS ODH 1.0 cluster. For more information, see Importing Metadata.
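
As a rough illustration of the metadata step (the linked Exporting Hive Metadata and Importing Metadata topics are authoritative), Hive DDL can be generated on the source cluster and replayed on the target. The database, table, and HiveServer2 host below are placeholders.

  # On the source BDC cluster: generate DDL for a table (placeholder database and table names),
  # appending a ';' so the statement can be replayed as a script.
  hive -e "SHOW CREATE TABLE mydb.airports;" > /tmp/hive_ddl.sql
  echo ";" >> /tmp/hive_ddl.sql

  # On the target BDS cluster: replay the DDL (placeholder HiveServer2 host; credentials may be required).
  beeline -u "jdbc:hive2://<bds-master-host>:10000/default" -f /tmp/hive_ddl.sql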

Migrating Resources Using the Distcp Tool

You can also migrate data and metadata from Big Data Cloud Compute Edition and import them into Big Data Service by using the Distcp tool. Distcp is an open source tool for copying large data sets between distributed file systems, within and across clusters.
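
A minimal sketch of the two hops with Distcp follows, assuming the OCI HDFS Connector is configured on both clusters; the bucket, tenancy, and HDFS paths are placeholders.

  # On the source BDC cluster: copy HDFS data to OCI Object Storage (placeholder paths).
  hadoop distcp /user/mydata oci://myBucket@myTenancy/exportedHdfsDir

  # On the target BDS cluster: copy the exported data from Object Storage into HDFS.
  hadoop distcp oci://myBucket@myTenancy/exportedHdfsDir /user/mydata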

Validating the Migration

After migrating the resources, verify that the same set of Hive tables is present in the target cluster as in the source cluster.
  1. Connect to the Hive shell.
    hive
  2. Run the following command to list the tables:
    show tables;
  3. Run the following command to query a table (for example, the airports table):
    SELECT * FROM airports LIMIT 10;
  4. Run the following command to verify the HDFS and Object Store data, comparing the reported size with the source cluster:
    hadoop fs -du -s /tmp/hivemigrate
  5. Check the cluster health by submitting all relevant jobs and getting the expected results. Pick a job that you ran in BDC and run it on the BDS cluster.
    Note

    A successful job run depends not only on the location of the data but also on configuration settings such as HADOOP_CLASSPATH, the location of the client JARs, and so on.
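
As one example of such a sanity-check job, the stock MapReduce examples JAR can be submitted on the BDS cluster. The JAR location varies by distribution, and the input and output paths below are placeholders.

  # Submit a sample MapReduce job; the examples JAR path varies by distribution.
  yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount \
      /user/mydata/input /user/mydata/wordcount_output

  # Inspect the output.
  hadoop fs -cat /user/mydata/wordcount_output/part-r-00000 | head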