hbase-java-api-example

This is a simple example of using HBase on the Trusted Analytics Platform (TAP).

This application utilizes HBase service broker (from TAP) and HBase Client API to connect to HBase. It performs basic operations, like:

list tables
show table description (column families)
get n last rows from given table
get n first rows from given table
create a table

After being deployed to TAP it provides these functionalities through the following endpoints:

URL	method	operation
/api/tables	GET	list the tables
/api/tables	POST	create new table
/api/tables/{name}	GET	describe details of given table
/api/tables/{name}/head	GET	get first rows of given table
/api/tables/{name}/tail	GET	get last rows of given table
/api/tables/{name}/row	POST	add new value for given row
/api/tables/{name}/row/{rowKey}	GET	get row by given row key

You can use Swagger API to work with the service:

http://hbase-reader.{domain.com}/swagger-ui.html
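The endpoints can also be called from code. The following is a minimal sketch of a client built on java.net.http.HttpClient (Java 11 or later). The class name HbaseReaderClient and the base URL passed on the command line are assumptions made for illustration; only the paths come from the endpoint table above. The sketch assumes that table names and row keys need no URL escaping, and it makes no assumption about the response format: the body is returned as plain text.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the example application.
final class HbaseReaderClient {

    // GET /api/tables
    static URI tables(String baseUrl) {
        return URI.create(trim(baseUrl) + "/api/tables");
    }

    // GET /api/tables/{name}/head
    static URI head(String baseUrl, String table) {
        return URI.create(trim(baseUrl) + "/api/tables/" + table + "/head");
    }

    // GET /api/tables/{name}/row/{rowKey}
    static URI row(String baseUrl, String table, String rowKey) {
        return URI.create(trim(baseUrl) + "/api/tables/" + table + "/row/" + rowKey);
    }

    // Removes a trailing slash so that path concatenation stays predictable.
    private static String trim(String baseUrl) {
        return baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    // Sends a GET request and returns the response body as text.
    static String get(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java HbaseReaderClient.java <base-url>");
            return;
        }
        // Lists the tables exposed by the deployed application.
        System.out.println(get(tables(args[0])));
    }
}
```

The URI builders are kept separate from the network call so that they can be checked without a deployed instance.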

Under the hood

This is a simple Spring Boot application. The key points of interest here are:

extracting HBase configuration information (required for connection; provided by hbase-broker and kerberos-broker)
connecting to HBase and authenticating with Kerberos
using HBase client to perform some admin operations (in our case: getting information on tables)
using HBase client to perform some operations on tables (in our case: reading data)

The following sections will present information on the broker and client API role.

HBase broker

The TAP HBase broker provisions a namespace for the user. After the service instance is bound to an app, the broker also provides configuration information:

{
  "VCAP_SERVICES": {
    "hbase": [
      {
        "credentials": {
          "HADOOP_CONFIG_KEY": { 
            ...
            "hbase.zookeeper.property.clientPort": "2181",
            "hbase.zookeeper.quorum": "cdh-master-0.node.server.com,cdh-master-1.node.server.com,cdh-master-2.node.server.com",
            ...
          },
          "hbase.namespace": "2bd6c4db32236dd4a33d19f8ef76257b4a69ff1b",
          ...
        },
        "label": "hbase",
        "name": "hbase1",
        "plan": "bare",
        "tags": []
      }
   ]
   ...

Essential fragments here are:

name key - service instance name
credentials section - crucial configuration information, including:
- zookeeper settings (required to connect to HBase)
- hbase.namespace key - the namespace created for the user
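To make these values concrete, the following is a minimal sketch of how they fit together. It is not how the app itself consumes the binding. The class and method names are hypothetical, and the table name my_table is invented for the example. The sketch relies on two HBase conventions: each ZooKeeper quorum host is paired with the client port, and a table outside the default namespace is addressed as namespace:table.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the example application or of hadoop-utils.
final class HBaseBindingValues {

    // Pairs every host from hbase.zookeeper.quorum with the client port.
    static List<String> zookeeperEndpoints(String quorum, String clientPort) {
        return Arrays.stream(quorum.split(","))
                .map(String::trim)
                .filter(host -> !host.isEmpty())
                .map(host -> host + ":" + clientPort)
                .collect(Collectors.toList());
    }

    // Builds the "namespace:table" form of a table name.
    static String qualifiedTableName(String namespace, String table) {
        return namespace + ":" + table;
    }

    public static void main(String[] args) {
        // Values copied from the sample binding above; "my_table" is made up.
        String quorum = "cdh-master-0.node.server.com,cdh-master-1.node.server.com,cdh-master-2.node.server.com";
        System.out.println(zookeeperEndpoints(quorum, "2181"));
        System.out.println(qualifiedTableName("2bd6c4db32236dd4a33d19f8ef76257b4a69ff1b", "my_table"));
    }
}
```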

Kerberos broker

In TAP, Kerberos credentials can be obtained from kerberos-broker. After you create a service instance and bind it to an application, the following information is available:

  "kerberos": [
   {
    "credentials": {
     "enabled": true,
     "kcacert": "...",
     "kdc": "...",
     "kpassword": "...",
     "krealm": "...",
     "kuser": "..."
    },
    "label": "kerberos",
    "name": "kerberos-instance",
    "plan": "shared",
    "tags": [
     "kerberos"
    ]
   }
  ]

Connecting to HBase

TAP provides the hadoop-utils library, which contains many useful utilities. For example, connecting to HBase boils down to:

    Hbase.newInstance().createConnection().connect();

hadoop-utils takes care of the configuration and authentication (it reads the data from the HBase and Kerberos service bindings).

HBase Java API (1.1.2)

HBase project provides Java client API.

If you want to use the API in your Maven project, the corresponding dependency is:

<dependency>
	<groupId>org.apache.hbase</groupId>
	<artifactId>hbase-client</artifactId>
	<version>1.1.2</version>
</dependency>

("org.apache.hbase:hbase-client:1.1.2" for Gradle).

In our case, we depend on hadoop-utils instead, which brings in all required dependencies:

<dependency>
	<groupId>org.trustedanalytics</groupId>
	<artifactId>hadoop-utils</artifactId>
	<version>0.6.5</version>
</dependency>

("org.trustedanalytics:hadoop-utils:0.6.5" for Gradle)

You'll find javadocs here: https://hbase.apache.org/apidocs/index.html/

The API allows interaction with HBase for DDL (administrative tasks such as table creation and deletion) and DML (importing and querying data).

This sample application shows some examples of these operations.

Row get

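    Result r = null;
    // Table is Closeable, so it is opened in the same try-with-resources as the connection.
    try (Connection connection = hBaseConnectionFactory.connect();
         Table table = connection.getTable(TableName.valueOf(name))) {
        // Fetch a single row by its key.
        Get get = new Get(Bytes.toBytes(rowKey));
        r = table.get(get);
    } catch (org.apache.hadoop.hbase.TableNotFoundException e) {
        // Translate the HBase exception into the application's own exception type.
        throw new TableNotFoundException(name);
    } catch (IOException e) {
        LOG.error("Error while talking to HBase.", e);
    }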

Table scan

Get first 10 rows of given table (by name):

    List<RowValue> result = new ArrayList<>();
    // Table is Closeable, so it is opened in the same try-with-resources as the connection.
    try (Connection connection = hBaseConnectionFactory.connect();
         Table table = connection.getTable(TableName.valueOf(name))) {

        Scan scan = new Scan();
        scan.setFilter(new PageFilter(10));

        try (ResultScanner rs = table.getScanner(scan)) {
            for (Result r = rs.next(); r != null; r = rs.next()) {
                // conversionsService.constructRowValue is a helper method (defined in the app)
                result.add(conversionsService.constructRowValue(r));
            }
        }
    }
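One caveat about the scan above: according to the HBase documentation, PageFilter is evaluated separately on each region server, so a scan that spans several regions can return more rows than the page size. A client-side cap is a simple guard against this. The following is a minimal sketch that uses only the standard library; RowLimit and firstN are hypothetical names, not part of the app or of the HBase API. Since ResultScanner is an Iterable of Result, a scanner could be passed to such a helper directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the example application.
final class RowLimit {

    // Copies at most `limit` elements from `rows` and stops iterating afterwards.
    static <T> List<T> firstN(Iterable<T> rows, int limit) {
        List<T> result = new ArrayList<>();
        if (limit <= 0) {
            return result;
        }
        for (T row : rows) {
            result.add(row);
            if (result.size() == limit) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Plain strings stand in for HBase Result objects.
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4");
        System.out.println(firstN(rows, 3)); // prints [r1, r2, r3]
    }
}
```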

Admin API usage

Fetch list of tables:

    List<TableDescription> result = null;
    try (Connection connection = hBaseConnectionFactory.connect();
         Admin admin = connection.getAdmin()) {
        HTableDescriptor[] tables = admin.listTables();

        // conversionsService.constructTableDescription is a helper method (defined in the app)
        result = Arrays.stream(tables)
                .map(conversionsService::constructTableDescription)
                .collect(Collectors.toList());
    } catch (IOException e) {
        LOG.error("Error while talking to HBase.", e);
    }

Of course, obtaining a new connection for every operation is costly, because each one has to connect to ZooKeeper and then to HBase. In a real application, you would normally reuse HBase connections.
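One way to reuse a connection is to create it lazily and share it until shutdown. The following is a minimal sketch of that idea using only the standard library; SharedResource and its Factory interface are hypothetical names, not part of the app or of hadoop-utils. In the snippets above, the factory would presumably wrap hBaseConnectionFactory.connect(). The HBase documentation describes Connection as heavyweight and thread-safe, while Table and Admin are lightweight and meant to be obtained per use and then closed, so only the connection is a candidate for sharing.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical holder that creates a resource once and hands out the same instance.
final class SharedResource<T extends Closeable> implements Closeable {

    // Hypothetical factory type; mirrors a connect() method that may throw IOException.
    interface Factory<T> {
        T create() throws IOException;
    }

    private final Factory<T> factory;
    private T instance;

    SharedResource(Factory<T> factory) {
        this.factory = factory;
    }

    // Creates the resource on first use and returns the same instance afterwards.
    synchronized T get() throws IOException {
        if (instance == null) {
            instance = factory.create();
        }
        return instance;
    }

    // Closes the shared resource if it was ever created.
    @Override
    public synchronized void close() throws IOException {
        if (instance != null) {
            instance.close();
            instance = null;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] created = {0};
        // A no-op Closeable stands in for an HBase connection.
        Closeable fake = () -> { };
        try (SharedResource<Closeable> shared = new SharedResource<Closeable>(() -> {
            created[0]++;
            return fake;
        })) {
            shared.get();
            shared.get();
            System.out.println("created " + created[0] + " time(s)"); // prints created 1 time(s)
        }
    }
}
```

The holder itself would be closed once, for example when the application shuts down, rather than after each request.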

Compiling and deploying the example

Manual deployment

App deployment is described in detail on the Platform Wiki: Getting Started Guide.

The procedure boils down to the following steps. After cloning the repository, you can compile the project with:

./gradlew clean check assemble

(Optional) To update the license headers, use:

./gradlew licenseFormatMain

Before deploying, which can be done with cf push, make sure there is an HBase instance available for you.

Note that building the project with gradlew assemble auto-generates the application manifest file from the template src/cloudfoundry/manifest.yml and copies it into the project root folder.

If you have not already done so, create an instance of the HBase service:

cf create-service hbase bare hbase1

To use this instance either add it to manifest.yml or bind it to the app through CLI.

If you have not already done so, create an instance of the Kerberos service:

cf create-service kerberos shared kerberos-instance

You can define the bindings in the services section of the app's manifest file:

---
applications:
- name: hbase-reader
  memory: 1G
  instances: 1
  host: hbase-reader
  path: build/libs/hbase-rest-0.0.2.jar
  services:
      - hbase1
      - kerberos-instance

A sample manifest is provided in this project for your convenience. Modify it to suit your needs (application name, service name, etc.). For example, src/main/resources/application-cloud.properties uses the HBase service name in some keys, so adjust the properties file accordingly.

After this you are ready to push your application to the platform:

cf push

If you plan to bind the service instances to an application that is already running, you can do so with the following commands:

cf bind-service hbase-reader hbase1

cf bind-service hbase-reader kerberos-instance

cf restage hbase-reader

Automated deployment

Switch to deploy directory: cd deploy
Install tox: sudo -E pip install --upgrade tox
Run: tox
Activate virtualenv with installed dependencies: . .tox/py27/bin/activate
Run the deployment script: python deploy.py, providing the required parameters (run python deploy.py -h to list the script parameters with their descriptions).

TO DO

Update the information about the namespace and service name in application.properties, and describe how the namespace is read and used.
