This is a simple example usage of HBase on Trusted Analytics Platform.
This application utilizes HBase service broker (from TAP) and HBase Client API to connect to HBase. It performs basic operations, like:
- list tables
- show table description (column families)
- get n last rows from given table
- get n first rows from given table
- create a table
After being deployed to TAP it provides these functionalities through the following endpoints:
URL | method | operation |
---|---|---|
/api/tables | GET | list the tables |
/api/tables | POST | create new table |
/api/tables/{name} | GET | describe details of given table |
/api/tables/{name}/head | GET | get first rows of given table |
/api/tables/{name}/tail | GET | get last rows of given table |
/api/tables/{name}/row | POST | add new value for given row |
/api/tables/{name}/row/{rowKey} | GET | get row by given row key |
You can use Swagger API to work with the service:
http://hbase-reader.{domain.com}/swagger-ui.html
This is a simple spring boot application. Key point of interest here are:
- extracting HBase configuration information (required for connection; provided by hbase-broker and kerberos-broker)
- connect to HBase and authenticate in kerberos
- using HBase client to perform some admin operations (in our case: getting information on tables)
- using HBase client to perform some operations on tables (in our case: reading data)
The following sections will present information on the broker and client API role.
HBase broker of TAP provisions a namespace for the user. After binding to an app, it also provides some configuration information.
{
"VCAP_SERVICES": {
"hbase": [
{
"credentials": {
"HADOOP_CONFIG_KEY": {
...
"hbase.zookeeper.property.clientPort": "2181",
"hbase.zookeeper.quorum": "cdh-master-0.node.server.com,cdh-master-1.node.server.com,cdh-master-2.node.server.com",
...
},
"hbase.namespace": "2bd6c4db32236dd4a33d19f8ef76257b4a69ff1b",
...
},
"label": "hbase",
"name": "hbase1",
"plan": "bare",
"tags": []
}
]
...
Essential fragments here are:
- name key - service instance name
- credential section - crucial configuration information, including:
- zookeeper settings (required to connect to HBase)
- hbase.namespace key - the namespace created for the user
In TAP Kerberos credentials can be obtained from kerberos-broker. After creating service instance and binding it to an application, the following information are available:
"kerberos": [
{
"credentials": {
"enabled": true,
"kcacert": "...",
"kdc": "...",
"kpassword": "...",
"krealm": "...",
"kuser": "..."
},
"label": "kerberos",
"name": "kerberos-instance",
"plan": "shared",
"tags": [
"kerberos"
]
}
]
TAP platform provides hadoop-utils library. It contains many usefull utils. For example, connecting to HBase boils down to:
Hbase.newInstance().createConnection().connect();
hadoop-utils takes care of the configuration and authentication (reads data from HBase and Kerberos service binding).
HBase project provides Java client API.
If you want to use the API in your Maven project, the corresponding dependency is:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.1.2</version>
</dependency>
("org.apache.hbase:hbase-client:1.1.2" for Gradle).
In our case, we depend on hadoop-utils instead which bring all required dependencies:
<dependency>
<groupId>org.trustedanalytics</groupId>
<artifactId>hadoop-utils</artifactId>
<version>0.6.5</version>
</dependency>
("org.trustedanalytics:hadoop-utils:0.6.5" for Gradle)
You'll find javadocs here: https://hbase.apache.org/apidocs/index.html/
The API allows for interaction with HBase for DDL (administrative tasks like tables creation/deletion) and DML (data importing, querying).
This sample application shows some examples of these operations.
Result r = null;
try (Connection connection = hBaseConnectionFactory.connect()) {
Table table = connection.getTable(TableName.valueOf(name));
Get get = new Get(Bytes.toBytes(rowKey));
r = table.get(get);
} catch (org.apache.hadoop.hbase.TableNotFoundException e) {
throw new TableNotFoundException(name);
} catch (IOException e) {
LOG.error("Error while talking to HBase.", e);
}
Get first 10 rows of given table (by name):
List<RowValue> result = new ArrayList<>();
try (Connection connection = hBaseConnectionFactory.connect()) {
Table table = connection.getTable(TableName.valueOf(name));
Scan scan = new Scan();
scan.setFilter(new PageFilter(10));
try (ResultScanner rs = table.getScanner(scan)) {
for (Result r = rs.next(); r != null; r = rs.next()) {
//conversionsService.constructRowValue is a helper method (defined in the app)
result.add(conversionsService.constructRowValue(r));
}
}
}
Fetch list of tables:
List<TableDescription> result = null;
try (Connection connection = hBaseConnectionFactory.connect();
Admin admin = connection.getAdmin()) {
HTableDescriptor[] tables = admin.listTables();
Stream<HTableDescriptor> tableDescriptorsStream = Arrays.stream(tables);
//ConversionService.constructTableDecription is a helper method (defined in the app)
result = tableDescriptorsStream.map(conversionsService::constructTableDescription)
.collect(Collectors.toList());
} catch (IOException e) {
LOG.error("Error while talking to HBase.", e);
}
Of course, obtaining the connection for every operation is costly (connect to ZooKeeper, connect to HBase takes time). In real life, you'd probably strive to reuse HBase connections.
App deployment is described in details on the Platform Wiki: Getting started Guilde.
The procedure boils down to following steps. After cloning the repository you will be able to compile the project with:
./gradlew clean check assemble
(optional) to update headers use
./gradlew licenseFormatMain
Before deploying, which can be done with cf push
, make sure there is an HBase instance available for you.
Also take notice that after you build the project with gradlew assemble
the application manifest file has been auto-generated from the template src/cloudfoundry/manifest.yml
and copied into the project root folder.
If it is not already done, create an instance of HBase service:
cf create-service hbase bare hbase1
To use this instance either add it to manifest.yml or bind it to the app through CLI.
If it is not already done, create an instance of Kerberos service:
cf create-service kerberos shared kerberos-instance
You can define the bindings in services section of app's manifest file:
---
applications:
- name: hbase-reader
memory: 1G
instances: 1
host: hbase-reader
path: build/libs/hbase-rest-0.0.2.jar
services:
- hbase1
- kerberos-instance
Sample manifest is provided in this project for your convenience. Please modify it for your needs (application name, service name, etc.) For example, src/main/resources/application-cloud.properties uses HBase service name for some keys. Adjust properties file accordingly to your needs.
After this you are ready to push your application to the platform:
cf push
If you plan to bind an instance of HBase to applications that is already running, you could do this with following commands:
cf bind-service hbase-reader hbase1
cf bind-service hbase-reader kerberos-instance
cf restage hbase-reader
- Switch to
deploy
directory:cd deploy
- Install tox:
sudo -E pip install --upgrade tox
- Run:
tox
- Activate virtualenv with installed dependencies:
. .tox/py27/bin/activate
- Run deployment script:
python deploy.py
providing required parameters when running script (python deploy.py -h
to check script parameters with their descriptions).
- update info about namespace and service name in applciation.properties. How namespace is read/used.