
Package org.commonvox.hbase_column_manager

ColumnManagerAPI for HBase™ provides an extended METADATA REPOSITORY SYSTEM for HBase with options for:

    (1) COLUMN AUDITING/DISCOVERY -- captures Column metadata (qualifier-name and max-length for each unique column-qualifier) -- either via real-time auditing as Tables are updated, or via a discovery facility (direct-scan or mapreduce) for previously-existing Tables; the discovery process also captures column-occurrences count and cell-occurrences count for each unique column-qualifier;
    (2) COLUMN-ALIASING -- involves a 4-byte (positive integer) column-alias being stored in each cell in place of the full-length column-qualifier, potentially conserving considerable data storage space; this works invisibly to the application developer, who continues working only with the standard hbase-client API interfaces, reading and writing full-length column-qualifiers;
    (3) COLUMN-DEFINITION FACILITIES -- administratively-managed ColumnDefinitions (stipulating valid qualifier-name, column length, and/or value) may be created and (a) optionally activated for column validation and enforcement as Tables are updated, and/or (b) used in the generation of various "Invalid Column" CSV-formatted reports (reporting on any column qualifiers, lengths, or values which do not adhere to ColumnDefinitions);
    (4) SCHEMA EXPORT/IMPORT -- provides schema (metadata) export and import facilities for HBase Namespace, Table, and all table-component structures;
    (5) SCHEMA CHANGE MONITORING -- tracks and provides an audit trail for structural modifications made to Namespaces, Tables, and Column Families.

A basic COMMAND-LINE INTERFACE is also provided for direct invocation of a number of the above-listed functions without any need for Java coding.

Once it is installed and configured, standard usage of the ColumnManagerAPI in Java programs is accomplished by simply substituting any reference to the standard HBase ConnectionFactory class with a reference to the ColumnManager MConnectionFactory class (as shown in the USAGE IN APPLICATION DEVELOPMENT section below).
All other interactions with the HBase API are then to be coded as usual; ColumnManager will work behind the scenes to capture and manipulate HBase metadata as stipulated by an administrator/developer in the ColumnManager configuration.

Any application coded with the ColumnManager API can be made to revert to standard HBase API functionality simply by either (a) setting the value of the column_manager.activated property to <false> in all hbase-*.xml configuration files, or (b) removing that property from hbase-*.xml configuration files altogether.
Thus, a ColumnManager-coded application can be used with ColumnManager activated in a development and/or staging environment, but deactivated in production (where ColumnManager's extra overhead might be undesirable).


HBase™ is a trademark of the Apache Software Foundation.



FUTURE ENHANCEMENTS MAY INCLUDE:
This package transparently complements the standard HBase API provided by the Apache Software Foundation in the packages org.apache.hadoop.hbase and org.apache.hadoop.hbase.client.



Table of Contents

  I. PREREQUISITES
  II. INSTALLATION
  III. UNINSTALLATION
  IV. BASIC CONFIGURATION (INCLUDE/EXCLUDE TABLES FOR ColumnManager PROCESSING)
  V. USAGE IN APPLICATION DEVELOPMENT
  VI. COLUMN AUDITING IN REAL-TIME
  VII. COLUMN ALIASING
  VIII. COLUMN DEFINITION FACILITY
  IX. QUERYING THE REPOSITORY
  X. ADMINISTRATIVE TOOLS
  XI. COMMAND-LINE INVOCATION

I. PREREQUISITES
HBase 1.x or later -- HBase must be installed as per the installation instructions given in the official Apache HBase Reference Guide (either in stand-alone, pseudo-distributed, or fully-distributed mode).

HBase Java API -- A development environment (IDE) must be set up for HBase-oriented development using the HBase Java API such that, at a minimum, an HBase "Hello World" application can be successfully compiled and run in it.

JDK 7 -- HBase 1.x or later (upon which this package is dependent) requires JDK 7 or later.

II. INSTALLATION
Step 1: Get the required JAR files, either (a) via direct download or (b) by setting Maven project dependencies
A recently released version of the ColumnManager JAR files may be downloaded from GitHub and added to the IDE's compile-time and run-time classpath configurations.

Alternatively, in a Maven project, the dependency may be set in the project's pom.xml file, as in the following example (which assumes an existing installation of HBase v1.2.1):

    Remove the following hbase-client dependency element:
      <dependency>
          <groupId>org.apache.hbase</groupId>
          <artifactId>hbase-client</artifactId>
          <version>1.2.1</version>
      </dependency>
    ... and replace it with the following hbase-column-manager dependency element:
      <dependency>
          <groupId>org.commonvox</groupId>
          <artifactId>hbase-column-manager</artifactId>
          <version>1.2.1-beta-02</version>
      </dependency>
To access all currently-available versions of hbase-column-manager, please consult the Maven Central Repository.

Step 2: Activate ColumnManager
Add the following property element to either the <hbase-site.xml> file or an optional separate configuration file named <hbase-column-manager.xml>.
      <property>
         <name>column_manager.activated</name>
         <value>true</value>
      </property>
NOTE that the default for "column_manager.activated" is "false", so when the property above is not present in <hbase-site.xml> or in <hbase-column-manager.xml>, the ColumnManager API will function exactly like the standard HBase API. Thus, a single body of code can operate with ColumnManager functionality in one environment (typically, a development or testing environment) and can completely bypass ColumnManager functionality in another environment (potentially staging or production), with the only difference between the environments being the presence or absence of the "column_manager.activated" property in each environment's hbase-*.xml configuration files.
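As an illustration of the default-false behavior described above, the following sketch reads the column_manager.activated property from the text of a single hbase-*.xml file and treats a missing property as false. It is a hypothetical, stdlib-only stand-in: the class ActivationFlagSketch and its isActivated method are not part of ColumnManager, and HBase applications normally read such properties through Hadoop's Configuration class rather than by parsing the XML directly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical sketch (not ColumnManager code): decides whether
 * "column_manager.activated" is true in one hbase-*.xml document.
 * An absent property yields false, mirroring the documented default.
 */
public class ActivationFlagSketch {

    static final String PROPERTY = "column_manager.activated";

    static boolean isActivated(String hbaseXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(hbaseXml.getBytes(StandardCharsets.UTF_8)));
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                if (names.getLength() == 0 || values.getLength() == 0) {
                    continue; // skip malformed property elements
                }
                if (PROPERTY.equals(names.item(0).getTextContent().trim())) {
                    // Only the literal "true" (case-insensitive) activates ColumnManager here.
                    return Boolean.parseBoolean(values.item(0).getTextContent().trim());
                }
            }
            return false; // property absent: behave like the standard HBase API
        } catch (Exception e) {
            throw new IllegalArgumentException("Unparseable configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String devConfig = "<configuration><property><name>" + PROPERTY
                + "</name><value>true</value></property></configuration>";
        String prodConfig = "<configuration></configuration>";
        System.out.println(isActivated(devConfig));  // true
        System.out.println(isActivated(prodConfig)); // false
    }
}
```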

Step 3: Confirm installation (and create ColumnManager Repository Namespace and Table)
A successful invocation of the method MConnectionFactory.createConnection() will confirm proper installation of ColumnManager, as in the following code example:
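      import org.apache.hadoop.hbase.client.Connection;
      import org.commonvox.hbase_column_manager.MConnectionFactory;

      public class ConfirmColumnManagerInstall {
          public static void main(String[] args) throws Exception {
              // Opening (and immediately closing) a ColumnManager-enabled connection is
              // sufficient; on first use this also creates the Repository namespace and table.
              try (Connection connection = MConnectionFactory.createConnection()) {
                  // No further work is needed to confirm installation.
              }
          }
      }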
Note that the first invocation of MConnectionFactory.createConnection() (as in the above code) will result in the automatic creation of the ColumnManager Repository Namespace ("__column_manager_repository_namespace") and Table ("column_manager_repository_table").
If the code above runs successfully, its log output will include a number of lines of ZooKeeper INFO output, as well as several lines of ColumnManager INFO output.

Step 4: [OPTIONAL] Explicitly create Repository structures
As an alternative to the automatic creation of the ColumnManager Repository Namespace and Table in the preceding step, these structures may be explicitly created through invocation of the static method RepositoryAdmin#installRepositoryStructures. Successful creation of these structures will result in messages such as the following appearing in the session's log output:
      2015-10-09 11:03:30,184 INFO  [main] commonvox.hbase_column_manager: ColumnManager Repository Namespace has been created ...
      2015-10-09 11:03:31,498 INFO  [main] commonvox.hbase_column_manager: ColumnManager Repository Table has been created ...

III. UNINSTALLATION
Step 1: Deactivate ColumnManager
Either remove the column_manager.activated property element from the environment's hbase-*.xml configuration files or else set the property element's value to false.

Step 2: Invoke the uninstall method
Invoke the static RepositoryAdmin method uninstallRepositoryStructures to disable and delete the Repository table and to drop the Repository namespace.

IV. BASIC CONFIGURATION
INCLUDE/EXCLUDE TABLES FOR ColumnManager PROCESSING
By default, when ColumnManager is installed and activated, all user Tables are included in ColumnManager processing. However, the following options are available to limit ColumnManager processing to a specific subset of user Tables.
Option 1: Explicitly INCLUDE Tables for ColumnManager processing
Specific Tables may optionally be explicitly included in ColumnManager processing (with all others not specified being automatically excluded). This is done by adding the [column_manager.includedTables] property either to the <hbase-site.xml> file or to an optional, separate <hbase-column-manager.xml> file. Values are expressed as fully-qualified Table names (for Tables not in the default namespace, the fully-qualified name is the Namespace name followed by the Table name, delimited by a colon). Multiple values are delimited by commas, as in the following example:
      <property>
         <name>column_manager.includedTables</name>
         <value>default:*,goodNamespace:myTable,betterNamespace:yetAnotherTable</value>
      </property>
Note that all Tables in a given Namespace may be included by using an asterisk [*] symbol in the place of a specific Table qualifier, as in the example above which includes all Tables in the "default" namespace via the specification, [default:*].

Option 2: Explicitly EXCLUDE Tables from ColumnManager processing
Alternatively, specific Tables may optionally be explicitly excluded from ColumnManager processing (with all others not specified being automatically included). This is done by adding the [column_manager.excludedTables] property either to the <hbase-site.xml> file or to an optional, separate <hbase-column-manager.xml> file. Values are expressed as fully-qualified Table names (for Tables not in the default namespace, the fully-qualified name is the Namespace name followed by the Table name, delimited by a colon). Multiple values are delimited by commas, as in the following example:
      <property>
         <name>column_manager.excludedTables</name>
         <value>myNamespace:*,goodNamespace:myExcludedTable,betterNamespace:yetAnotherExcludedTable</value>
      </property>
Note that all Tables in a given Namespace may be excluded by using an asterisk [*] symbol in the place of a specific Table qualifier, as in the example above which excludes all Tables in the "myNamespace" namespace via the specification, [myNamespace:*].

Note also that if a [column_manager.includedTables] property is found in the <hbase-*.xml> files, then any [column_manager.excludedTables] property will be ignored.
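The include/exclude rules above can be summarized in a short sketch. The class TableFilterSketch below is hypothetical and is not ColumnManager's implementation; it assumes that a name without a colon refers to the default namespace, that [namespace:*] matches every Table in that namespace, and that a present [column_manager.includedTables] value causes [column_manager.excludedTables] to be ignored.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not ColumnManager's implementation) of the documented
 * rules for column_manager.includedTables and column_manager.excludedTables.
 */
public class TableFilterSketch {

    /** Normalizes "myTable" to "default:myTable"; leaves "ns:table" unchanged. */
    static String qualify(String tableName) {
        String t = tableName.trim();
        return t.contains(":") ? t : "default:" + t;
    }

    /** Splits a comma-delimited property value into normalized patterns. */
    static List<String> parse(String propertyValue) {
        List<String> patterns = new ArrayList<>();
        for (String entry : propertyValue.split(",")) {
            if (!entry.trim().isEmpty()) {
                patterns.add(qualify(entry));
            }
        }
        return patterns;
    }

    /** True if the table matches a pattern; "ns:*" matches every table in ns. */
    static boolean matches(List<String> patterns, String tableName) {
        String qualified = qualify(tableName);
        String namespace = qualified.substring(0, qualified.indexOf(':'));
        for (String pattern : patterns) {
            if (pattern.equals(qualified) || pattern.equals(namespace + ":*")) {
                return true;
            }
        }
        return false;
    }

    /**
     * includedTables wins when present (excludedTables is then ignored);
     * with neither property set, every user Table is processed.
     */
    static boolean isProcessed(String includedTables, String excludedTables, String tableName) {
        if (includedTables != null) {
            return matches(parse(includedTables), tableName);
        }
        if (excludedTables != null) {
            return !matches(parse(excludedTables), tableName);
        }
        return true;
    }

    public static void main(String[] args) {
        String included = "default:*,goodNamespace:myTable,betterNamespace:yetAnotherTable";
        System.out.println(isProcessed(included, null, "someTable"));                // true
        System.out.println(isProcessed(included, null, "goodNamespace:otherTable")); // false
        String excluded = "myNamespace:*,goodNamespace:myExcludedTable";
        System.out.println(isProcessed(null, excluded, "myNamespace:anyTable"));     // false
        System.out.println(isProcessed(null, excluded, "goodNamespace:myTable"));    // true
    }
}
```

The main method reuses the property values from the two configuration examples above.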

V. USAGE IN APPLICATION DEVELOPMENT
Two brief Gist examples are available on GitHub that illustrate basic usage of ColumnManager: (1) a Gist demonstrating usage of reporting/auditing functions, and (2) a Gist demonstrating usage and optional enforcement of ColumnDefinitions.
A. ALWAYS USE MConnectionFactory INSTEAD OF ConnectionFactory
To use ColumnManager in an HBase development environment, simply replace any reference to the standard HBase API ConnectionFactory with a reference to ColumnManager's MConnectionFactory as follows:

Instead of
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      ...
      Connection myConnection = ConnectionFactory.createConnection();
      ...
Use
      import org.apache.hadoop.hbase.client.Connection;
      import org.commonvox.hbase_column_manager.MConnectionFactory;
      ...
      Connection myConnection = MConnectionFactory.createConnection();
      ...
Note that all Connection objects created in this manner generate special Admin, Table, and BufferedMutator objects which (in addition to providing all standard HBase API functionality) transparently interface with the ColumnManager Repository for tracking and persisting of Namespace, Table, Column Family, and ColumnAuditor metadata. In addition, ColumnManager-enabled HTableMultiplexer instances may be obtained via the method RepositoryAdmin#createHTableMultiplexer.
B. OPTIONALLY CATCH ColumnManagerIOException OCCURRENCES
In the context of some applications it may be necessary to perform special processing when a ColumnManagerIOException is thrown, which may signify rejection of a specific Column entry submitted in a put (i.e., insert/update) to an enforcement-enabled Table/Column-Family. In such cases, exceptions of this abstract type (or its concrete subclasses) may be caught, and appropriate processing performed.
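The catch pattern described above might look like the following sketch. Because the real exception classes require the ColumnManager JAR and a running HBase instance, the sketch defines local stand-in classes (ColumnManagerIOExceptionStandIn and ColumnDefinitionNotFoundStandIn, both hypothetical) that mirror the documented hierarchy, and it simulates the put. In real code the try block would wrap a Table#put call on a Table obtained from an MConnectionFactory-created Connection, and the first catch clause would name ColumnManagerIOException. The point being illustrated is ordering: the ColumnManager-specific type must be caught before the more general IOException.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of catching ColumnManager enforcement rejections.
 * The exception classes are local stand-ins, not the real ColumnManager types.
 */
public class CatchSketch {

    /** Stand-in for the abstract ColumnManagerIOException (an IOException subclass). */
    static abstract class ColumnManagerIOExceptionStandIn extends IOException {
        ColumnManagerIOExceptionStandIn(String message) { super(message); }
    }

    /** Stand-in for a concrete subclass such as ColumnDefinitionNotFoundException. */
    static class ColumnDefinitionNotFoundStandIn extends ColumnManagerIOExceptionStandIn {
        ColumnDefinitionNotFoundStandIn(String message) { super(message); }
    }

    /** Simulates a put that enforcement rejects for any qualifier other than "known". */
    static void simulatedPut(String qualifier) throws IOException {
        if (!qualifier.equals("known")) {
            throw new ColumnDefinitionNotFoundStandIn("No ColumnDefinition for qualifier: " + qualifier);
        }
    }

    /** Returns a short status string describing how the put was handled. */
    static String submit(String qualifier) {
        try {
            simulatedPut(qualifier);
            return "stored";
        } catch (ColumnManagerIOExceptionStandIn e) {
            // Caught first: the put was rejected by ColumnDefinition enforcement.
            return "rejected: " + e.getMessage();
        } catch (IOException e) {
            // Any other I/O failure (e.g., connectivity) is handled separately.
            return "io-error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(submit("known"));   // stored
        System.out.println(submit("unknown")); // rejected: No ColumnDefinition for qualifier: unknown
    }
}
```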

VI. COLUMN AUDITING IN REAL-TIME
When ColumnManager is activated and usage has been properly configured, ColumnAuditor metadata is gathered and persisted in the Repository at runtime when Mutations (i.e. puts, appends, increments) are submitted via the API to any ColumnManager-included Table. All such metadata is then retrievable via the RepositoryAdmin#getColumnAuditors and RepositoryAdmin#getColumnQualifiers methods.

Note that ColumnAuditor metadata may also be gathered for previously-existing Columns via the RepositoryAdmin discovery methods.
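Conceptually, the real-time auditing described above keeps, for each column qualifier, the qualifier name and the largest value length seen so far. The following in-memory sketch (the class AuditSketch is hypothetical) shows only that bookkeeping. ColumnManager itself persists the equivalent metadata in its Repository table, and its actual data structures are not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of per-qualifier audit metadata:
 * qualifier name -> maximum value length (in bytes) observed so far.
 */
public class AuditSketch {

    private final Map<String, Integer> maxLengths = new TreeMap<>();

    /** Records one submitted cell value for the given column qualifier. */
    void audit(String qualifier, byte[] value) {
        // Keep the larger of the stored maximum and this value's length.
        maxLengths.merge(qualifier, value.length, Integer::max);
    }

    /** Returns the maximum length seen, or null if the qualifier was never audited. */
    Integer maxLength(String qualifier) {
        return maxLengths.get(qualifier);
    }

    /** Returns a sorted copy of all audited qualifiers and their maximum lengths. */
    Map<String, Integer> snapshot() {
        return new TreeMap<>(maxLengths);
    }

    public static void main(String[] args) {
        AuditSketch auditor = new AuditSketch();
        auditor.audit("firstName", "Al".getBytes(StandardCharsets.UTF_8));
        auditor.audit("firstName", "Alexandra".getBytes(StandardCharsets.UTF_8));
        auditor.audit("zip", "10001".getBytes(StandardCharsets.UTF_8));
        System.out.println(auditor.snapshot()); // {firstName=9, zip=5}
    }
}
```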

VII. COLUMN ALIASING

Column-alias processing involves a 4-byte (positive integer) column-alias being stored in each cell in place of the full-length Column Qualifier, potentially conserving considerable data storage space; this works invisibly to the application developer, who continues working only with the standard hbase-client API interfaces, reading and writing full-length column-qualifiers.

Enable column aliasing: The administrator may enable column-aliasing for a specified Column Family via the RepositoryAdmin#enableColumnAliases method. Aliasing should only be activated for a newly-defined, completely empty (or freshly truncated) Column Family, and it should not be deactivated after data has been stored in the Column Family.
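The storage effect of aliasing can be estimated with simple arithmetic: each cell saves (qualifier length minus 4) bytes, so qualifiers shorter than 4 bytes would gain nothing. The sketch below is hypothetical; it encodes an alias as a big-endian 4-byte int via java.nio.ByteBuffer, which is an assumption rather than ColumnManager's documented encoding, and its estimate ignores HBase compression and data block encoding, which can already reduce the on-disk cost of repeated qualifiers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical arithmetic sketch of column aliasing. The alias byte layout
 * (big-endian int) is an illustrative assumption, not ColumnManager's format.
 */
public class AliasSavingsSketch {

    static final int ALIAS_LENGTH = 4; // bytes per alias, as documented

    /** Encodes a positive-integer alias into exactly 4 bytes. */
    static byte[] encodeAlias(int alias) {
        if (alias <= 0) {
            throw new IllegalArgumentException("alias must be a positive integer");
        }
        return ByteBuffer.allocate(ALIAS_LENGTH).putInt(alias).array();
    }

    /**
     * Estimated raw bytes saved across cellCount cells for one qualifier
     * (negative if the qualifier is shorter than the alias).
     */
    static long estimatedSavings(String qualifier, long cellCount) {
        int qualifierLength = qualifier.getBytes(StandardCharsets.UTF_8).length;
        return (long) (qualifierLength - ALIAS_LENGTH) * cellCount;
    }

    public static void main(String[] args) {
        System.out.println(encodeAlias(1).length); // 4
        // A 24-byte qualifier stored in 10 million cells:
        System.out.println(estimatedSavings("customer_billing_address", 10_000_000L)); // 200000000
    }
}
```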

VIII. COLUMN DEFINITION FACILITY: manage ColumnDefinitions and enable enforcement

A Column Definition pertains to a specific Column Qualifier within a Column Family of a ColumnManager-included Table, and permits optional stipulation of
  • Column Length: valid maximum length of a value stored in HBase for the column, and/or
  • Column Validation Regular Expression: a regular expression that any value submitted for storage in the column must match.


Manage ColumnDefinitions: The ColumnDefinitions of a Column Family are managed via a number of RepositoryAdmin add, get, and delete methods.

Enable enforcement of ColumnDefinitions: Enforcement of the ColumnDefinitions of a given Column Family does not occur until explicitly enabled via the method RepositoryAdmin#enableColumnDefinitionEnforcement. This same method may be invoked to toggle enforcement off again for the Column Family.

When enforcement is enabled, then (a) any Column Qualifier submitted in a put (i.e., insert/update) to the Table:Column-Family must correspond to an existing ColumnDefinition of the Column Family, and (b) the corresponding Column value submitted must pass all validations (if any) stipulated by the ColumnDefinition. Any ColumnDefinition-related enforcement-violation encountered during processing of a put transaction will result in a ColumnManagerIOException (a subclass of the standard IOException class) being thrown: specifically, either a ColumnDefinitionNotFoundException or a ColumnValueInvalidException.
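The two enforcement rules above, (a) a ColumnDefinition must exist for each submitted qualifier and (b) the value must pass the definition's optional length and regular-expression checks, are sketched below. EnforcementSketch and its nested Definition class are hypothetical stand-ins, not ColumnManager's ColumnDefinition API. The sketch assumes the regular expression must match the entire value (Matcher#matches) decoded as UTF-8, and it returns the name of the documented exception as a string instead of throwing it.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of ColumnDefinition enforcement for one Column Family.
 * Not ColumnManager's API: class, method, and field names are stand-ins.
 */
public class EnforcementSketch {

    /** Stand-in definition: maxLength <= 0 or a null regex means "no such check". */
    static final class Definition {
        final long maxLength;
        final Pattern regex;

        Definition(long maxLength, String regex) {
            this.maxLength = maxLength;
            this.regex = (regex == null) ? null : Pattern.compile(regex);
        }
    }

    private final Map<String, Definition> definitions = new HashMap<>();

    void addDefinition(String qualifier, long maxLength, String regex) {
        definitions.put(qualifier, new Definition(maxLength, regex));
    }

    /**
     * Returns null if the submitted column passes, otherwise a message naming
     * the exception type the documentation says would be thrown.
     */
    String check(String qualifier, String value) {
        Definition definition = definitions.get(qualifier);
        if (definition == null) {
            // Rule (a): every submitted qualifier needs a ColumnDefinition.
            return "ColumnDefinitionNotFoundException: " + qualifier;
        }
        int length = value.getBytes(StandardCharsets.UTF_8).length;
        if (definition.maxLength > 0 && length > definition.maxLength) {
            // Rule (b): value exceeds the stipulated maximum length.
            return "ColumnValueInvalidException: length " + length + " exceeds " + definition.maxLength;
        }
        if (definition.regex != null && !definition.regex.matcher(value).matches()) {
            // Rule (b): value does not match the validation regular expression.
            return "ColumnValueInvalidException: value does not match " + definition.regex.pattern();
        }
        return null;
    }

    public static void main(String[] args) {
        EnforcementSketch family = new EnforcementSketch();
        family.addDefinition("zip", 5, "[0-9]{5}");
        System.out.println(family.check("zip", "10001")); // null
        System.out.println(family.check("zip", "1000A")); // ColumnValueInvalidException: value does not match [0-9]{5}
        System.out.println(family.check("phone", "555")); // ColumnDefinitionNotFoundException: phone
    }
}
```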

IX. QUERYING THE ColumnManager REPOSITORY
X. ADMINISTRATIVE TOOLS
XI. COMMAND-LINE INVOCATION
The UtilityRunner facility is provided for direct command-line invocation of a subset of administrative functions, without the need to install or configure the full package. The functions available via UtilityRunner are those listed for the -u (--utility) argument in the help output shown below.
TO USE -- DOWNLOAD THE ColumnManager JAR AND INVOKE DESIRED FUNCTIONS:
  • The JAR file corresponding to your currently-installed version of HBase must be downloaded from GitHub or from the Maven Central Repository. (For example, hbase-column-manager-1.0.3-beta-02.jar would be used with an HBase 1.0.3 installation.)
  • Command-line invocation of UtilityRunner functions may then be performed from within the directory containing the ColumnManager JAR file, as outlined in the following usage instructions, which are output by the UtilityRunner's help function:
       ====================
       usage: java [-options] -cp <hbase-classpath-entries>
              org.commonvox.hbase_column_manager.UtilityRunner -u <arg> -t <arg>
              -f <arg> [-h]

           *** Note that <hbase-classpath-entries> must include
           *** $HBASE_HOME/lib/*:$HBASE_HOME/conf, where $HBASE_HOME
           *** is the path to the local HBase installation.

       Arguments for ColumnManagerAPI UtilityRunner:
       ====================
        -u,--utility <arg>   Utility to run. Valid <arg> values are as follows:
                             exportSchema, getChangeEventsForTable,
                             getColumnQualifiers,
                             getColumnQualifiersViaMapReduce, importSchema,
                             uninstallRepository
        -t,--table <arg>     Fully-qualified table name; or submit '*' in place
                             of table qualifier (e.g., 'myNamespace:*') to
                             process all tables in a given namespace.
        -f,--file <arg>      Source/target file.
        -h,--help            Display this help message.
       ====================

       FOR EXAMPLE, the exportSchema function might be invoked as follows from
       within the directory containing the ColumnManager JAR file:

           java -cp *:$HBASE_HOME/lib/*:$HBASE_HOME/conf
               org.commonvox.hbase_column_manager.UtilityRunner
               -u exportSchema -t myNamespace:myTable -f myOutputFile.xml 

Copyright © 2016. All rights reserved.