| Class | Description |
|---|---|
| ChangeEvent | A ChangeEvent (obtained via a ChangeEventMonitor's various get methods) contains metadata pertaining to a structural change made to a component of a ColumnManager-included Namespace, Table, or Table component; for example, a change to the "durability" setting of a Table or the "maxVersions" setting of a Column Family is captured in the ColumnManager repository as a ChangeEvent. |
| ChangeEventMonitor | A ChangeEventMonitor (obtained via a RepositoryAdmin's getChangeEventMonitor method) provides various get methods by which lists of ChangeEvents may be obtained (grouped and ordered in various ways), and the class provides a static convenience method for outputting a list of ChangeEvents to a CSV file. |
| ColumnAuditor | A ColumnAuditor object (obtained via a RepositoryAdmin's getColumnAuditors method) contains captured or discovered metadata pertaining to a specific Column Qualifier actively stored in HBase as part of a Column Family of a ColumnManager-included Table. |
| ColumnDefinition | A ColumnDefinition (managed via a number of RepositoryAdmin add, get, and delete methods) contains administrator-maintained settings pertaining to a specific Column Qualifier within a Column Family of a ColumnManager-included Table; when a Column Family has its ColumnDefinitionsEnforced setting enabled, then (a) any Column Qualifier submitted in a Put to that Column Family must correspond to an existing ColumnDefinition, and (b) the corresponding Column value submitted must pass all validations (if any) stipulated by the ColumnDefinition. |
| MConfiguration | Provides Configurations with properties from classpath-accessible HBase and ColumnManager (hbase-column-manager.xml) configuration files. |
| MConnectionFactory | The MConnectionFactory provides exactly the same static methods as those provided by the standard HBase ConnectionFactory, but a Connection object created by the MConnectionFactory generates special Admin, Table, and BufferedMutator objects which, in addition to providing all standard HBase API functionality, also: (a) transparently interface with the ColumnManager repository for tracking of Namespace, Table, Column Family, and ColumnAuditor metadata, and (b) optionally enforce administrator-specified ColumnDefinitions when Columns are submitted in a put (i.e., insert/update) via Table, BufferedMutator, and HTableMultiplexer interfaces. |
| MHBaseTestingUtility | The MHBaseTestingUtility (which is only functional with HBase 2.0 or later) provides a ColumnManager-enabled extension to the HBaseTestingUtility. |
| RepositoryAdmin | A RepositoryAdmin provides ColumnManager Repository maintenance and query facilities, as well as column-metadata discovery and full-schema export/import facilities; it is used as a complement to the standard Admin interface to provide for maintenance of optional ColumnDefinition structures and querying of ColumnAuditor structures. |

| Exception | Description |
|---|---|
| ColumnDefinitionNotFoundException | Thrown when a Column Family has its ColumnDefinitionsEnforced setting enabled, and a Column Qualifier submitted in a Put to that Column Family does NOT correspond to an existing ColumnDefinition. |
| ColumnManagerIOException | All IOExceptions in the ColumnManagerAPI package are subclasses of the abstract ColumnManagerIOException. |
| ColumnValueInvalidException | Thrown when a Column Family has its ColumnDefinitionsEnforced setting enabled, and the Column value submitted in a Put to the Column Family does NOT pass a validation stipulated by the Column's corresponding ColumnDefinition. |
| TableNotIncludedForProcessingException | Thrown when a submitted Table is not included for ColumnManager processing, but the context permits only ColumnManager-included Tables. |

(1) COLUMN AUDITING/DISCOVERY -- captures Column metadata (qualifier-name and max-length for each unique column-qualifier) -- either via real-time auditing as Tables are updated, or via a discovery facility (direct-scan or mapreduce) for previously-existing Tables; the discovery process also captures column-occurrences count and cell-occurrences count for each unique column-qualifier;
(2) COLUMN-ALIASING -- involves a 4-byte (positive integer) column-alias being stored in each cell in place of the full-length column-qualifier, potentially conserving considerable data storage space; this works invisibly to the application developer, who continues working only with the standard hbase-client API interfaces, reading and writing full-length column-qualifiers;
(3) COLUMN-DEFINITION FACILITIES -- administratively-managed ColumnDefinitions (stipulating valid qualifier-name, column length, and/or value) may be created and (a) optionally activated for column validation and enforcement as Tables are updated, and/or (b) used in the generation of various "Invalid Column" CSV-formatted reports (reporting on any column qualifiers, lengths, or values which do not adhere to ColumnDefinitions);
(4) SCHEMA EXPORT/IMPORT -- provides schema (metadata) export and import facilities for HBase Namespace, Table, and all table-component structures;
(5) SCHEMA CHANGE MONITORING -- tracks and provides an audit trail for structural modifications made to Namespaces, Tables, and Column Families.
HBase™ is a trademark of the Apache Software Foundation.
A basic COMMAND-LINE INTERFACE is also provided for direct invocation of a number of the above-listed functions without any need for Java coding.
Once it is installed and configured, standard usage of the ColumnManagerAPI in Java programs is accomplished by simply substituting any reference to the standard HBase ConnectionFactory class with a reference to the ColumnManager MConnectionFactory class (as shown in the USAGE IN APPLICATION DEVELOPMENT section below).
All other interactions with the HBase API are then to be coded as usual; ColumnManager will work behind the scenes to capture and manipulate HBase metadata as stipulated by an administrator/developer in the ColumnManager configuration.
Any application coded with the ColumnManager API can be made to revert to standard HBase API functionality simply by either (a) setting the value of the column_manager.activated property to <false> in all hbase-*.xml configuration files, or (b) removing that property from hbase-*.xml configuration files altogether.
Thus, a ColumnManager-coded application can be used with ColumnManager activated in a development and/or staging environment, but deactivated in production (where ColumnManager's extra overhead might be undesirable).
org.apache.hadoop.hbase and
org.apache.hadoop.hbase.client.
HBase 1.x or later -- HBase must be installed as per the installation instructions given in the official Apache HBase Reference Guide (either in stand-alone, pseudo-distributed, or fully-distributed mode).
HBase Java API -- An IDE environment must be set up for HBase-oriented development using the HBase Java API such that, at a minimum, an HBase "Hello World" application can be successfully compiled and run in it.
JDK 7 -- HBase 1.x or later (upon which this package is dependent) requires JDK 7 or later.
Step 1: Get the required JAR files, either (a) via direct download or (b) by setting Maven project dependencies
One of the most recently released versions of the JAR files for ColumnManager may be selected and downloaded from GitHub and included in the IDE environment's compile and run-time classpath configurations.
In the context of a Maven project, a dependency may be set in the project's pom.xml file as in the following example. (Note that this example assumes a current installation of HBase v1.2.1.):
Remove the following hbase-client dependency element:

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.2.1</version>
    </dependency>

... and replace it with the following hbase-column-manager dependency element:

    <dependency>
      <groupId>org.commonvox</groupId>
      <artifactId>hbase-column-manager</artifactId>
      <version>1.2.1-beta-02</version>
    </dependency>

To access all currently-available versions of hbase-column-manager, please consult the Maven Central Repository.
Step 2: Activate ColumnManager
Add the following property element to either the <hbase-site.xml> file or an optional separate configuration file named <hbase-column-manager.xml>.

    <property>
      <name>column_manager.activated</name>
      <value>true</value>
    </property>

NOTE that the default for "column_manager.activated" is "false", so when the property above is not present in <hbase-site.xml> or in <hbase-column-manager.xml>, the ColumnManager API will function exactly like the standard HBase API. Thus, a single body of code can operate with ColumnManager functionality in one environment (typically, a development or testing environment) and can completely bypass ColumnManager functionality in another environment (potentially staging or production), with the only difference between the environments being the presence or absence of the "column_manager.activated" property in each environment's hbase-*.xml configuration files.
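The default-to-false behavior described in the NOTE above can be pictured with a small, self-contained sketch. It is not ColumnManager's own configuration loader (the package's MConfiguration class is the documented entry point for that); the class name ActivationFlagSketch and its methods are hypothetical, and only the JDK's XML parser is used to read a Hadoop-style configuration document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical illustration only; not part of the ColumnManager API. */
class ActivationFlagSketch {

  /** Returns the trimmed value of the named property, or defaultValue when it is absent. */
  static String getProperty(String configurationXml, String name, String defaultValue) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(configurationXml.getBytes(StandardCharsets.UTF_8)));
      NodeList properties = doc.getElementsByTagName("property");
      for (int i = 0; i < properties.getLength(); i++) {
        Element property = (Element) properties.item(i);
        if (name.equals(textOf(property, "name"))) {
          String value = textOf(property, "value");
          return value == null ? defaultValue : value;
        }
      }
      return defaultValue;
    } catch (Exception e) {
      throw new IllegalArgumentException("Unreadable configuration XML", e);
    }
  }

  /** An absent property is treated as "false", mirroring the documented default. */
  static boolean isActivated(String configurationXml) {
    return Boolean.parseBoolean(getProperty(configurationXml, "column_manager.activated", "false"));
  }

  /** Text of the first child element with the given tag, or null when there is none. */
  private static String textOf(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }
}
```

Under these assumptions, a configuration containing the property element shown above yields true, while an empty configuration element yields false, which is the case in which the API behaves like the standard HBase API.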
Step 3: Confirm installation (and create ColumnManager Repository Namespace and Table)
A successful invocation of the method MConnectionFactory.createConnection() will confirm proper installation of ColumnManager, as in the following code example:

    import org.apache.hadoop.hbase.client.Connection;
    import org.commonvox.hbase_column_manager.MConnectionFactory;

    public class ConfirmColumnManagerInstall {
      public static void main(String[] args) throws Exception {
        // Opening (and automatically closing) a Connection is all that is needed here.
        try (Connection connection = MConnectionFactory.createConnection()) {}
      }
    }

Note that the first invocation of MConnectionFactory.createConnection() (as in the above code) will result in the automatic creation of the ColumnManager Repository Namespace ("__column_manager_repository_namespace") and Table ("column_manager_repository_table").
If the code above runs successfully, its log output will include a number of lines of Zookeeper INFO output, as well as several lines of ColumnManager INFO output.
Step 4: [OPTIONAL] Explicitly create Repository structures
As an alternative to the automatic creation of the ColumnManager Repository Namespace and Table in the preceding step, these structures may be explicitly created through invocation of the static method RepositoryAdmin#installRepositoryStructures. Successful creation of these structures will result in messages such as the following appearing in the session's log output:

    2015-10-09 11:03:30,184 INFO [main] commonvox.hbase_column_manager: ColumnManager Repository Namespace has been created ...
    2015-10-09 11:03:31,498 INFO [main] commonvox.hbase_column_manager: ColumnManager Repository Table has been created ...

Step 1: Deactivate ColumnManager
Either remove the column_manager.activated property element from the environment's hbase-*.xml configuration files or else set the property element's value to false.
Step 2: Invoke the uninstall method
Invoke the static RepositoryAdmin method uninstallRepositoryStructures to disable and delete the Repository table and to drop the Repository namespace.
INCLUDE/EXCLUDE TABLES FOR ColumnManager PROCESSING
By default, when ColumnManager is installed and activated, all user Tables are included in ColumnManager processing. However, the following options are available to limit ColumnManager processing to a specific subset of user Tables.
Option 1: Explicitly INCLUDE Tables for ColumnManager processing
Specific Tables may optionally be explicitly included in ColumnManager processing (with all others not specified being automatically excluded). This is done by adding the [column_manager.includedTables] property to either the <hbase-site.xml> file or an optional, separate <hbase-column-manager.xml> file. Values are expressed as fully-qualified Table names (for those Tables not in the default namespace, the fully-qualified name is the Namespace name followed by the Table name, delimited by a colon). Multiple values are delimited by commas, as in the following example:

    <property>
      <name>column_manager.includedTables</name>
      <value>default:*,goodNamespace:myTable,betterNamespace:yetAnotherTable</value>
    </property>

Note that all Tables in a given Namespace may be included by using an asterisk [*] symbol in the place of a specific Table qualifier, as in the example above which includes all Tables in the "default" namespace via the specification, [default:*].
Option 2: Explicitly EXCLUDE Tables from ColumnManager processing
Alternatively, specific Tables may optionally be explicitly excluded from ColumnManager processing (with all others not specified being automatically included). This is done by adding the [column_manager.excludedTables] property to either the <hbase-site.xml> file or an optional, separate <hbase-column-manager.xml> file. Values are expressed as fully-qualified Table names (for those Tables not in the default namespace, the fully-qualified name is the Namespace name followed by the Table name, delimited by a colon). Multiple values are delimited by commas, as in the following example:

    <property>
      <name>column_manager.excludedTables</name>
      <value>myNamespace:*,goodNamespace:myExcludedTable,betterNamespace:yetAnotherExcludedTable</value>
    </property>

Note that all Tables in a given Namespace may be excluded by using an asterisk [*] symbol in the place of a specific Table qualifier, as in the example above which excludes all Tables in the "myNamespace" namespace via the specification, [myNamespace:*].
Note also that if a [column_manager.includedTables] property is found in the <hbase-*.xml> files, then any [column_manager.excludedTables] property will be ignored.
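The include/exclude rules above (comma-delimited entries, a namespace-wide asterisk, and includedTables taking precedence over excludedTables) can be sketched as follows. This is an illustration under stated assumptions rather than ColumnManager's actual matching code: the class TableInclusionSketch is hypothetical, and it assumes that a Table name given without a namespace prefix belongs to the "default" namespace.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration only; not part of the ColumnManager API. */
class TableInclusionSketch {
  private final Set<String> included;
  private final Set<String> excluded;

  /** Either argument may be null, meaning the corresponding property is not set. */
  TableInclusionSketch(String includedTables, String excludedTables) {
    this.included = parse(includedTables);
    // When an includedTables value is present, any excludedTables value is ignored.
    this.excluded = included.isEmpty() ? parse(excludedTables) : new LinkedHashSet<String>();
  }

  /** Splits a comma-delimited property value into trimmed, fully-qualified entries. */
  private static Set<String> parse(String commaDelimited) {
    Set<String> entries = new LinkedHashSet<>();
    if (commaDelimited == null) {
      return entries;
    }
    for (String entry : commaDelimited.split(",")) {
      String trimmed = entry.trim();
      if (!trimmed.isEmpty()) {
        entries.add(qualify(trimmed));
      }
    }
    return entries;
  }

  /** Assumption: a name without a colon refers to the "default" namespace. */
  private static String qualify(String tableName) {
    return tableName.contains(":") ? tableName : "default:" + tableName;
  }

  /** True when the name matches an exact entry or a namespace-wide "namespace:*" entry. */
  private static boolean matches(Set<String> entries, String qualifiedName) {
    String namespace = qualifiedName.substring(0, qualifiedName.indexOf(':'));
    return entries.contains(qualifiedName) || entries.contains(namespace + ":*");
  }

  boolean isIncluded(String tableName) {
    String qualified = qualify(tableName);
    if (!included.isEmpty()) {
      return matches(included, qualified);
    }
    return !matches(excluded, qualified);
  }
}
```

With the includedTables value from the example above, this sketch treats every Table in the "default" namespace and goodNamespace:myTable as included, and any other Table in goodNamespace as excluded.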
Two brief Gist examples are available on GitHub which illustrate basic usage of ColumnManager:
(1) a Gist demonstrating usage of reporting/auditing functions, and (2) a Gist demonstrating usage and optional enforcement of ColumnDefinitions.
A. ALWAYS USE MConnectionFactory INSTEAD OF ConnectionFactory
To use ColumnManager in an HBase development environment, simply replace any reference to the standard HBase API ConnectionFactory with a reference to ColumnManager's MConnectionFactory as follows:
Instead of

    import org.apache.hadoop.hbase.client.ConnectionFactory;
    ...
    Connection myConnection = ConnectionFactory.createConnection();
    ...

Use

    import org.commonvox.hbase_column_manager.MConnectionFactory;
    ...
    Connection myConnection = MConnectionFactory.createConnection();
    ...

Note that all Connection objects created in this manner generate special Admin, Table, and BufferedMutator objects which (in addition to providing all standard HBase API functionality) transparently interface with the ColumnManager Repository for tracking and persisting of Namespace, Table, Column Family, and ColumnAuditor metadata. In addition, ColumnManager-enabled HTableMultiplexer instances may be obtained via the method RepositoryAdmin#createHTableMultiplexer.
B. OPTIONALLY CATCH ColumnManagerIOException OCCURRENCES
In the context of some applications it may be necessary to perform special processing when a ColumnManagerIOException is thrown, which may signify rejection of a specific Column entry submitted in a put (i.e., insert/update) to an enforcement-enabled Table/Column-Family. In such cases, exceptions of this abstract type (or its concrete subclasses) may be caught, and appropriate processing performed.
When ColumnManager is activated and usage has been properly configured, ColumnAuditor metadata is gathered and persisted in the Repository at runtime when Mutations (i.e. puts, appends, increments) are submitted via the API to any ColumnManager-included Table. All such metadata is then retrievable via the RepositoryAdmin#getColumnAuditors and RepositoryAdmin#getColumnQualifiers methods.
Note that ColumnAuditor metadata may also be gathered for previously-existing Columns via the RepositoryAdmin discovery methods.
Column-alias processing involves a 4-byte (positive integer) column-alias being stored in each cell in place of the full-length Column Qualifier, potentially conserving considerable data storage space; this works invisibly to the application developer, who continues working only with the standard hbase-client API interfaces, reading and writing full-length column-qualifiers.
Enable column aliasing: The administrator may enable column-aliasing for a specified Column Family via the RepositoryAdmin#enableColumnAliases method. Aliasing should only be activated for a newly-defined, completely empty (or freshly truncated) Column Family, and it should not be deactivated after data has been stored in the Column Family.
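To make the storage idea concrete, the following sketch maps each full-length qualifier to a 4-byte positive-integer alias and back again. It is only an in-memory illustration: the class ColumnAliasSketch is hypothetical, and how ColumnManager actually assigns and persists aliases in its Repository is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of the ColumnManager API. */
class ColumnAliasSketch {
  private final Map<String, Integer> aliasByQualifier = new HashMap<>();
  private final Map<Integer, String> qualifierByAlias = new HashMap<>();
  private int lastAlias = 0;

  /** Returns the 4-byte alias for a qualifier, assigning the next positive integer if it is new. */
  byte[] toAlias(byte[] qualifier) {
    // ISO-8859-1 maps every byte to exactly one char, so arbitrary qualifier bytes round-trip.
    String key = new String(qualifier, StandardCharsets.ISO_8859_1);
    Integer alias = aliasByQualifier.get(key);
    if (alias == null) {
      alias = ++lastAlias;
      aliasByQualifier.put(key, alias);
      qualifierByAlias.put(alias, key);
    }
    return ByteBuffer.allocate(Integer.BYTES).putInt(alias).array();
  }

  /** Translates a stored 4-byte alias back to the full-length qualifier. */
  byte[] toQualifier(byte[] aliasBytes) {
    int alias = ByteBuffer.wrap(aliasBytes).getInt();
    String qualifier = qualifierByAlias.get(alias);
    if (qualifier == null) {
      throw new IllegalArgumentException("Unknown alias: " + alias);
    }
    return qualifier.getBytes(StandardCharsets.ISO_8859_1);
  }
}
```

In this sketch a 32-byte qualifier such as customer_shipping_address_line_1 is stored as 4 bytes per cell. The lookup in toQualifier also suggests why aliasing is best enabled only on an empty Column Family: cells written before aliasing was enabled would hold full-length qualifiers rather than aliases.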
A Column Definition pertains to a specific Column Qualifier within a Column Family of a ColumnManager-included Table, and permits optional stipulation of
- Column Length: valid maximum length of a value stored in HBase for the column, and/or
- Column Validation Regular Expression: a regular expression that any value submitted for storage in the column must match.
Manage ColumnDefinitions: The ColumnDefinitions of a Column Family are managed via a number of RepositoryAdmin add, get, and delete methods.
Enable enforcement of ColumnDefinitions: Enforcement of the ColumnDefinitions of a given Column Family does not occur until explicitly enabled via the method RepositoryAdmin#enableColumnDefinitionEnforcement. This same method may be invoked to toggle enforcement off again for the Column Family.
When enforcement is enabled, then (a) any Column Qualifier submitted in a put (i.e., insert/update) to the Table:Column-Family must correspond to an existing ColumnDefinition of the Column Family, and (b) the corresponding Column value submitted must pass all validations (if any) stipulated by the ColumnDefinition. Any ColumnDefinition-related enforcement-violation encountered during processing of a put transaction will result in a ColumnManagerIOException (a subclass of the standard IOException class) being thrown: specifically, either a ColumnDefinitionNotFoundException or a ColumnValueInvalidException.
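The two enforcement checks described above can be sketched with stand-in types. Everything in this block is hypothetical: EnforcementSketch and its nested exceptions merely mimic the roles of ColumnDefinitionNotFoundException and ColumnValueInvalidException. The sketch also assumes that a length of 0 means "no limit" and that the regular expression must match the entire value decoded as UTF-8; the package's actual conventions may differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the ColumnManager API. */
class EnforcementSketch {

  /** Stand-in for the role played by ColumnDefinitionNotFoundException. */
  static class DefinitionNotFound extends IOException {
    DefinitionNotFound(String message) { super(message); }
  }

  /** Stand-in for the role played by ColumnValueInvalidException. */
  static class ValueInvalid extends IOException {
    ValueInvalid(String message) { super(message); }
  }

  private static class Definition {
    final long maxLength;      // assumption: 0 means "no length limit"
    final Pattern validation;  // null means "no regular-expression check"

    Definition(long maxLength, String regex) {
      this.maxLength = maxLength;
      this.validation = regex == null ? null : Pattern.compile(regex);
    }
  }

  private final Map<String, Definition> definitions = new HashMap<>();

  void addDefinition(String qualifier, long maxLength, String regex) {
    definitions.put(qualifier, new Definition(maxLength, regex));
  }

  /** Applies check (a), then check (b), to one qualifier/value pair of a put. */
  void validatePut(String qualifier, byte[] value) throws IOException {
    Definition definition = definitions.get(qualifier);
    if (definition == null) {
      throw new DefinitionNotFound("No definition for qualifier: " + qualifier);
    }
    if (definition.maxLength > 0 && value.length > definition.maxLength) {
      throw new ValueInvalid("Value longer than " + definition.maxLength + " bytes: " + qualifier);
    }
    if (definition.validation != null
        && !definition.validation.matcher(new String(value, StandardCharsets.UTF_8)).matches()) {
      throw new ValueInvalid("Value does not match required pattern: " + qualifier);
    }
  }
}
```

An application that needs special handling for rejected columns could, analogously, wrap its put calls in a try block and catch the abstract ColumnManagerIOException or one of its two concrete subclasses.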
Subsequent to either the capture of column metadata in real-time or its discovery via the RepositoryAdmin discovery methods, a list of the Column Qualifiers belonging to a Column Family of a Table may be obtained via the RepositoryAdmin#getColumnQualifiers method. Alternatively, a list of ColumnAuditor objects (containing column qualifiers and additional column metadata) is obtained via the RepositoryAdmin#getColumnAuditors method.
Subsequent to creation of ColumnDefinitions for a Table/ColumnFamily, a CSV-formatted report listing columns which deviate from those ColumnDefinitions (either in terms of qualifier-name, length, or value) may be generated via the various RepositoryAdmin#outputReportOnInvalidColumn* methods. If a method is run in verbose mode, the outputted CSV file will include an entry (identified by the fully-qualified column name and rowId) for each explicit invalid column that is found; otherwise the report will contain a summary, giving a count of the invalidities associated with a specific column-qualifier name. Note that invalid column report processing may optionally be done via direct-scan or via mapreduce.
A ChangeEventMonitor object (obtained via the method RepositoryAdmin#getChangeEventMonitor) outputs lists of ChangeEvents (pertaining to structural changes made to user Namespaces, Tables, Column Families, ColumnAuditors, and ColumnDefinitions) tracked by the ColumnManager Repository.
The ChangeEventMonitor's "get" methods allow for retrieving ChangeEvents grouped and ordered in various ways, and a static convenience method, ChangeEventMonitor#exportChangeEventListToCsvFile, is provided for outputting a list of ChangeEvents to a CSV file.
When ColumnManager is installed into an already-populated HBase environment, the RepositoryAdmin#discoverColumnMetadata method may be invoked to perform discovery of column-metadata for all ColumnManager-included Tables. Column metadata (for each unique column-qualifier value found) is persisted in the ColumnManager Repository in the form of ColumnAuditor objects; all such metadata is then retrievable via the RepositoryAdmin#getColumnAuditors and RepositoryAdmin#getColumnQualifiers methods. Column discovery involves a full Table scan (with KeyOnlyFilter), using either a direct-scan option or a mapreduce option.
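The kind of per-qualifier metadata that discovery gathers (maximum value length, plus column-occurrences and cell-occurrences counts) can be sketched as a simple aggregation over scanned cells. This is a hypothetical in-memory model, not the package's scan or mapreduce code. It assumes that "column occurrences" counts the rows in which a qualifier appears, that "cell occurrences" counts every cell version, and that cells arrive grouped by row, as they do in a scan.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of the ColumnManager API. */
class DiscoverySketch {

  /** One scanned cell: its row key, column qualifier, and value length in bytes. */
  static class ScannedCell {
    final String row;
    final String qualifier;
    final int valueLength;

    ScannedCell(String row, String qualifier, int valueLength) {
      this.row = row;
      this.qualifier = qualifier;
      this.valueLength = valueLength;
    }
  }

  /** Aggregated metadata for one unique column qualifier. */
  static class Audit {
    long maxValueLength;
    long columnOccurrences; // assumption: number of rows containing the qualifier
    long cellOccurrences;   // assumption: number of cells, counting every version
    private String lastRow;
  }

  /** Aggregates cells that are grouped by row, as a scan would return them. */
  static Map<String, Audit> discover(List<ScannedCell> cellsInScanOrder) {
    Map<String, Audit> audits = new TreeMap<>();
    for (ScannedCell cell : cellsInScanOrder) {
      Audit audit = audits.get(cell.qualifier);
      if (audit == null) {
        audit = new Audit();
        audits.put(cell.qualifier, audit);
      }
      audit.cellOccurrences++;
      if (!cell.row.equals(audit.lastRow)) {
        // First cell seen for this qualifier in this row.
        audit.columnOccurrences++;
        audit.lastRow = cell.row;
      }
      audit.maxValueLength = Math.max(audit.maxValueLength, cell.valueLength);
    }
    return audits;
  }
}
```

Only the value length is modeled here, not the value itself, which is in keeping with a key-only scan that has no need to retrieve full cell values.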
The RepositoryAdmin export methods provide for creation of an external HBaseSchemaArchive (HSA) file (in XML format*) containing the complete metadata contents (i.e., all Namespace, Table, Column Family, ColumnAuditor, and ColumnDefinition metadata) of either the entire Repository or the user-specified Namespace or Table. Conversely, the RepositoryAdmin import methods provide for deserialization of a designated HSA file and importation of its components into HBase (creating any Namespaces or Tables not already found in HBase).
*An HSA file adheres to the XML Schema layout in HBaseSchemaArchive.xsd.xml. Note: Consistent with the HBase project's usage of XML, HBaseSchemaArchive XML documents are not defined within a specific XML-namespace. In the context of XML processing in this package, the requirement that a non-default XML-namespace be specified would seem to offer no obvious benefit.
By default, the Audit Trail subsystem (as outlined in the subsection
"Get audit trail metadata" above) is configured to track and report on
only the most recent 50 ChangeEvents of each entity-attribute that it tracks (for
example, the most recent 50 changes to the "durability" setting of a given
Table). This limitation relates directly to the default "maxVersions" setting of the
Column Family of the Repository Table. This setting may be changed through
invocation of the static method
RepositoryAdmin#setRepositoryMaxVersions.
The UtilityRunner facility is provided for direct command-line invocation of a subset of administrative functions. It allows invocation of these functions without the need to perform installation or configuration of the full package. The following administrative functions are available via UtilityRunner:
- exportSchema: invokes the RepositoryAdmin#exportSchema method for a specified Table or Namespace.
- getChangeEvents: invokes one of the ChangeEventMonitor#getChangeEvents* methods for a specified Table or Namespace.
- getColumnQualifiers: invokes Column Qualifier discovery for the specified Table or Namespace, and then invokes the RepositoryAdmin#outputReportOnColumnQualifiers method to output the results to the specified file.
- getColumnQualifiersViaMapReduce: performs the same tasks as the getColumnQualifiers function outlined above, but uses mapreduce to perform Column Qualifier discovery.
- importSchema: invokes the RepositoryAdmin#importSchema method for a specified Table or Namespace.
- uninstallRepository: invokes the RepositoryAdmin#uninstallRepositoryStructures method to remove the Repository Table and any subsidiary HBase artifacts which are generated in execution of any of the above functions.
TO USE -- DOWNLOAD THE ColumnManager JAR AND INVOKE DESIRED FUNCTIONS:
- The JAR file corresponding to your currently-installed version of HBase must be downloaded from GitHub or from the Maven Central Repository. (For example, hbase-column-manager-1.0.3-beta-02.jar would be used with an HBase 1.0.3 installation.)
- Command-line invocation of UtilityRunner functions may then be performed from within the directory containing the ColumnManager JAR file, as outlined in the following usage instructions, which are outputted by the UtilityRunner's help function:

    ====================
    usage: java [-options] -cp <hbase-classpath-entries>
           org.commonvox.hbase_column_manager.UtilityRunner
           -u <arg> -t <arg> -f <arg> [-h]

        *** Note that <hbase-classpath-entries> must include
        *** $HBASE_HOME/lib/*:$HBASE_HOME/conf, where $HBASE_HOME
        *** is the path to the local HBase installation.

    Arguments for ColumnManagerAPI UtilityRunner:
    ====================
     -u,--utility <arg>   Utility to run. Valid <arg> values are as follows:
                          exportSchema, getChangeEventsForTable, getColumnQualifiers,
                          getColumnQualifiersViaMapReduce, importSchema, uninstallRepository
     -t,--table <arg>     Fully-qualified table name; or submit '*' in place of table
                          qualifier (e.g., 'myNamespace:*') to process all tables in a
                          given namespace.
     -f,--file <arg>      Source/target file.
     -h,--help            Display this help message.
    ====================

FOR EXAMPLE, the exportSchema function might be invoked as follows from within the directory containing the ColumnManager JAR file:

    java -cp *:$HBASE_HOME/lib/*:$HBASE_HOME/conf org.commonvox.hbase_column_manager.UtilityRunner -u exportSchema -t myNamespace:myTable -f myOutputFile.xml

Copyright © 2016. All rights reserved.