Recent blog posts have described how to set up authorization for Apache HBase using Apache Ranger. However the posts just covered inputing and reading data using the HBase Shell. In this post, we will show how Talend Open Studio for Big Data can be used to read data stored in Apache HBase. This post is along the same lines of other recent tutorials on reading data from Kafka and HDFS.
1) HBase setup
Follow this tutorial on setting up Apache HBase in standalone mode, and creating a 'data' table with some sample values using the HBase Shell.
2) Download Talend Open Studio for Big Data and create a job
Now we will download
Talend Open Studio for Big Data (6.4.0 was used for the purposes of
this tutorial). Unzip the file when it is downloaded and then start the
Studio using one of the platform-specific scripts. It will prompt you to
download some additional dependencies and to accept the licenses. Click
on "Create a new job" called "HBaseRead". In the search bar on the right-hand side, enter "hbase" and hit enter. Drag "tHBaseConnection" and "tHBaseInput" onto the palette, as well as "tLogRow".
Now let's configure the individual components. Double click on "tHBaseConnection" and select the distribution "Hortonworks" and Version "HDP V2.5.0" (from an earlier tutorial we are using HBase 1.2.6). We are not using Kerberos here so we can skip the rest of the security configuration. Now double click on "tHBaseInput". Select the "Use an existing connection" checkbox. Now hit "Edit Schema" and add two entries to map the column we created in two different column families: "c1" which matches DB "col1" of type String, and "c2" which matches DB "col1" of type String.
Select "data" for the table name back in tHBaseInput and add a mapping for "c1" to "colfam1", and "c2" to "colfam2".
Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. You should see "val1" and "val2" appear in the console window.