Thursday, May 4, 2017

Securing Apache Hadoop Distributed File System (HDFS) - part V

This is the fifth in a series of blog posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. The second post looked at how to use Apache Ranger to authorize access to data stored in HDFS. The third post looked at how Apache Ranger can create "tag" based authorization policies for HDFS using Apache Atlas. The fourth post looked at how to implement transparent encryption for HDFS using Apache Ranger. Up to now, we have not shown how to authenticate users, concentrating only on authorizing local access to HDFS. In this post we will show how to configure HDFS to authenticate users via Kerberos.

1) Set up a KDC using Apache Kerby

If we are going to configure Apache Hadoop to use Kerberos to authenticate users, then we need a Kerberos Key Distribution Center (KDC). Most documentation revolves around installing the MIT Kerberos server, adding principals, creating keytabs, etc. However, in this post we will show a simpler way of getting started, using a pre-configured maven project based on Apache Kerby. Apache Kerby is a subproject of the Apache Directory project, and is a complete open-source KDC written entirely in Java.

A github project that uses Apache Kerby to start up a KDC is available here:
  • bigdata-kerberos-deployment: This project contains some tests which can be used to test Kerberos with various big data deployments, such as Apache Hadoop.
The KDC is a simple JUnit test that is available here. To run it, just comment out the "org.junit.Ignore" annotation on the test method. It uses Apache Kerby to define the following principals:
  • alice@hadoop.apache.org
  • bob@hadoop.apache.org
  • hdfs/localhost@hadoop.apache.org
  • HTTP/localhost@hadoop.apache.org
Keytabs are created in the "target" folder for "alice", "bob" and "hdfs" (where the latter contains both the hdfs/localhost and HTTP/localhost principals). Kerby is configured to use a random port to launch the KDC each time, and it will create a "krb5.conf" file containing the random port number in the target directory. So all we need to do is point Hadoop at the generated keytabs and krb5.conf, and it should be able to communicate correctly with the Kerby-based KDC.
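Before touching Hadoop at all, you can sanity-check the KDC directly, assuming the MIT Kerberos command-line tools (kinit/klist) are installed locally. The "/pathtokerby" prefix below is just a placeholder for wherever the github project was checked out to:

    # Point the Kerberos tools at the generated configuration (random port)
    export KRB5_CONFIG=/pathtokerby/target/krb5.conf
    # List the principals stored in one of the generated keytabs
    klist -k -t /pathtokerby/target/alice.keytab
    # Obtain a TGT for "alice" using the keytab, then inspect it
    kinit -k -t /pathtokerby/target/alice.keytab alice
    klist

If this works, the KDC is up and running and we can move on to Hadoop itself.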

2) Configure Hadoop to authenticate users via Kerberos

Download and configure Apache Hadoop as per the first tutorial. For now, we will not enable the Ranger authorization plugin, but rather secure access to the "/data" directory using ACLs, as described in section (3) of the first tutorial, such that "alice" has permission to read the file stored in "/data" but "bob" does not.
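As a reminder, that ACL setup looks roughly like the following sketch (the exact commands are given in the first post, and they assume ACLs are enabled via "dfs.namenode.acls.enabled" in hdfs-site.xml):

    # Upload some data and lock the directory down to the owner
    bin/hadoop fs -mkdir /data
    bin/hadoop fs -put LICENSE.txt /data
    bin/hadoop fs -chmod -R 700 /data
    # Grant "alice" (but not "bob") read access via ACLs
    bin/hadoop fs -setfacl -m user:alice:r-x /data
    bin/hadoop fs -setfacl -m user:alice:r-- /data/LICENSE.txt
    # Verify the ACLs
    bin/hadoop fs -getfacl /data/LICENSE.txt

With the ACLs in place, the next step is to configure Hadoop to authenticate users via Kerberos.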

Edit 'etc/hadoop/core-site.xml' and add the following property name/value:
  • hadoop.security.authentication: kerberos
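In other words, the following goes inside the <configuration> element of core-site.xml (any properties already present from the first tutorial are left as they are):

    <!-- Switch authentication from "simple" (the default) to Kerberos -->
    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>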
Next edit 'etc/hadoop/hdfs-site.xml' and add the following property name/values to configure Kerberos for the namenode:
  • dfs.namenode.keytab.file: Path to Kerby hdfs.keytab (see above).
  • dfs.namenode.kerberos.principal: hdfs/localhost@hadoop.apache.org
  • dfs.namenode.kerberos.internal.spnego.principal: HTTP/localhost@hadoop.apache.org
Add the same properties for the secondary namenode, substituting "secondary.namenode" for "namenode" in the property names. We also need to configure Kerberos for the datanode (a combined sample configuration is sketched after this list):
  • dfs.datanode.data.dir.perm: 700
  • dfs.datanode.address: 0.0.0.0:1004
  • dfs.datanode.http.address: 0.0.0.0:1006
  • dfs.web.authentication.kerberos.principal: HTTP/localhost@hadoop.apache.org
  • dfs.datanode.keytab.file: Path to Kerby hdfs.keytab (see above).
  • dfs.datanode.kerberos.principal: hdfs/localhost@hadoop.apache.org
  • dfs.block.access.token.enable: true 
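Putting the namenode, secondary namenode and datanode settings together, the Kerberos-related part of hdfs-site.xml should look roughly as follows. The keytab path is a placeholder for wherever the Kerby project's "target" directory lives on your machine:

    <!-- Namenode -->
    <property>
      <name>dfs.namenode.keytab.file</name>
      <value>/pathtokerby/target/hdfs.keytab</value>
    </property>
    <property>
      <name>dfs.namenode.kerberos.principal</name>
      <value>hdfs/localhost@hadoop.apache.org</value>
    </property>
    <property>
      <name>dfs.namenode.kerberos.internal.spnego.principal</name>
      <value>HTTP/localhost@hadoop.apache.org</value>
    </property>
    <!-- Secondary namenode -->
    <property>
      <name>dfs.secondary.namenode.keytab.file</name>
      <value>/pathtokerby/target/hdfs.keytab</value>
    </property>
    <property>
      <name>dfs.secondary.namenode.kerberos.principal</name>
      <value>hdfs/localhost@hadoop.apache.org</value>
    </property>
    <property>
      <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
      <value>HTTP/localhost@hadoop.apache.org</value>
    </property>
    <!-- Datanode -->
    <property>
      <name>dfs.datanode.data.dir.perm</name>
      <value>700</value>
    </property>
    <property>
      <name>dfs.datanode.address</name>
      <value>0.0.0.0:1004</value>
    </property>
    <property>
      <name>dfs.datanode.http.address</name>
      <value>0.0.0.0:1006</value>
    </property>
    <property>
      <name>dfs.web.authentication.kerberos.principal</name>
      <value>HTTP/localhost@hadoop.apache.org</value>
    </property>
    <property>
      <name>dfs.datanode.keytab.file</name>
      <value>/pathtokerby/target/hdfs.keytab</value>
    </property>
    <property>
      <name>dfs.datanode.kerberos.principal</name>
      <value>hdfs/localhost@hadoop.apache.org</value>
    </property>
    <property>
      <name>dfs.block.access.token.enable</name>
      <value>true</value>
    </property>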
As we are not using SASL to secure the data transfer protocol (see here), we need to download JSVC and install it into a directory we will refer to as JSVC_HOME. Then edit 'etc/hadoop/hadoop-env.sh' and add the following properties (a filled-in sketch follows the list):
  • export HADOOP_SECURE_DN_USER=(the user you are running HDFS as)
  • export JSVC_HOME=(path to JSVC as above)
  • export HADOOP_OPTS="-Djava.security.krb5.conf=<path to Kerby target/krb5.conf>"
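For example, the end of hadoop-env.sh might look something like the following, where the user name, the JSVC location and the Kerby path are all placeholders for your own environment:

    # Non-privileged user the secure datanode drops to after binding to the
    # privileged ports above (use whatever user you normally run HDFS as)
    export HADOOP_SECURE_DN_USER=hdfsuser
    # Directory containing the jsvc binary
    export JSVC_HOME=/opt/jsvc
    # Point the JVM at the krb5.conf generated by the Kerby test
    export HADOOP_OPTS="-Djava.security.krb5.conf=/pathtokerby/target/krb5.conf"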
You also need to make sure that you can ssh to localhost as "root" without specifying a password.

3) Launch Kerby and HDFS and test authorization

Now that we have hopefully configured everything correctly, it's time to launch the Kerby-based KDC and HDFS. Start Kerby by running the JUnit test as described in the first section. Then start HDFS via:
  • sbin/start-dfs.sh
  • sudo sbin/start-secure-dns.sh
Now let's try to read the file in "/data" using "bin/hadoop fs -cat /data/LICENSE.txt". You should see an exception, as we have no Kerberos credentials. Let's try to read as "alice" now:
  • export KRB5_CONFIG=/pathtokerby/target/krb5.conf
  • kinit -k -t /pathtokerby/target/alice.keytab alice
  • bin/hadoop fs -cat /data/LICENSE.txt
This should be successful. However, the following should result in a "Permission denied" message:
  • kdestroy
  • kinit -k -t /pathtokerby/target/bob.keytab bob
  • bin/hadoop fs -cat /data/LICENSE.txt
