Friday, July 28, 2017

Third party SSO support for Apache Syncope REST services

A recent blog post covered SSO support for Apache Syncope REST services. This was a new feature added in the 2.0.3 release, which allows a user to obtain a JWT from the Syncope "accessTokens/login" REST endpoint. This token can then be used to repeatedly invoke on a Syncope REST service. However, what if you wish to allow a user invoke on a Syncope REST service using a (JWT) token issued by a third party IdP instead? From Syncope 2.0.5 this will be possible.

In this post we will cover how to use a JWT issued by a third-party to invoke on an Apache Syncope REST service. The code is available on github here:
  • cxf-syncope2-webapp: A pre-configured web application of the Syncope core for use in the tests.
  • cxf-syncope2: Some integration tests that use cxf-syncope2-webapp for authentication and authorization purposes. JWTTestIT illustrates third party SSO integration with Syncope as covered in this post.
1) Configuring Apache Syncope to accept third-party JWTs

Naturally, if we invoke on an Apache Syncope REST service using an arbitrary third-party token, access will be denied as Syncope will not be able to validate the signature on the token correctly. By default, Syncope uses the following properties defined in '' to both issue and validate signed tokens:
  • jwtIssuer: The issuer of the token
  • jwsKey: The Hex-encoded (symmetric) verification key
The default signature algorithm is the symmetric algorithm HS512. To allow third-party tokens we need to implement the JWTSSOProvider interface provided in Syncope. By default, Syncope searches for JWTSSOProvider implementations on the classpath under the package name "org.apache.syncope.core", so no explicit configuration changes are required to plug in a custom JWTSSOProvider implementation.

When Syncope receives a signed JWT it will query which of the configured JWTSSOProvider implementations can verify the token, by matching the 'getIssuer()' method to the issuer of the token. The 'getAlgorithm()' method should match the signature algorithm of the received token. The 'verify' method should validate the signature of the received token. The implementation used in the tests is available here. A keystore is read in and the certificate contained in it is used to verify the signature on the received token. 

One final interesting point is that we need to map the authenticated JWT subject to a user in Syncope somehow. This is done in the JWTSSOProvider implementation via the 'resolve' method. In our test implementation, we map the JWT subject directly to a Syncope username.

2) Obtain a JWT from the Apache CXF STS using REST

Now that we have set up Apache Syncope to allow third-party JWTs, we need to obtain such a token to get our test-case to work. We will use the Apache CXF Security Token Service (STS) to obtain a JWT. For simplicity we will leverage the REST interface of the CXF STS, which allows us to obtain a token with a simple REST call. The STS is configured via spring to issue signed JWTs. User authentication to the STS is enforced via basic authentication. In the test code, we use the CXF WebClient to invoke on the STS and to get a JWT back:

Now we can use this token with the Syncope client API to call the user "self service" successfully:

Thursday, July 20, 2017

Securing Apache Hive - part I

This is the first post in a series of articles on securing Apache Hive. In this article we will look at installing Apache Hive and doing some queries on data stored in HDFS. We will not consider any security requirements in this post, but the test deployment will be used by future posts in this series on authenticating and authorizing access to Hive.

1) Install and configure Apache Hadoop

The first step is to install and configure Apache Hadoop. Please follow section 1 of this earlier tutorial for information on how to do this. In addition, we need to configure two extra properties in 'etc/hadoop/core-site.xml':
  • hadoop.proxyuser.$user.groups: *
  • hadoop.proxyuser.$user.hosts: localhost
where "$user" above should be replaced with the user that is going to run the hive server below. As we are not using authentication in this tutorial, this allows the $user to impersonate the "anonymous" user, who will connect to Hive via beeline and run some queries.

Once HDFS has started, we need to create some directories for use by Apache Hive, and change the permissions appropriately:
  • bin/hadoop fs -mkdir -p /user/hive/warehouse /tmp
  • bin/hadoop fs -chmod g+w /user/hive/warehouse /tmp
  • bin/hadoop fs -mkdir /data
The "/data" directory will hold a file which represents the output of a map-reduce job. For the purposes of this tutorial, we will use a sample output of the canonical "Word Count" map-reduce job on some text. The file consists of two columns separated by a tab character, where the left column is the word, and the right column is the total count associated with that word in the original document.

I've uploaded such a sample output here. Download it and upload it to the HDFS data directory:
  • bin/hadoop fs -put output.txt /data
2) Install and configure Apache Hive

Now we will install and configure Apache Hive. Download and extract Apache Hive (2.1.1 was used for the purposes of this tutorial). Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Now we will configure the metastore and start Hiveserver2:
  • bin/schematool -dbType derby -initSchema
  • bin/hiveserver2
In a separate window, we will start beeline to connect to the hive server, where $user is the user who is running Hadoop (necessary as we are going to create some data in HDFS, and otherwise wouldn't have the correct permissions):
  • bin/beeline -u jdbc:hive2://localhost:10000 -n $user
Once we are connected, then create a Hive table and load the map reduce output data into a new table called "words":
  • create table words (word STRING, count INT) row format delimited fields terminated by '\t' stored as textfile;
  • LOAD DATA INPATH '/data/output.txt' INTO TABLE words;
Now we can run some queries on the data as the anonymous user. Log out of beeline and then back in and run some queries via:
  • bin/beeline -u jdbc:hive2://localhost:10000
  • select * from words where word == 'Dare';