Cascalog and Hadoop Security

If your cluster has Hadoop Security enabled, your queries may fail with an exception like this one:

    org.apache.hadoop.ipc.RemoteException: token (...) can't be found in cache

This exception fails the second step of any multi-step Cascalog (or, for that matter, Cascading) query. The reason is that the delegation token, which the job obtained through Kerberos, is cancelled as soon as the first step succeeds, so later steps can no longer use it.

One solution is to set mapreduce.job.complete.cancel.delegation.tokens to false in the JobConf, which keeps the tokens from being cancelled when a job completes:

    (with-job-conf {"mapreduce.job.complete.cancel.delegation.tokens" false}
      ...)

Alternatively, put the same setting in your job-conf.clj.
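For the job-conf.clj route, a minimal sketch of the file might look like this. It assumes the file sits at the root of the classpath (for example under resources/ in a Leiningen project) and holds a single map of Hadoop properties that Cascalog uses as default job configuration for every query in the project; check the documentation of your Cascalog version if its lookup rules differ.

```clojure
;; job-conf.clj (assumed location: classpath root, e.g. resources/job-conf.clj)
;; A single map of Hadoop properties used as project-wide defaults.
{;; Keep delegation tokens alive after a job completes, so that the
 ;; second and later steps of a multi-step query can still use them.
 "mapreduce.job.complete.cancel.delegation.tokens" false}
```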

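If you schedule your Cascalog jobs with Oozie, also look into HADOOP_TOKEN_FILE_LOCATION and mapreduce.job.credentials.binary and set your JobConf accordingly: HADOOP_TOKEN_FILE_LOCATION is an environment variable that points to the file holding the delegation tokens, and mapreduce.job.credentials.binary tells Hadoop to read credentials from such a file.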
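Both settings can be combined into one job-conf map. The sketch below uses a hypothetical helper, security-job-conf (not part of Cascalog), that always disables token cancellation and, when HADOOP_TOKEN_FILE_LOCATION is present in the environment (assumed here to be the case inside an Oozie launcher), points mapreduce.job.credentials.binary at that token file. It takes the environment as a plain map so it can be checked without a cluster.

```clojure
(ns example.security-conf)

(def cancel-tokens-key "mapreduce.job.complete.cancel.delegation.tokens")
(def credentials-key "mapreduce.job.credentials.binary")

(defn security-job-conf
  "Hypothetical helper: builds a job-conf map for a secured cluster.
  `env` is any map of environment variables, e.g. (System/getenv).
  Token cancellation is always disabled; the credentials file is only
  set when HADOOP_TOKEN_FILE_LOCATION is present in `env`."
  [env]
  (let [token-file (get env "HADOOP_TOKEN_FILE_LOCATION")]
    (cond-> {cancel-tokens-key false}
      ;; Only add the credentials path when a token file was provided.
      token-file (assoc credentials-key token-file))))
```

On the cluster you would pass the real environment, for example (with-job-conf (security-job-conf (System/getenv)) ...), where ... stands for your query as in the earlier snippet.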

Resources

  • Owen O’Malley: Motivations for Apache Hadoop Security
  • Owen O’Malley: Hadoop Security in Detail (video)
  • Cloudera CDH3 Documentation: Introduction to Hadoop Security
  • MAPREDUCE-1430
  • MAPREDUCE-4324
