Cascalog documentation is organized as a set of guides covering a range of topics.
We recommend that you read these guides, if possible, in this order:
An overview of Cascalog with a quick tutorial that helps you get started. It should take about 30 minutes to read and try the provided code examples.
Running on a cluster
- Developing and deploying a Cascalog query on a Hadoop cluster
Testing and Debugging
- Testing Cascalog with Midje, part 1
- Testing Cascalog with Midje, part 2
Upgrading from 1.x to 2.x
Cascalog for the Impatient
- This guide is a set of progressive coding examples that start with a simple file copy and build up to a MapReduce implementation of the TF-IDF algorithm.
Real Code Examples
- Cascading plus City of Palo Alto open data
- Forest Monitoring for Action project
- CDEC Open Health Data Platform
Blog posts from around the web
- Why Yieldbot chose Cascalog over Pig for Hadoop processing
- Next-gen sequencing variation statistics with Hadoop using Cascalog
- Cascalog made easier
- Using Cascalog for ETL
- Hardcore Cascalog: Dynamic Queries
Help improve this site
Let us know what was unclear or what has not been covered. Maybe you do not like the guide style or grammar, or you have found spelling mistakes. Reader feedback is key to making the documentation better.
This documentation site is open source and we welcome pull requests.