You can use Cascading Traps with Cascalog to capture tuples whose processing fails. To store those tuples into a sink tap (for example a local file or hfs-textline), use the :trap
keyword with an error sink:
(def errors (lfs-textline "file:///tmp/people.bad_records" :sinkmode :replace))
;; or (stdout) or (hfs-textline "hdfs:///tmp/...") if running on Hadoop
(<- [?name ?age]
(people ?name ?age)
(:trap errors)
(< ?age 40))
You may use the functions and macros from the midje.cascalog namespace together with clojure.test test your queries. See Cascalog’s own tests for examples.
It uses for example fact?-
to execute a query and compare its outputs with the expected ones or something like (facts query => (produces [[3 10] [1 5] [5 11]])
where (def query (<- ...))
. Read Sam Ritchie’s blog post Cascalog Testing 2.0 for more details and examples of midje-cascalog 0.4.0.
There are certain features that support live, interactive coding:
(def people [["ben" 21] ["jim" 42]])
)(defaggregateop
with (defn
.Let us know what was unclear or what has not been covered. Maybe you do not like the guide style or grammar or discover spelling mistakes. Reader feedback is key to making the documentation better.
This documentation site is open source and we welcome pull requests.