Cascalog provides various helper operators in the `cascalog.ops`

namespace. It is common to `(:require [cascalog.ops :as c])`

within your namespace declaration. As such, you will frequently see such references as `c/each`

or `c/sum`

in sample code and documentation.

A filter that is equivalent to boolean AND operator such that every function in the parameter must return true for the tuple to be kept. Can take multiple functions and multiple input fields.

```
(<- [!a !b]
(nums !a !b)
((c/all #'even? #'big?) !a !b)))
```

A filter such that it returns true if any of the passed in function returns true.

```
(<- [!a !b]
(nums !a !b)
((c/any #'even? #'big?) !a !b)))
```

Average.

```
(<- [?avg]
(src ?user ?cnt)
(c/avg ?cnt :> ?avg)))
```

Composition of functions. Executes function from right to left.

```
(<- [!y]
(nums !x)
((c/comp #'double #'exp) !x :> !y)))
```

is equivalent to:

```
(<- [!y]
(nums !x)
(#'exp !x :> !x1)
(#'double !x1 :> !y))
```

`!count`

takes in one input variable. Null values are interpreted as “0” and non-null values are interpreted as “1”. !count returns the sum of those interpreted values. `!count`

counts the number of *non-null values* for that variable.

```
(<- [?count]
(source !val)
(c/!count !val :> ?count))
```

Similar to `!count`

, but count values regardless whether they are null or not.

```
(<- [?count]
(source !val)
(c/count !val :> ?count))
```

Similar to `count`

, but only count distinct items. Null values would be counted as one.

```
(<- [?count]
(source !val)
(c/distinct-count !val :> ?count))
```

Apply the specified function to each of the input variable. Number of inputs must equal number of output fields if the function is expected to return a value, otherwise there is no output variables.

```
((c/each #'double) ?a ?b ?c :> ?x ?y ?z)
```

Would apply `double`

to `?a :> ?x`

, `?b :> ?y`

, and `?c :> ?z`

.

Returns a subquery getting the first n elements. Can pass in sorting arguments.

Say `wordcount-tap`

is a subquery with fields `[?word ?count]`

and we want to pull the top 100 words by count. Here’s how we do that with `first-n`

:

```
(defn top-100 [file-path]
(c/first-n (wordcount-tap file-path)
100
:sort ["?count"]
:reverse true))
(defmain Top100 [tuple-path results-path]
(?- (hfs-textline results-path)
(top-100 tuple-path)))
```

An efficient buffer that does most work in mappers to return the top N tuples.

Some examples using the playground:

Get the top 3 integers:

```
(?<- (stdout) [?n-out]
(integer ?n) (:sort ?n) (:reverse true)
((c/limit 3) ?n :> ?n-out))
```

Get at most one friend for each person:

```
(?<- (stdout) [?p ?f-out]
(follows ?p ?p2)
((c/limit 1) ?p2 :> ?f-out))
```

Get 5 follows relationships:

```
(?<- (stdout) [?p-out ?p2-out]
(follows ?p ?p2) ((c/limit 5) ?p ?p2 :> ?p-out ?p2-out))
```

Similar to `limit`

but also emit the “rank” of each item (useful when sorting):

Get the top 3 integers with rank:

```
(?<- (stdout) [?n-out ?r]
(integer ?n)
(:sort ?n) (:reverse true)
((c/limit-rank 3) ?n :> ?n-out ?r))
```

Let us know what was unclear or what has not been covered. Maybe you do not like the guide style or grammar or discover spelling mistakes. Reader feedback is key to making the documentation better.

This documentation site is open source and we welcome pull requests.