POKI_PUT_TOC_HERE
Parsing log-file output
This, of course, depends highly on what’s in your log files. But, as
an example, suppose you have log-file lines such as
POKI_CARDIFY(2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] various/sorts/of data {& punctuation} hits=1 status=0 time=2.378)HERE
I prefer to pre-filter with grep and/or sed to extract the structured text, then hand that to Miller. Example:
POKI_CARDIFY(grep 'various/sorts' *.log | sed 's/.*} //' | mlr --fs space --repifs --oxtab stats1 -a min,p10,p50,p90,max -f time -g status)HERE
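For readers who want to see what the sed step leaves for Miller to read, here is a rough Python sketch of the same extraction. It is only an illustration, not part of Miller: the function name parse_log_line is made up for this example, and it assumes every field after the last "} " has the form key=value.

```python
import re

def parse_log_line(line):
    """Return the key=value pairs that follow the last '} ' as a dict of strings."""
    # Like sed 's/.*} //': drop everything up to and including the last "} ".
    tail = re.sub(r".*\} ", "", line)
    # Like --fs space --repifs: split on runs of whitespace, then on the first "=".
    return dict(field.split("=", 1) for field in tail.split())

# The sample log line from the text above.
line = ("2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] "
        "various/sorts/of data {& punctuation} hits=1 status=0 time=2.378")
record = parse_log_line(line)
print(record)
```

The values stay as strings here; Miller infers numeric types on its own when computing the statistics.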
Rectangularizing data
Suppose you have a method (in whatever language) which is printing things of the form
POKI_INCLUDE_ESCAPED(data/rect-outer.txt)HERE
and then calls another method which prints things of the form
POKI_INCLUDE_ESCAPED(data/rect-middle.txt)HERE
and then, perhaps, that second method calls a third method which prints things of the form
POKI_INCLUDE_ESCAPED(data/rect-inner.txt)HERE
with the result that your program’s output is
POKI_INCLUDE_ESCAPED(data/rect.txt)HERE
The idea here is that middles starting with a 1 belong to the outer value of 1,
and so on. (The outer values might be account IDs, or table names, or what
have you; the middle values might be invoice IDs; the inner values might be
line items.) If you want all the middle and inner lines to have the context of
which outers they belong to, you can modify your software to pass that context
through your methods. Alternatively, you can use the following to
rectangularize the data. The idea is to use an out-of-stream variable to accumulate
fields across records: clear that variable when you see an outer ID, accumulate
fields as they arrive, and emit output when you see the inner IDs.
POKI_INCLUDE_AND_RUN_ESCAPED(data/rect.sh)HERE
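The clear/accumulate/emit pattern described above can also be sketched outside Miller. The following Python is a minimal illustration under stated assumptions: the field names outer, middle, inner1, and inner2 and the sample lines are hypothetical stand-ins for the included data files, and the function name rectangularize is made up for this example.

```python
def rectangularize(lines):
    """Yield one merged dict per inner line, carrying outer and middle context."""
    context = {}  # plays the role of the out-of-stream variable
    for line in lines:
        fields = dict(pair.split("=", 1) for pair in line.strip().split(","))
        if "outer" in fields:
            context = {}         # a new outer ID starts a fresh context
        context.update(fields)   # accumulate fields across records
        if "inner1" in fields:
            yield dict(context)  # emit a copy once the inner values arrive

# Hypothetical input in the spirit of the program output shown above.
sample = [
    "outer=1",
    "middle=10",
    "inner1=100,inner2=101",
    "middle=11",
    "inner1=110,inner2=111",
    "outer=2",
    "middle=20",
    "inner1=200,inner2=201",
]
rows = list(rectangularize(sample))
for row in rows:
    print(row)
```

Each emitted row carries the most recent outer and middle values, so the output is rectangular: every row has the same four keys.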
Bulk rename of field names
POKI_RUN_COMMAND{{cat data/spaces.csv}}HERE
POKI_RUN_COMMAND{{mlr --csv --rs lf rename -r -g ' ,_' data/spaces.csv}}HERE
POKI_RUN_COMMAND{{mlr --csv --irs lf --opprint rename -r -g ' ,_' data/spaces.csv}}HERE
You can also do this with a for-loop, but it puts the modified fields after the unmodified fields:
POKI_RUN_COMMAND{{mlr --icsv --irs lf --opprint put -f data/bulk-rename-for-loop.mlr data/spaces.csv}}HERE
Regularizing ragged CSV
Miller handles compliant CSV: in particular, it’s an error if the
number of data fields in a given data line doesn’t match the number of
header fields. But in the event that you have a CSV file in which some lines
have fewer than the full number of fields, you can use Miller to pad them out.
The trick is to use NIDX format, for which each line stands on its own without
respect to a header line.
POKI_RUN_COMMAND{{cat data/ragged.csv}}HERE
POKI_INCLUDE_AND_RUN_ESCAPED(data/ragged-csv.sh)HERE
or, more simply,
POKI_INCLUDE_AND_RUN_ESCAPED(data/ragged-csv-2.sh)HERE
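As a point of comparison, the padding idea can be sketched in a few lines of Python. This is not what the scripts above do internally; it is a minimal illustration that assumes the first line is the header and pads every shorter line out to the header's width. The function name pad_ragged and the sample text are made up for this example.

```python
import csv
import io

def pad_ragged(text, fill=""):
    """Pad every row that is shorter than the header row with fill values."""
    rows = list(csv.reader(io.StringIO(text)))
    width = len(rows[0])  # assumption: the first row is the header
    # Rows already at (or beyond) the header width are left unchanged.
    return [row + [fill] * (width - len(row)) for row in rows]

# Hypothetical ragged input: the last two data lines are short.
ragged = "a,b,c\n1,2,3\n4,5\n7\n"
padded = pad_ragged(ragged)
for row in padded:
    print(",".join(row))
```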
Filtering paragraphs of text
The idea is to use a record separator which is a pair of newlines. Then, if
you want each paragraph to be a record with a single value, use a
field separator which isn’t present in the input data (e.g. a control-A,
which is octal 001). Or, if you want each paragraph to have its lines as
separate values, use newline as the field separator.
POKI_RUN_COMMAND{{cat paragraphs.txt}}HERE
POKI_RUN_COMMAND{{mlr --from paragraphs.txt --nidx --rs '\n\n' --fs '\001' filter '$1 =~ "the"'}}HERE
POKI_RUN_COMMAND{{mlr --from paragraphs.txt --nidx --rs '\n\n' --fs '\n' cut -f 1,3}}HERE
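The same two views of a paragraph (one value per paragraph, or one value per line) can be sketched in Python. This is only an illustration of the record-separator idea; the sample text is made up and is not the content of paragraphs.txt.

```python
def paragraphs(text):
    """Split text into paragraphs, using a blank line as the record separator."""
    return [p for p in text.split("\n\n") if p.strip()]

# Hypothetical input with two paragraphs.
text = ("The quick brown fox\njumped over\nthe lazy dogs.\n"
        "\n"
        "A second paragraph\nwith two lines.\n")
paras = paragraphs(text)

# Each paragraph as a single value: keep those containing "the".
matching = [p for p in paras if "the" in p]

# Each paragraph as a list of lines: keep lines 1 and 3 where present.
first_and_third = [
    [ln for i, ln in enumerate(p.splitlines(), start=1) if i in (1, 3)]
    for p in paras
]
print(matching)
print(first_and_third)
```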
Doing arithmetic on fields with currency symbols
POKI_INCLUDE_ESCAPED(data/dollar-sign.txt)HERE
Program timing
This admittedly artificial example demonstrates using Miller’s time and stats
functions to introspectively acquire some information about Miller’s own
runtime. The delta function computes the difference between successive
timestamps.
POKI_INCLUDE_ESCAPED(data/timing-example.txt)HERE
Using out-of-stream variables
One of Miller’s strengths is its compact notation: for example, given input of the form
POKI_RUN_COMMAND{{head -n 5 ../data/medium}}HERE
you can simply do
POKI_RUN_COMMAND{{mlr --oxtab stats1 -a sum -f x ../data/medium}}HERE
or
POKI_RUN_COMMAND{{mlr --opprint stats1 -a sum -f x -g b ../data/medium}}HERE
rather than the more tedious
POKI_INCLUDE_AND_RUN_ESCAPED(oosvar-example-sum.sh)HERE
or
POKI_INCLUDE_AND_RUN_ESCAPED(oosvar-example-sum-grouped.sh)HERE
The former (mlr stats1 et al.) has the advantages of being easier
to type, less error-prone, and faster to run.
Nonetheless, out-of-stream variables (which I whimsically call
oosvars), begin/end blocks, and emit statements give you the ability to
implement logic, if you wish to do so, which isn’t present
in existing Miller verbs. (If you find yourself using the same
out-of-stream-variable logic over and over, please file a request at https://github.com/johnkerl/miller/issues
to get it implemented directly in C as a Miller verb of its own.)
The following examples compute some things using oosvars which are already
computable using Miller verbs, by way of providing food for thought.
Mean without/with oosvars
POKI_RUN_COMMAND{{mlr --opprint stats1 -a mean -f x data/medium}}HERE
POKI_INCLUDE_AND_RUN_ESCAPED(data/mean-with-oosvars.sh)HERE
Keyed mean without/with oosvars
POKI_RUN_COMMAND{{mlr --opprint stats1 -a mean -f x -g a,b data/medium}}HERE
POKI_INCLUDE_AND_RUN_ESCAPED(data/keyed-mean-with-oosvars.sh)HERE
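The accumulator idea behind a keyed mean (one running sum and one running count per group, then a division at the end of the stream) can be sketched in Python as follows. This is an illustration only: the function name keyed_mean and the three sample records are made up, with field names a, b, and x borrowed from the commands above.

```python
from collections import defaultdict

def keyed_mean(records, keys, field):
    """Return {group: mean of field}, where group is a tuple of the key values."""
    sums = defaultdict(float)   # running sum per group
    counts = defaultdict(int)   # running count per group
    for rec in records:
        group = tuple(rec[k] for k in keys)
        sums[group] += rec[field]
        counts[group] += 1
    # The division happens once, after all records have been seen.
    return {g: sums[g] / counts[g] for g in sums}

# Hypothetical records, not taken from data/medium.
records = [
    {"a": "pan", "b": "pan", "x": 0.25},
    {"a": "eks", "b": "pan", "x": 0.75},
    {"a": "pan", "b": "pan", "x": 0.75},
]
means = keyed_mean(records, ("a", "b"), "x")
print(means)
```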
Variance and standard deviation without/with oosvars
POKI_RUN_COMMAND{{mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium}}HERE
POKI_RUN_COMMAND{{cat variance.mlr}}HERE
POKI_RUN_COMMAND{{mlr --oxtab put -q -f variance.mlr data/medium}}HERE
You can also do this keyed, of course, imitating the keyed-mean example above.
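For reference, one common way to get variance from streaming accumulators is to keep a count, a sum, and a sum of squares, and to combine them at the end. The Python below is a minimal sketch of that approach, not a transcription of variance.mlr; it assumes the unbiased estimator with an n-1 denominator and at least two input values. The function name streaming_var is made up for this example.

```python
import math

def streaming_var(values):
    """Return (count, sum, mean, var, stddev) from a single pass over values."""
    n = 0
    sumx = 0.0
    sumx2 = 0.0
    for x in values:
        n += 1
        sumx += x
        sumx2 += x * x
    mean = sumx / n
    # Sample variance with an n-1 denominator (assumption); needs n >= 2.
    var = (sumx2 - n * mean * mean) / (n - 1)
    return n, sumx, mean, var, math.sqrt(var)

count, total, mean, var, stddev = streaming_var([1.0, 2.0, 3.0, 4.0])
print(count, total, mean, var, stddev)
```

The sum-of-squares formula is compact but can lose precision when the mean is large relative to the spread of the data.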
Min/max without/with oosvars
POKI_RUN_COMMAND{{mlr --oxtab stats1 -a min,max -f x data/medium}}HERE
POKI_RUN_COMMAND{{mlr --oxtab put -q '@x_min = min(@x_min, $x); @x_max = max(@x_max, $x); end{emitf @x_min, @x_max}' data/medium}}HERE
Keyed min/max without/with oosvars
POKI_RUN_COMMAND{{mlr --opprint stats1 -a min,max -f x -g a data/medium}}HERE
POKI_INCLUDE_AND_RUN_ESCAPED(data/keyed-min-max-with-oosvars.sh)HERE
Alternatively:
POKI_INCLUDE_AND_RUN_ESCAPED(data/keyed-min-max-with-oosvars-2.sh)HERE
Delta without/with oosvars
POKI_RUN_COMMAND{{mlr --opprint step -a delta -f x data/small}}HERE
POKI_RUN_COMMAND{{mlr --opprint put '$x_delta = ispresent(@last) ? $x - @last : 0; @last = $x' data/small}}HERE
Keyed delta without/with oosvars
POKI_RUN_COMMAND{{mlr --opprint step -a delta -f x -g a data/small}}HERE
POKI_RUN_COMMAND{{mlr --opprint put '$x_delta = ispresent(@last[$a]) ? $x - @last[$a] : 0; @last[$a]=$x' data/small}}HERE
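The keyed-delta logic in the put expression above (remember the last value per key, emit 0 the first time a key is seen) translates directly into Python. This sketch is for illustration only; the function name keyed_delta and the sample records are made up.

```python
def keyed_delta(records, key, field):
    """Add a <field>_delta entry: difference from the previous value with the same key."""
    last = {}  # plays the role of @last[$a]
    out = []
    for rec in records:
        group = rec[key]
        # First record for a key has no predecessor, so its delta is 0.
        delta = rec[field] - last[group] if group in last else 0
        last[group] = rec[field]
        out.append({**rec, field + "_delta": delta})
    return out

# Hypothetical records, not taken from data/small.
records = [
    {"a": "pan", "x": 1},
    {"a": "eks", "x": 5},
    {"a": "pan", "x": 4},
    {"a": "eks", "x": 2},
]
deltas = [rec["x_delta"] for rec in keyed_delta(records, "a", "x")]
print(deltas)
```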
Exponentially weighted moving averages without/with oosvars
POKI_INCLUDE_AND_RUN_ESCAPED(verb-example-ewma.sh)HERE
POKI_INCLUDE_AND_RUN_ESCAPED(oosvar-example-ewma.sh)HERE
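The recurrence behind an exponentially weighted moving average is short enough to sketch in Python. This version assumes the usual convention that the first value seeds the average and that each later output is alpha times the current value plus (1 - alpha) times the previous output; the function name ewma is made up for this example.

```python
def ewma(values, alpha):
    """Return the running exponentially weighted moving average of values."""
    out = []
    prev = None
    for x in values:
        # The first value seeds the average; later values are blended in.
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out

smoothed = ewma([1.0, 2.0, 3.0], alpha=0.5)
print(smoothed)
```

A larger alpha tracks the input more closely, while a smaller alpha smooths more heavily.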