POKI_PUT_TOC_HERE

Examples

POKI_RUN_COMMAND{{mlr --usage-data-format-examples}}HERE

DKVP: Key-value pairs

Miller’s default file format is DKVP, for delimited key-value pairs. Example:
POKI_RUN_COMMAND{{mlr cat data/small}}HERE
Such data are easy to generate, e.g. in Ruby with
POKI_CARDIFY(puts "host=#{hostname},seconds=#{t2-t1},message=#{msg}")HERE
POKI_CARDIFY{{puts mymap.collect{|k,v| "#{k}=#{v}"}.join(',')}}HERE
or print statements in various languages, e.g.
POKI_CARDIFY(echo "type=3,user=$USER,date=$date\n";)HERE
POKI_CARDIFY{{logger.log("type=3,user=$USER,date=$date\n");}}HERE

Fields lacking an IPS (input pair separator, by default the equals sign) will have their positional index (starting at 1) used as the key, as in NIDX format. For example, dish=7,egg=8,flint is parsed as "dish" => "7", "egg" => "8", "3" => "flint" and dish,egg,flint is parsed as "1" => "dish", "2" => "egg", "3" => "flint".
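The positional-key rule can be illustrated with a few lines of Python. This is only a sketch of the behavior described above, not Miller's actual implementation; the function name parse_dkvp_line and its parameters are made up for illustration.

```python
def parse_dkvp_line(line, ifs=",", ips="="):
    """Parse one DKVP line into a dict mapping string keys to string values."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        if ips in field:
            # Split on the first pair separator only, so values may contain it.
            key, value = field.split(ips, 1)
        else:
            # No pair separator: fall back to the 1-based position as the key.
            key, value = str(position), field
        record[key] = value
    return record


print(parse_dkvp_line("dish=7,egg=8,flint"))
print(parse_dkvp_line("dish,egg,flint"))
```

Note that the position counts all fields on the line, keyed or not, which is why flint gets key "3" in both cases.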

As discussed in POKI_PUT_LINK_FOR_PAGE(record-heterogeneity.html)HERE, Miller handles changes of field names within the same data stream. With DKVP format this is particularly natural. One of my favorite use-cases for Miller is in application/server logs, where I log all sorts of lines such as

resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/some/other/path,loadsec=0.97,ok=false

etc. and I just log them as needed. Then later, I can use grep, mlr --opprint group-like, etc. to analyze my logs.
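To make the idea of grouping like records concrete, here is a rough Python sketch that batches records by their list of field names. It is an illustration only, not how Miller implements group-like; the helper names are hypothetical, and emitting batches in first-seen order is an assumption of the sketch.

```python
def parse_dkvp_line(line):
    """Minimal DKVP parser for lines where every field has a key."""
    return dict(field.split("=", 1) for field in line.split(","))


def group_like(records):
    """Reorder records so that those with identical field names are adjacent."""
    batches = {}
    for record in records:
        # The ordered tuple of field names identifies the record's "shape".
        batches.setdefault(tuple(record), []).append(record)
    return [record for batch in batches.values() for record in batch]


log_lines = [
    "resource=/path/to/file,loadsec=0.45,ok=true",
    "record_count=100,resource=/path/to/file",
    "resource=/some/other/path,loadsec=0.97,ok=false",
]
for record in group_like(parse_dkvp_line(line) for line in log_lines):
    print(record)
```

With the three log lines above, the two resource/loadsec/ok records come out together, followed by the record_count record.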

See POKI_PUT_LINK_FOR_PAGE(reference.html)HERE regarding how to specify separators other than the default equals-sign and comma.

NIDX: Index-numbered (toolkit style)

With --inidx --ifs ' ' --repifs, Miller splits lines on whitespace and assigns integer field names starting with 1. This recapitulates Unix-toolkit behavior.
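As a rough picture of what this means, the following Python sketch splits a line on runs of spaces and keys the pieces by position. The function name parse_nidx_line is hypothetical, and dropping leading and trailing spaces is an assumption of the sketch rather than a statement about Miller's internals.

```python
def parse_nidx_line(line):
    """Split on runs of spaces and key each field by its 1-based position."""
    # Filtering out empty strings collapses repeated separators,
    # in the spirit of --repifs.
    fields = [field for field in line.split(" ") if field]
    return {str(position): field for position, field in enumerate(fields, start=1)}


print(parse_nidx_line("abc   def ghi"))
```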

Example with index-numbered output:
POKI_RUN_COMMAND{{cat data/small}}HERE POKI_RUN_COMMAND{{mlr --onidx --ofs ' ' cat data/small}}HERE

Example with index-numbered input:
POKI_RUN_COMMAND{{cat data/mydata.txt}}HERE POKI_RUN_COMMAND{{mlr --inidx --ifs ' ' --odkvp cat data/mydata.txt}}HERE

Example with index-numbered input and output:
POKI_RUN_COMMAND{{cat data/mydata.txt}}HERE POKI_RUN_COMMAND{{mlr --nidx --fs ' ' --repifs cut -f 2,3 data/mydata.txt}}HERE

CSV/TSV/etc.

When mlr is invoked with the --csv or --csvlite option, key names are found on the first record and values are taken from subsequent records. This includes the case of CSV-formatted files. See POKI_PUT_LINK_FOR_PAGE(record-heterogeneity.html)HERE for how Miller handles changes of field names within a single data stream.

Miller has record separator RS and field separator FS, just as awk does. For TSV, use --fs tab; to convert TSV to CSV, use --ifs tab --ofs comma, etc. (See also POKI_PUT_LINK_FOR_PAGE(reference.html)HERE.)
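For comparison, the same change of field separator can be sketched outside Miller with Python's standard csv module. This is just an illustration of reading with one separator and writing with another; the function name tsv_to_csv is made up here.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Re-emit tab-separated text as comma-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    # Python's csv.writer terminates lines with "\r\n" by default;
    # this sketch asks for bare linefeeds instead.
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


print(tsv_to_csv("a\tb\tc\n1\t2\t3\n"), end="")
```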

Miller’s --csv flag supports RFC-4180 CSV (https://tools.ietf.org/html/rfc4180). This includes CRLF line-terminators by default, regardless of platform.

Please use mlr --csv --rs lf for native Un*x (linefeed-terminated) CSV files.

The RFC says, somewhat briefly, that “there may be a header line”. Miller’s --implicit-csv-header option allows you to read CSV data which lacks a header line, applying column labels 1, 2, 3, etc. for you. You may also use Miller’s label to replace those numerical column names with labels of your choosing.
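The effect of positional labels, and of relabeling them afterward, can be sketched in Python as follows. This is not Miller's code; read_headerless_csv and its labels parameter are hypothetical names, and renaming only the leading columns is an assumption made for the illustration.

```python
import csv
import io


def read_headerless_csv(text, labels=None):
    """Read CSV text with no header line, keying columns as "1", "2", "3", ..."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        keys = [str(position) for position in range(1, len(row) + 1)]
        if labels:
            # Rename the leading positional keys; any extra columns keep their numbers.
            keys[:len(labels)] = labels[:len(keys)]
        records.append(dict(zip(keys, row)))
    return records


data = "3,4,5\n6,7,8\n"
print(read_headerless_csv(data))
print(read_headerless_csv(data, labels=["x", "y"]))
```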

Here are the differences between CSV and CSV-lite:

Here are things they have in common:

PPRINT: Pretty-printed tabular

Miller’s pretty-print format is like CSV, but column-aligned. For example, compare
POKI_RUN_COMMAND{{mlr --ocsv cat data/small}}HERE POKI_RUN_COMMAND{{mlr --opprint cat data/small}}HERE
Note that while Miller is a line-at-a-time processor and retains input lines in memory only where necessary (e.g. for sort), pretty-print output requires it to accumulate all input lines (so that it can compute maximum column widths) before producing any output. This has two consequences: (a) pretty-print output won’t work on tail -f contexts, where Miller will be waiting for an end-of-file marker which never arrives; (b) pretty-print output for large files is constrained by available machine memory.
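The reason for the accumulation is easy to see in a small Python sketch: the width of each column is not known until the last record has been read. This is an illustration under simplifying assumptions (all records share the same field names, left-aligned columns, single-space padding), not Miller's implementation, and pprint_records is a hypothetical name.

```python
def pprint_records(records):
    """Format records with identical field names as a column-aligned table."""
    records = list(records)  # every record must be read before any output
    if not records:
        return ""
    keys = list(records[0])
    # Each column is as wide as its longest value or its header, whichever is longer.
    widths = {
        key: max(len(key), *(len(str(record[key])) for record in records))
        for key in keys
    }
    lines = [" ".join(key.ljust(widths[key]) for key in keys)]
    for record in records:
        lines.append(" ".join(str(record[key]).ljust(widths[key]) for key in keys))
    return "\n".join(line.rstrip() for line in lines)


print(pprint_records([
    {"a": "pan", "b": "pan", "i": "1"},
    {"a": "eks", "b": "wye", "i": "10"},
]))
```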

See POKI_PUT_LINK_FOR_PAGE(record-heterogeneity.html)HERE for how Miller handles changes of field names within a single data stream.

XTAB: Vertical tabular

This is perhaps most useful for looking at very wide and/or multi-column data which cause line-wraps on the screen (but see also https://github.com/twosigma/ngrid for an entirely different, very powerful option). Namely:
POKI_INCLUDE_ESCAPED(system-file-opprint-example.txt)HERE
POKI_INCLUDE_ESCAPED(system-file-oxtab-example.txt)HERE
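The vertical layout itself is simple to sketch in Python: one key-value pair per line, keys padded to a common width, and a blank line between records. This is a minimal illustration rather than Miller's code; format_xtab is a hypothetical name and the exact padding is an assumption.

```python
def format_xtab(records):
    """Format each record as one key-value pair per line, keys left-aligned."""
    blocks = []
    for record in records:
        width = max((len(key) for key in record), default=0)
        blocks.append(
            "\n".join(f"{key.ljust(width)} {value}" for key, value in record.items())
        )
    # A blank line separates consecutive records.
    return "\n\n".join(blocks)


print(format_xtab([
    {"x": "1", "yy": "2"},
    {"x": "3", "yy": "4"},
]))
```

Unlike pretty-print output, this layout only needs the widest key within a single record, so each record can be formatted as soon as it is read.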