POKI_PUT_TOC_HERE

CSV/TSV/etc.

When mlr is invoked with the --csv or --csvlite option, key names are taken from the header (first) line and values from subsequent lines. This holds whatever the field separator is, so TSV and other delimited files work the same way. See POKI_PUT_LINK_FOR_PAGE(record-heterogeneity.html)HERE for how Miller handles changes of field names within a single data stream.
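As a rough illustration of the header-then-data rule, here is a minimal Ruby sketch (not Miller's implementation) that pairs the header line's names with each later line's values and prints the result as DKVP. It splits naively on the separator and ignores RFC-4180 quoting; the method name read_delimited is hypothetical.

```ruby
# Minimal sketch: the header line supplies keys, later lines supply values.
# Naive split on the separator; quoted fields are NOT handled here.
def read_delimited(text, sep = ",")
  lines = text.lines.map(&:chomp).reject(&:empty?)
  header = lines.first.split(sep, -1)
  lines.drop(1).map do |line|
    values = line.split(sep, -1)
    header.zip(values).to_h
  end
end

records = read_delimited("a,b,c\n1,2,3\n4,5,6\n")
# Print each record as key=value pairs, much like --icsv --odkvp would.
records.each { |r| puts r.map { |k, v| "#{k}=#{v}" }.join(",") }
# a=1,b=2,c=3
# a=4,b=5,c=6
```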

Miller has record separator RS and field separator FS, just as awk does. For TSV, use --fs tab; to convert TSV to CSV, use --ifs tab --ofs comma, etc. (See also POKI_PUT_LINK_FOR_PAGE(reference.html)HERE.)
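For simple data with no quoting, converting TSV to CSV with --ifs tab --ofs comma amounts to splitting each line on the input separator and rejoining with the output separator. A minimal Ruby sketch, assuming no field contains either separator; convert_separators is a hypothetical name.

```ruby
# Sketch: split each record (line) on the input field separator (IFS)
# and rejoin it with the output field separator (OFS).
# Assumes no field contains either separator (no quoting support).
def convert_separators(text, ifs:, ofs:, rs: "\n")
  text.split(rs).map { |line| line.split(ifs, -1).join(ofs) }.join(rs) + rs
end

tsv = "a\tb\tc\n1\t2\t3\n"
print convert_separators(tsv, ifs: "\t", ofs: ",")
# a,b,c
# 1,2,3
```

The header line needs no special treatment here, since it is just another line of separated fields.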

Miller’s --csv flag supports RFC-4180 CSV (https://tools.ietf.org/html/rfc4180). This includes CRLF line terminators by default, regardless of platform. For native Un*x (LF-terminated) CSV files, use mlr --csv --rs lf.
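The main RFC-4180 rules on the writing side are: a field containing a comma, double quote, CR, or LF is enclosed in double quotes; a double quote inside such a field is doubled; and records end with CRLF. Here is a minimal Ruby sketch of those rules (not Miller's code); rfc4180_field and rfc4180_line are hypothetical names.

```ruby
# Sketch of RFC 4180 output rules:
#  - a field containing a comma, double quote, CR or LF is wrapped in quotes;
#  - a double quote inside such a field is escaped by doubling it;
#  - each record ends with CRLF unless another row separator is given.
def rfc4180_field(value)
  s = value.to_s
  return s unless s.match?(/[",\r\n]/)
  '"' + s.gsub('"', '""') + '"'
end

def rfc4180_line(fields, row_sep: "\r\n")
  fields.map { |f| rfc4180_field(f) }.join(",") + row_sep
end

p rfc4180_line(["plain", "a,b", 'say "hi"'])
# prints: "plain,\"a,b\",\"say \"\"hi\"\"\"\r\n"
```

Passing row_sep: "\n" gives LF-terminated lines, the analog of what --rs lf asks for.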

The RFC says, somewhat briefly, that “there may be a header line”. Miller’s --implicit-csv-header option lets you read CSV data that lacks a header line, assigning column labels 1, 2, 3, etc. for you. You can then use Miller’s label verb to replace those numeric column names with labels of your choosing.
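To make the positional-key idea concrete, here is a minimal Ruby sketch, assuming a headerless file with simple unquoted fields: keys become "1", "2", "3", and a relabel step renames the first N keys while leaving the rest alone, in the spirit of the label verb. read_headerless and relabel are hypothetical names, not Miller internals.

```ruby
# Sketch: with no header line, keys are positional ("1", "2", "3", ...).
def read_headerless(text, sep = ",")
  text.lines.map(&:chomp).reject(&:empty?).map do |line|
    line.split(sep, -1).each_with_index.to_h { |v, i| [(i + 1).to_s, v] }
  end
end

# Rename the first N keys; keys beyond the supplied names keep their labels.
def relabel(record, new_names)
  record.each_with_index.to_h do |(k, v), i|
    [new_names[i] || k, v]
  end
end

rows = read_headerless("10,20,30\n40,50,60\n")
rows.map { |r| relabel(r, %w[x y]) }.each do |r|
  puts r.map { |k, v| "#{k}=#{v}" }.join(",")
end
# x=10,y=20,3=30
# x=40,y=50,3=60
```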

Pretty-printed

Miller’s pretty-print format is like CSV, but column-aligned. For example, compare
POKI_RUN_COMMAND{{mlr --ocsv cat data/small}}HERE POKI_RUN_COMMAND{{mlr --opprint cat data/small}}HERE
Note that while Miller is a line-at-a-time processor and retains input lines in memory only where necessary (e.g. for sort), pretty-print output requires it to accumulate all input lines (so that it can compute maximum column widths) before producing any output. This has two consequences: (a) pretty-print output won’t work in tail -f contexts, where Miller would wait for an end-of-file marker that never arrives; (b) pretty-print output for large files is constrained by available machine memory.
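The buffering requirement follows from the two passes column alignment needs: widths come from every record, so no line can be printed until the last record is in. A minimal Ruby sketch, assuming homogeneous records (same keys in every record); pprint is a hypothetical name and the exact spacing may differ from Miller's --opprint.

```ruby
# Sketch of why pretty-printing must buffer: column widths depend on every
# record, so nothing can be printed until the last record has been seen.
# Assumes all records share the same keys (homogeneous input).
def pprint(records)
  return [] if records.empty?
  keys = records.first.keys
  # Pass 1: scan ALL records to find the widest entry in each column.
  widths = keys.map do |k|
    ([k.length] + records.map { |r| r[k].to_s.length }).max
  end
  # Pass 2: only now can any line be emitted.
  lines = [keys] + records.map { |r| keys.map { |k| r[k].to_s } }
  lines.map do |cells|
    cells.each_with_index.map { |c, i| c.ljust(widths[i]) }.join(" ").rstrip
  end
end

puts pprint([{ "a" => "pan", "b" => "1" }, { "a" => "eks", "b" => "22" }])
# a   b
# pan 1
# eks 22
```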

See POKI_PUT_LINK_FOR_PAGE(record-heterogeneity.html)HERE for how Miller handles changes of field names within a single data stream.

Key-value pairs

Miller’s default file format is DKVP, for delimited key-value pairs. Example:
POKI_RUN_COMMAND{{mlr cat data/small}}HERE
Such data are easy to generate, e.g. in Ruby with
POKI_CARDIFY(puts "host=#{hostname},seconds=#{t2-t1},message=#{msg}")HERE
POKI_CARDIFY(puts mymap.collect{|k,v| "#{k}=#{v}"}.join(','))HERE
or with print statements in various languages, e.g.
POKI_CARDIFY(echo "type=3,user=$USER,date=$date\n";)HERE
POKI_CARDIFY(logger.log("type=3,user=$USER,date=$date\n");)HERE
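Reading DKVP back is just as simple: split the line on the field separator, then split each pair on the first pair separator. A minimal Ruby sketch with the default comma and equals sign; parse_dkvp and format_dkvp are hypothetical names, and Miller's own reader handles cases this one does not.

```ruby
# Sketch of DKVP with the default separators: pairs split on ",",
# key and value split on the FIRST "=" (so values may contain "=").
# Pairs lacking "=" get a nil value here; Miller treats such cases its own way.
def parse_dkvp(line, fs: ",", ps: "=")
  line.chomp.split(fs).each_with_object({}) do |pair, rec|
    key, value = pair.split(ps, 2)
    rec[key] = value
  end
end

def format_dkvp(rec, fs: ",", ps: "=")
  rec.map { |k, v| "#{k}#{ps}#{v}" }.join(fs)
end

rec = parse_dkvp("host=web1,seconds=0.45,message=ok\n")
puts rec["seconds"]     # 0.45
puts format_dkvp(rec)   # host=web1,seconds=0.45,message=ok
```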

As discussed in POKI_PUT_LINK_FOR_PAGE(record-heterogeneity.html)HERE, Miller handles changes of field names within the same data stream, and with DKVP format this is particularly natural, since every record carries its own keys. One of my favorite use-cases for Miller is analyzing application/server logs, where I log all sorts of lines such as

resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/some/other/path,loadsec=0.97,ok=false

etc. and I just log them as needed. Then later, I can use grep, mlr --opprint group-like, etc. to analyze my logs.
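To see what group-like contributes to log analysis, here is a rough Ruby emulation, assuming its effect is to batch records with identical field names together, batches in order of first appearance. group_like is a hypothetical name and this is not Miller's implementation.

```ruby
# Rough emulation of `mlr group-like`: batch records by their exact list
# of field names, keeping batches in order of first appearance.
def group_like(records)
  records.group_by(&:keys).values.flatten(1)
end

log = <<~LOG
  resource=/path/to/file,loadsec=0.45,ok=true
  record_count=100,resource=/path/to/file
  resource=/some/other/path,loadsec=0.97,ok=false
LOG

# Naive DKVP parse: pairs on ",", key/value on the first "=".
records = log.lines.map do |line|
  line.chomp.split(",").to_h { |pair| pair.split("=", 2) }
end

group_like(records).each do |r|
  puts r.map { |k, v| "#{k}=#{v}" }.join(",")
end
# resource=/path/to/file,loadsec=0.45,ok=true
# resource=/some/other/path,loadsec=0.97,ok=false
# record_count=100,resource=/path/to/file
```

With --opprint, each such batch gets its own aligned table, which is what makes heterogeneous logs readable.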

See POKI_PUT_LINK_FOR_PAGE(reference.html)HERE regarding how to specify separators other than the default equals-sign and comma.

Index-numbered (toolkit style)

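With --inidx --ifs ' ' --repifs, Miller splits lines on spaces, treating a run of consecutive spaces as a single separator (that is what --repifs does), and assigns integer field names starting with 1. This recapitulates Unix-toolkit behavior, as with awk’s $1, $2, etc.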
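The effect of --repifs is easiest to see side by side. Here is a minimal Ruby sketch, assuming no leading or trailing separators; parse_nidx is a hypothetical name, not a Miller function.

```ruby
# Sketch of NIDX input, in the spirit of --inidx --ifs ' ' --repifs:
# split on the separator and number the fields "1", "2", ... .
# With repifs, a run of separators counts as one; without it, doubled
# separators produce empty fields. Leading/trailing separators are not
# specially handled here.
def parse_nidx(line, ifs: " ", repifs: true)
  sep = Regexp.escape(ifs)
  # A Regexp is used on purpose: Ruby's split(" ") with a literal single
  # space would silently switch to awk-style whitespace splitting.
  pattern = repifs ? /(?:#{sep})+/ : /#{sep}/
  fields = line.chomp.split(pattern, -1)
  fields.each_with_index.to_h { |v, i| [(i + 1).to_s, v] }
end

rec = parse_nidx("abc   123 xyz")
puts rec.map { |k, v| "#{k}=#{v}" }.join(",")
# 1=abc,2=123,3=xyz
```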

Example with index-numbered output:
POKI_RUN_COMMAND{{cat data/small}}HERE POKI_RUN_COMMAND{{mlr --onidx --ofs ' ' cat data/small}}HERE
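On the output side, index-numbered format simply drops the keys and writes the values joined by the output field separator. A minimal Ruby sketch; format_nidx and the sample record are hypothetical.

```ruby
# Sketch of NIDX output (--onidx --ofs ' '): keys are dropped and only
# the values are written, joined by the output field separator.
def format_nidx(record, ofs: " ")
  record.values.join(ofs)
end

puts format_nidx({ "host" => "web1", "seconds" => "0.45" })
# web1 0.45
```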

Example with index-numbered input:
POKI_RUN_COMMAND{{cat data/mydata.txt}}HERE POKI_RUN_COMMAND{{mlr --inidx --ifs ' ' --odkvp cat data/mydata.txt}}HERE

Example with index-numbered input and output:
POKI_RUN_COMMAND{{cat data/mydata.txt}}HERE POKI_RUN_COMMAND{{mlr --nidx --fs ' ' --repifs cut -f 2,3 data/mydata.txt}}HERE

Vertical tabular

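Vertical-tabular output (--oxtab) prints each field on its own line, key followed by value, with a blank line between records. This is perhaps most useful for looking at very wide and/or multi-column data, which would otherwise cause line wraps on the screen (but see also https://github.com/twosigma/ngrid for an entirely different, very powerful option). For example, compare: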
POKI_INCLUDE_ESCAPED(system-file-opprint-example.txt)HERE
POKI_INCLUDE_ESCAPED(system-file-oxtab-example.txt)HERE
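The layout of vertical-tabular output can be sketched in a few lines of Ruby, assuming keys are left-aligned to the widest key within each record; format_xtab is a hypothetical name, and spacing details may differ from Miller's --oxtab.

```ruby
# Sketch of vertical-tabular (XTAB) output: one "key value" line per field,
# keys left-aligned to the widest key in the record, and a blank line
# between records.
def format_xtab(records)
  records.map do |rec|
    width = rec.keys.map(&:length).max
    rec.map { |k, v| "#{k.ljust(width)} #{v}" }.join("\n")
  end.join("\n\n")
end

recs = [
  { "host" => "web1", "loadsec" => "0.45" },
  { "host" => "web2", "loadsec" => "0.97", "ok" => "false" },
]
puts format_xtab(recs)
# host    web1
# loadsec 0.45
#
# host    web2
# loadsec 0.97
# ok      false
```

Because each record is laid out independently, this format also streams: unlike pretty-print, nothing has to wait for end of input.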