POKI_PUT_TOC_HERE

Data

Test data were of the form
POKI_INCLUDE_ESCAPED(./data/small)HERE POKI_INCLUDE_ESCAPED(./data/small.csv)HERE
for DKVP and CSV, respectively, where fields a and b each take one of five text values, uniformly distributed; i is a line counter starting at 1; x and y are independent, uniformly distributed floating-point numbers in the unit interval.
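As a rough illustration of that layout, here is a minimal Python sketch that generates records of the same shape. It is not the generator used for these tests: the label set, the function names, and the seed are all assumptions made for the example, and only the field names and distributions come from the description above.

```python
import random

# Assumption: the text says only "one of five text values"; these labels are illustrative.
LABELS = ["pan", "eks", "wye", "zee", "hat"]


def make_records(n, seed=0):
    """Yield n records with fields a, b, i, x, y as described in the text."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    for i in range(1, n + 1):
        yield {
            "a": rng.choice(LABELS),  # uniform over the five labels
            "b": rng.choice(LABELS),
            "i": i,                   # line counter starting at 1
            "x": rng.random(),        # uniform in [0, 1)
            "y": rng.random(),
        }


def to_dkvp(rec):
    """Format one record as a DKVP line: key=value pairs joined by commas."""
    return ",".join(f"{k}={v}" for k, v in rec.items())


def to_csv_lines(records):
    """Yield a header line followed by one CSV line per record."""
    header_written = False
    for rec in records:
        if not header_written:
            yield ",".join(rec.keys())
            header_written = True
        yield ",".join(str(v) for v in rec.values())
```

Writing `to_dkvp(rec)` for each record to one file, and the output of `to_csv_lines(...)` to another, gives a pair of files in the two formats carrying the same data.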

Data files of one million lines (totalling about 50MB for CSV and 60MB for DKVP) were used. In experiments not shown here, I also varied the file sizes; run times scaled linearly with file size, as expected, so I produced no file-size-dependent plots.

Comparands

The cat, cut, awk, sed, and sort tools were compared to mlr on an 8-core Darwin laptop; RAM capacity was nowhere near challenged. The catc program is a simple line-oriented line-printer (source here) which is intermediate between Miller (which is record-aware as well as line-aware) and cat (which is only byte-aware).
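This page does not say how the timings were taken. Purely as a sketch of one way to make such a comparison repeatable, the hypothetical helper below runs a command several times with its output discarded and reports the best wall-clock time. The function name, the best-of-N policy, and the choice to redirect output to the null device are assumptions, not the procedure behind the results here.

```python
import subprocess
import sys
import time


def time_command(argv, input_path=None, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs of argv.

    Hypothetical helper: stdout is discarded so terminal speed does not
    dominate, and the optional input file is fed to the command on stdin.
    """
    best = float("inf")
    for _ in range(repeats):
        stdin = open(input_path, "rb") if input_path else subprocess.DEVNULL
        try:
            start = time.perf_counter()
            subprocess.run(argv, stdin=stdin, stdout=subprocess.DEVNULL, check=True)
            elapsed = time.perf_counter() - start
        finally:
            if input_path:
                stdin.close()
        best = min(best, elapsed)
    return best
```

For example, `time_command(["mlr", "cat"], input_path="small.dkvp")` and `time_command(["cat"], input_path="small.dkvp")` could be compared side by side, where `small.dkvp` is a placeholder file name. Taking the minimum over several runs reduces noise from disk caching and other processes.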

Raw results

Note that for CSV data, the command is mlr --csvlite ... rather than mlr .... POKI_INCLUDE_ESCAPED(perftbl.txt)HERE

Analysis

Conclusion

For record-oriented data transformations, Miller meets or beats the Unix toolkit in many contexts. Field renames, in particular, are worth doing with sed as a pre-pipe or post-pipe.