• About Miller • File formats • Miller features in the context of the Unix toolkit • Record-heterogeneity • Reference • Data examples • FAQ • Internationalization • Compiling, portability, dependencies, and testing • Performance • Why C? • Why call it Miller? • How original is Miller? • Things to do • Contact information • GitHub repo |
• Fields not selected • Error-output in certain string cases • How do I parse log-file output? • How do I examine then-chaining? • How do I do arithmetic on fields with currency symbols? No output at allCheck the line-terminators of the data, e.g. with the command-line file program. Example: for CSV, Miller’s default line terminator is CR/LF (carriage return followed by linefeed, following RFC4180). Yet if your CSV has *nix-standard LF line endings, Miller will keep reading the file looking for a CR/LF which never appears. Solution in this case: tell Miller the input has LF line-terminator, e.g. mlr --csv --rs lf {remaining arguments ...}.Fields not selectedCheck the field-separators of the data, e.g. with the command-line head program. Example: for CSV, Miller’s default record separator is comma; if your data is tab-delimited, e.g. aTABbTABc, then Miller won’t find three fields named a, b, and c but rather just one named aTABbTABc. Solution in this case: mlr --fs tab {remaining arguments ...}.Error-output in certain string casesmlr put '$y = string($x); $z=$y.$y' gives (error) on numeric data such as x=123 while mlr put '$z=string($x).string($x)' does not. This is because in the former case y is computed and stored as a string, then re-parsed as an integer, for which string-concatenation is an invalid operator.How do I parse log-file output?Suppose you have log-file lines such as2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] various/sorts/of data {& punctuation} hits=1 status=0 time=2.378 grep 'various sorts' *.log | sed 's/.*} //' | mlr --fs space --repifs --oxtab stats1 -a min,p10,p50,p90,max -f time -g status How do I examine then-chaining?Then-chaining found in Miller is intended to function the same as Unix pipes. You can print your data one pipeline step at a time, to see what intermediate output at one step becomes the input to the next step. First, review the input data:$ cat data/then-example.csv Status,Payment_Type,Amount paid,cash,10.00 pending,debit,20.00 paid,cash,50.00 pending,credit,40.00 paid,debit,30.00 $ mlr --icsv --rs lf --opprint count-distinct -f Status,Payment_Type data/then-example.csv Status Payment_Type count paid cash 2 pending debit 1 pending credit 1 paid debit 1 $ mlr --icsv --rs lf --opprint count-distinct -f Status,Payment_Type then sort -nr count data/then-example.csv Status Payment_Type count paid cash 2 pending debit 1 pending credit 1 paid debit 1 $ mlr --csv --rs lf count-distinct -f Status,Payment_Type data/then-example.csv | mlr --icsv --rs lf --opprint sort -nr count Status Payment_Type count paid cash 2 pending debit 1 pending credit 1 paid debit 1 How do I do arithmetic on fields with currency symbols?$ cat sample.csv EventOccurred,EventType,Description,Status,PaymentType,NameonAccount,TransactionNumber,Amount 10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,John,1,$230.36 10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Fred,2,$32.25 10/1/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Bob,3,$39.02 10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Alice,4,$57.54 10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Jungle,5,$230.36 10/1/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Joe,6,$281.96 10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,7,$188.19 10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,8,$188.19 10/2/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Anthony,9,$250.00 $ mlr --icsv --opprint cat sample.csv EventOccurred EventType Description Status PaymentType NameonAccount TransactionNumber Amount 10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking John 1 $230.36 10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking Fred 2 $32.25 10/1/2015 Charged Back Reason: Customer Advises Not Authorized Disputed Checking Bob 3 $39.02 10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking Alice 4 $57.54 10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking Jungle 5 $230.36 10/1/2015 Charged Back Reason: Payment Stopped Disputed Checking Joe 6 $281.96 10/2/2015 Charged Back Reason: Customer Advises Not Authorized Disputed Checking Joseph 7 $188.19 10/2/2015 Charged Back Reason: Customer Advises Not Authorized Disputed Checking Joseph 8 $188.19 10/2/2015 Charged Back Reason: Payment Stopped Disputed Checking Anthony 9 $250.00 $ mlr --csv put '$Amount = sub(string($Amount), "\$", "")' then stats1 -a sum -f Amount sample.csv Amount_sum 1497.870000 $ mlr --csv --ofmt '%.2lf' put '$Amount = sub(string($Amount), "\$", "")' then stats1 -a sum -f Amount sample.csv Amount_sum 1497.87 |