These are simple n-grams as described
here. Some common functions are
here.
Then here are scripts for
1-grams,
2-grams,
3-grams,
4-grams, and
5-grams.
The idea is that words from the input file are consumed, then taken apart and pasted back together
in ways which imitate the letter-to-letter transitions found in the word list — giving us automatically
generated words in the same vein as bromance and spork:
$ mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
beard
plastinguish
politicially
noise
loan
country
controductionary
suppery
lose
lessors
dollar
judge
rottendence
lessenger
diffendant
suggestional
Program timing
This admittedly artificial example demonstrates using Miller time and stats
functions to introspectively acquire some information about Miller’s own
runtime. The delta function computes the difference between successive
timestamps.
POKI_INCLUDE_ESCAPED(data/timing-example.txt)HERE
Computing interquartile ranges
For one or more specified field names, simply compute p25 and p75, then write the IQR as the difference of p75 and p25:
POKI_INCLUDE_AND_RUN_ESCAPED(data/iqr1.sh)HERE
For wildcarded field names, first compute p25 and p75, then loop over field names with p25 in them:
POKI_INCLUDE_AND_RUN_ESCAPED(data/iqrn.sh)HERE
Computing weighted means
This might be more elegantly implemented as an option within the stats1 verb. Meanwhile, it’s
expressible within the DSL:
POKI_INCLUDE_AND_RUN_ESCAPED(data/weighted-mean.sh)HERE
Generating random numbers from various distributions
Here we can chain together a few simple building blocks:
POKI_RUN_COMMAND{{cat expo-sample.sh}}HERE
Namely:
Set the Miller random-number seed so this webdoc looks the same every time I regenerate it.
Use pretty-printed tabular output.
Use pretty-printed tabular output.
Use seqgen to produce 100,000 records i=0, i=1, etc.
Send those to a put step which defines an inverse-transform-sampling function and
calls it twice, then computes the sum and product of samples.
Send those to a histogram, and from there to a bar-plotter. This is just for visualization; you
could just as well output CSV and send that off to your own plotting tool, etc.
The output is as follows:
POKI_RUN_COMMAND{{sh expo-sample.sh}}HERE
Sieve of Eratosthenes
The Sieve_of_Eratosthenes
is a standard introductory programming topic. The idea is to find all primes up
to some N by making a list of the numbers 1 to N, then striking
out all multiples of 2 except 2 itself, all multiples of 3 except 3 itself, all
multiples of 4 except 4 itself, and so on. Whatever survives that without
getting marked is a prime. This is easy enough in Miller. Notice that here all
the work is in begin and end statements; there is no file
input (so we use mlr -n to keep Miller from waiting for input data).
POKI_RUN_COMMAND{{cat programs/sieve.mlr}}HERE
POKI_RUN_COMMAND{{mlr -n put -f programs/sieve.mlr}}HERE
Mandelbrot-set generator
The Mandelbrot
set is also easily expressed. This isn’t an important case of
data-processing in the vein for which Miller was designed, but it is an
example of Miller as a general-purpose programming language — a test
case for the expressiveness of the language.
The (approximate) computation of points in the complex plane which are and
aren’t members is just a few lines of complex arithmetic (see the
Wikipedia article); how to render them is another task. Using graphics
libraries you can create PNG or JPEG files, but another fun way to do this is
by printing various characters to the screen:
POKI_RUN_COMMAND{{cat programs/mand.mlr}}HERE
At standard resolution this makes a nice little ASCII plot:
POKI_RUN_COMMAND{{mlr -n put -f ./programs/mand.mlr}}HERE
But using a very small font size (as small as my Mac will let me go), and by choosing the
coordinates to zoom in on a particular part of the complex plane, we can get a nice little
picture:
#!/bin/bash
# Get the number of rows and columns from the terminal window dimensions
iheight=$(stty size | mlr --nidx --fs space cut -f 1)
iwidth=$(stty size | mlr --nidx --fs space cut -f 2)
echo "rcorn=-1.755350,icorn=+0.014230,side=0.000020,maxits=10000,iheight=$iheight,iwidth=$iwidth" \
| mlr put -f programs/mand.mlr