POKI_PUT_TOC_HERE

flins data

The flins.csv file is some sample data obtained from https://support.spatialkey.com/spatialkey-sample-csv-data.

Note: please use "mlr --csv --rs lf" for native Un*x (linefeed-terminated) CSV files.

Vertical-tabular format is good for a quick look at CSV data layout, in particular for seeing what columns you have to work with:

POKI_RUN_COMMAND{{head -n 2 data/flins.csv | mlr --icsv --oxtab cat}}HERE
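As a rough illustration of what a vertical-tabular view does, the following Python sketch prints each CSV record with one field per line and the field names padded to a common width. The sample row and the helper name to_xtab are made up for this example; this is not Miller's implementation, and Miller's exact spacing may differ.

```python
import csv
import io

def to_xtab(csv_text):
    """Render CSV records vertically: one 'name value' line per field,
    with a blank line between records."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    blocks = []
    for rec in records:
        # Pad every field name to the width of the longest one in this record.
        width = max(len(name) for name in rec)
        lines = [f"{name.ljust(width)} {value}" for name, value in rec.items()]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

# Made-up sample: a header line and one data line (not taken from flins.csv).
sample = "id,county,amount\n1,Example County,2500.5\n"
print(to_xtab(sample))
```

With many columns this layout is easier to scan than a wide row, which is why it suits a first look at an unfamiliar file.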

A few simple queries:

POKI_RUN_COMMAND{{cat data/flins.csv | mlr --icsv --opprint count-distinct -f county | head}}HERE

POKI_RUN_COMMAND{{cat data/flins.csv | mlr --icsv --opprint count-distinct -f construction,line}}HERE
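Conceptually, counting distinct values over one or more fields amounts to tallying how often each combination occurs. A minimal Python sketch of that idea follows; the records and the helper name count_distinct are hypothetical, and the sketch makes no claim about how Miller implements the verb.

```python
from collections import Counter

def count_distinct(records, fields):
    """Count how many records share each distinct combination of `fields`,
    keeping combinations in first-seen order."""
    counts = Counter(tuple(rec[f] for f in fields) for rec in records)
    return [dict(zip(fields, key), count=n) for key, n in counts.items()]

# Made-up records for illustration only.
records = [
    {"construction": "Wood", "line": "Residential"},
    {"construction": "Masonry", "line": "Commercial"},
    {"construction": "Wood", "line": "Residential"},
]
for row in count_distinct(records, ["construction", "line"]):
    print(row)
```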

Categorization of total insured value:

POKI_RUN_COMMAND{{cat data/flins.csv | mlr --icsv --opprint stats1 -a min,mean,max -f tiv_2012}}HERE

POKI_RUN_COMMAND{{cat data/flins.csv | mlr --icsv --opprint stats1 -a min,mean,max -f tiv_2012 -g construction,line}}HERE

POKI_RUN_COMMAND{{cat data/flins.csv | mlr --icsv --oxtab stats1 -a p0,p10,p50,p90,p95,p99,p100 -f hu_site_deductible}}HERE

POKI_RUN_COMMAND{{cat data/flins.csv | mlr --icsv --opprint stats1 -a p95,p99,p100 -f hu_site_deductible -g county then sort -f county | head}}HERE

POKI_RUN_COMMAND{{cat data/flins.csv | mlr --icsv --oxtab stats2 -a corr,linreg-ols,r2 -f tiv_2011,tiv_2012}}HERE

POKI_RUN_COMMAND{{cat data/flins.csv | mlr --icsv --opprint stats2 -a corr,linreg-ols,r2 -f tiv_2011,tiv_2012 -g county}}HERE
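For readers who want to see what an ordinary-least-squares fit and its R-squared measure, here is a small Python sketch using the textbook formulas. The function names and the sample numbers are assumptions made for this example; Miller's own computation may differ in numerical details.

```python
def linreg_ols(xs, ys):
    """Fit y = m*x + b by ordinary least squares and return (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def r_squared(xs, ys, m, b):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up points lying exactly on y = 2x + 1, so the fit should be perfect.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = linreg_ols(xs, ys)
print(m, b, r_squared(xs, ys, m, b))
```

An R-squared near 1 means the fitted line explains almost all of the variation in the second field, which is the kind of relationship one might look for between two years of insured values.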

Color/shape data

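The colored-shapes.dkvp file is some sample data produced by the mkdat2 script. The idea is to generate data with known distributions and correlations, then check that the statistics computed below recover them.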

Peek at the data:

POKI_RUN_COMMAND{{wc -l data/colored-shapes.dkvp}}HERE

POKI_RUN_COMMAND{{head -n 6 data/colored-shapes.dkvp | mlr --opprint cat}}HERE

Look at uncategorized stats (using creach for spacing). Here it looks reasonable that u is unit-uniform; something's up with v but we can't yet see what:

POKI_RUN_COMMAND{{mlr --oxtab stats1 -a min,mean,max -f flag,u,v data/colored-shapes.dkvp | creach 3}}HERE

The histogram shows the different distribution of 0/1 flags:

POKI_RUN_COMMAND{{mlr --opprint histogram -f flag,u,v --lo -0.1 --hi 1.1 --nbins 12 data/colored-shapes.dkvp}}HERE
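The binning behind a fixed-width histogram can be sketched in a few lines of Python. This version assumes half-open bins over [lo, hi) and silently ignores out-of-range values; both choices are assumptions of the sketch rather than a statement about Miller's behavior, and the sample values are made up.

```python
def histogram(values, lo, hi, nbins):
    """Count values into `nbins` equal-width bins spanning [lo, hi).
    Values outside that range are ignored (an assumption of this sketch)."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        if lo <= v < hi:
            # Guard against rounding pushing the index one past the last bin.
            idx = min(int((v - lo) / width), nbins - 1)
            counts[idx] += 1
    return counts

# Made-up values; two of them fall outside [0, 1) and are dropped.
values = [0.1, 0.3, 0.35, 0.8, 1.5, -0.2]
print(histogram(values, 0.0, 1.0, 4))
```

Values that sit exactly on a bin edge can land on either side because of floating-point rounding, so it helps to choose lo and hi so that the interesting values fall well inside a bin.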

Look at univariate stats by color and shape. In particular, color-dependent flag probabilities pop out, aligning with their original Bernoulli probabilities from the data-generator script:

POKI_RUN_COMMAND{{mlr --opprint stats1 -a min,mean,max -f flag,u,v -g color then sort -f color data/colored-shapes.dkvp}}HERE

POKI_RUN_COMMAND{{mlr --opprint stats1 -a min,mean,max -f flag,u,v -g shape then sort -f shape data/colored-shapes.dkvp}}HERE
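The reason the per-color mean of a 0/1 flag recovers a Bernoulli probability is that the mean of such a field is simply the fraction of ones. The Python sketch below generates synthetic records and checks this; the probabilities in true_p are hypothetical stand-ins, not the values used by the actual generator script, and mean_by_group is a helper invented for this example.

```python
import random
from collections import defaultdict

def mean_by_group(records, value_field, group_field):
    """Return {group: mean of value_field}, with groups in sorted order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        sums[rec[group_field]] += rec[value_field]
        counts[rec[group_field]] += 1
    return {g: sums[g] / counts[g] for g in sorted(sums)}

# Hypothetical per-color probabilities, chosen only for this illustration.
true_p = {"blue": 0.6, "red": 0.3}

rng = random.Random(0)  # fixed seed so repeated runs give the same records
records = []
for _ in range(20000):
    color = rng.choice(sorted(true_p))
    flag = 1 if rng.random() < true_p[color] else 0
    records.append({"color": color, "flag": flag})

estimates = mean_by_group(records, "flag", "color")
for color, p_hat in estimates.items():
    print(color, round(p_hat, 3), "target", true_p[color])
```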

Look at bivariate stats by color and shape. In particular, u,v pairwise correlation for red circles pops out:

POKI_RUN_COMMAND{{mlr --opprint --right stats2 -a corr -f u,v,w,x data/colored-shapes.dkvp}}HERE

POKI_RUN_COMMAND{{mlr --opprint --right stats2 -a corr -f u,v,w,x -g color,shape then sort -nr u_v_corr data/colored-shapes.dkvp}}HERE
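To make the grouped-correlation step concrete, here is a Python sketch that computes the Pearson correlation of two fields within each group and sorts the groups from highest to lowest. The records are made up so that one group is perfectly correlated, and the helper names are assumptions of this example rather than anything taken from Miller.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def corr_by_group(records, x_field, y_field, group_fields):
    """Return [(group_key, corr)] sorted by correlation, highest first."""
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        key = tuple(rec[g] for g in group_fields)
        groups[key][0].append(rec[x_field])
        groups[key][1].append(rec[y_field])
    rows = [(key, pearson(xs, ys)) for key, (xs, ys) in groups.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Made-up records: v equals u for red circles, and is unrelated for blue squares.
us = [0.1, 0.4, 0.7, 0.9]
blue_vs = [0.5, 0.2, 0.9, 0.3]
records = []
for u, v in zip(us, blue_vs):
    records.append({"color": "blue", "shape": "square", "u": u, "v": v})
    records.append({"color": "red", "shape": "circle", "u": u, "v": u})

for key, corr in corr_by_group(records, "u", "v", ["color", "shape"]):
    print(key, round(corr, 4))
```

Sorting by the correlation puts the strongly correlated group at the top, which is how a single unusual color/shape pair stands out from the rest.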