Estimated reading time: 3 minutes

Formatting CSV files in the shell

kemi asks in #bash:

Are there any tools to format CSV files in the console? Not interactive, no spreadsheets

column, which on Linux is likely to be provided by util-linux, does exactly this. The man page even offers an example of how it works(I’ve changed it slightly to make it work correctly on my system):

$ (echo "PERM LINKS OWNER GROUP SIZE DATE HH:MM NAME";
>     ls -l static/ | sed 1d) | column -t
PERM        LINKS  OWNER  GROUP  SIZE   DATE        HH:MM  NAME
-rw-r--r--  1      jay    jay    328    2009-09-25  02:59  body_background.png
-rw-r--r--  1      jay    jay    7234   2009-09-25  02:59  draft.png
-rw-r--r--  1      jay    jay    6253   2009-09-25  02:59  extract-metadata.xsl
-rw-r--r--  1      jay    jay    13143  2009-10-07  09:05  foaf.rdf
-rw-r--r--  1      jay    jay    3580   2009-09-25  02:59  gpg.asc
-rw-r--r--  1      jay    jay    7739   2009-09-25  02:59  icon.png
-rw-r--r--  1      jay    jay    96     2009-09-25  02:59  meta_background.png
-rw-r--r--  1      jay    jay    411    2009-09-29  11:39  remote.png

Given some other standard commands we can mangle that data in various ways at the shell prompt too, for example we may wish to display the total size of all the files. Or, as in the output below, the total size of all the PNG files in the directory.

$ (echo "PERM LINKS OWNER GROUP SIZE DATE HH:MM NAME"
>     ls -l static/ | sed 1d) | column -t | awk '{print}
>         /\.png/ {sum+=$5}
>         END {print "Total size of PNG files:", sum}'
PERM        LINKS  OWNER  GROUP  SIZE   DATE        HH:MM  NAME
-rw-r--r--  1      jay    jay    328    2009-09-25  02:59  body_background.png
-rw-r--r--  1      jay    jay    7234   2009-09-25  02:59  draft.png
-rw-r--r--  1      jay    jay    6253   2009-09-25  02:59  extract-metadata.xsl
-rw-r--r--  1      jay    jay    13143  2009-10-07  09:05  foaf.rdf
-rw-r--r--  1      jay    jay    3580   2009-09-25  02:59  gpg.asc
-rw-r--r--  1      jay    jay    7739   2009-09-25  02:59  icon.png
-rw-r--r--  1      jay    jay    96     2009-09-25  02:59  meta_background.png
-rw-r--r--  1      jay    jay    411    2009-09-29  11:39  remote.png
Total size of PNG files: 15808

In the original question on how to process CSV files all we really need to do is get the data in to a state for column to process, and we can use tr to do that. An example using a small chunk of the network log on this computer that we wish to format and calculate totals for could be:

$ cat stats
2009-07-15T05:09:42+0100,16803,4304661,129262665
2009-07-16T04:10:29+0100,17551,4012917,67572304
2009-07-16T19:03:00+0100,17621,1712073,34162500
2009-07-17T14:18:19+0100,7961,1071313,26286593
2009-07-17T18:23:40+0100,1867,308589,6057915
2009-07-17T18:53:21+0100,1740,180197,2907388
2009-07-17T19:00:03+0100,356,152917,928948
2009-07-17T22:01:57+0100,6611,1159789,25562873
2009-07-18T13:09:31+0100,1681,164663,2049315
2009-07-18T13:24:04+0100,834,54025,662134
$ (echo "Date,Duration,Sent,Received";  cat stats ) \
>     | tr ',' ' ' | column -t | awk '{print}
>         !/Date/ {sent+=$3; recv+=$4}
>         END {print "\nTotal sent: "sent", Total received: "recv}'
Date                      Duration  Sent     Received
2009-07-15T05:09:42+0100  16803     4304661  129262665
2009-07-16T04:10:29+0100  17551     4012917  67572304
2009-07-16T19:03:00+0100  17621     1712073  34162500
2009-07-17T14:18:19+0100  7961      1071313  26286593
2009-07-17T18:23:40+0100  1867      308589   6057915
2009-07-17T18:53:21+0100  1740      180197   2907388
2009-07-17T19:00:03+0100  356       152917   928948
2009-07-17T22:01:57+0100  6611      1159789  25562873
2009-07-18T13:09:31+0100  1681      164663   2049315
2009-07-18T13:24:04+0100  834       54025    662134

Total sent: 13121144, Total received: 295452635

Authenticate this page by pasting this signature into Keybase.

Have a suggestion or see a typo? Edit this page