joeware - never stop exploring... :)

Information about joeware mixed with wild and crazy opinions...

Parsing CSVs with perl

by @ 8:13 pm on 10/20/2005. Filed under tech

Ever want to parse a CSV file with perl?

If you have used perl, the answer is probably yes. What happens if the delimiter you use, say the comma, is also used in one of the fields but the field is quoted so the comma shouldn’t be visible when you parse the string by comma…

The logical way to split up a string in this way would be to use split// except it isn’t smart enough to do it correct. For instance if you have something like


"CN=Builtin,DC=joe,DC=com","{CB61E1C9-C7FD-4ED7-848C-F28953526B7B}"
"OU=CleanOU,DC=joe,DC=com","{A2FF7DA6-7882-45D0-B24B-01C9F5B72C29}"
"CN=Computers,DC=joe,DC=com","{E11E1B12-7182-4330-B075-09B9DFC6E9F9}"
"OU=contacts,DC=joe,DC=com","{EC879C4D-1732-4818-99BE-7E895D05FD1E}"
"OU=Domain Controllers,DC=joe,DC=com","{6B9EC17A-A3B5-48EF-A541-38CC1422A851}"

The first line (stored in $thisline) if parsed with

my @fields=split/,/,$thisline;

would yield


$field[0]="CN=Builtin
$field[1]=DC=joe
$field[2]=DC=com"
$field[3]="{CB61E1C9-C7FD-4ED7-848C-F28953526B7B}"

when what we logically wanted was


$field[0]="CN=Builtin,DC=joe,DC=com"
$field[1]="{CB61E1C9-C7FD-4ED7-848C-F28953526B7B}"

The answer? It is built into the perl core and is called quotewords.

At the top of your script put in the line

use Text::ParseWords;

Then when you want to parse out a quoted CSV line you use the line

my @fields = quotewords(",", 0, $thisline);

Why did I have to figure this out? Well I am updating ADFIND to have a CSV output format and I needed to sort some of that csv output by pwdLastSet. Well the date format by default is mm/dd/yyyy-hh:mm:ss which doesn’t sort all that well in Excel. So I needed a quick perl script to rearrange the date format to yyyy/mm/dd-hh:mm:ss. So when writing it I ran into the whole split and quoted fields issue.

Below is the resulting quick and dirty script. Long term I am not sure what I will do but most likely I will add another switch to decode time/date stamps into a string sortable format.

You will note that I don’t assume the delimiter, I actually figure it out on the fly.

use Text::ParseWords;

$in=shift;
$out=shift;

open IFH,"< $in" or die("Error opening input file - $in : $!\n");
open OFH,">$out" or die("Error opening output file - $out : $!\n");

$datefilter='(\d\d)\/(\d\d)\/(\d\d\d\d)-(\d\d:\d\d:\d\d)';

$delimiter="";
foreach $thisline () 
 {
  chomp $thisline;
  if ($thisline=~/\"$datefilter\"/) 
   {
    if (!$delimiter) 
     {
      ($delimiter)=($thisline=~/\"(.)\"/);
      print "Delimiter is '$delimiter'\n";
     }
    my @fields = quotewords($delimiter, 0, $thisline);
    map {if (/$datefilter/) {$_="$3/$1/$2-$4";}} @fields;
    $thisline="";
    map{$thisline.="\"$_\"$delimiter"} @fields;
    $thisline=~s/$delimiter$//;
   }
  print OFH "$thisline\n";
 } 

Rating 3.00 out of 5

Comments are closed.

[joeware – never stop exploring… :) is proudly powered by WordPress.]