joeware - never stop exploring... :)

Information about joeware mixed with wild and crazy opinions...

Parsing quoted CSV files with perl

by @ 6:38 pm on 4/2/2008. Filed under tech

This post brought to you by Colbie Caillat’s song Realize, Microsoft Windows Server 2008 Server Core,  and the letter Q.

 

I always end up rewriting code to parse CSV files in perl for some reason or another so I thought I would write it again and then post it here so when I don’t find it in my code snippet library the next time I can find it here… And also so you can use it too of course…

Background… perl has great text handling capability, but the method used to split up a line into tokens or in the case of CSV files into fields called the “split” function doesn’t take quotes into account which is a bit of a pain. This isn’t unusual, most default tokenizing functions do the same thing.

Perl does have an easy answer though… One of the modules that comes with the ActiveState dist[ribution] by default is called Text and it includes a routine for parsing lines with quotes that can be used…

The script supplied below will take a CSV file, read the header and then output the lines broken up in a readable format like so

Sample CSV

“dn”,”name”,”objectclass”,”description”,”gplink”
“OU=Domain Controllers,DC=test,DC=loc”,”Domain Controllers”,”top;organizationalUnit”,”Default container for domain controllers”,”[LDAP://CN={6AC1786C-016F-11D2-945F-00C04fB984F9},CN=Policies,CN=System,DC=test,DC=loc;0]”
“OU=Users,OU=My,DC=test,DC=loc”,”Users”,”top;organizationalUnit”,””,” “
“OU=My,DC=test,DC=loc”,”My”,”top;organizationalUnit”,””,””
“OU=Groups,OU=My,DC=test,DC=loc”,”Groups”,”top;organizationalUnit”,””,””

Sample output

G:\\Dev\\perl\\ParseCSV>outputcsv.pl oudmp.csv
description: "Default container for domain controllers"
dn: "OU=Domain Controllers,DC=test,DC=loc"
gplink: "[LDAP://CN={6AC1786C-016F-11D2-945F-00C04fB984F9},CN=Policies,CN=System,DC=test,DC=loc;0]"
name: "Domain Controllers"
objectclass: "top;organizationalUnit"

description: ""
dn: "OU=Users,OU=My,DC=test,DC=loc"
gplink: " "
name: "Users"
objectclass: "top;organizationalUnit"

description: ""
dn: "OU=My,DC=test,DC=loc"
gplink: ""
name: "My"
objectclass: "top;organizationalUnit"

description: ""
dn: "OU=Groups,OU=My,DC=test,DC=loc"
gplink: ""
name: "Groups"
objectclass: "top;organizationalUnit"

That makes it a bit easier to see what you have in front of you. 

Here is the script: 

use Text::ParseWords;

#
# Open CSV file
#
my $csvfilename=shift;
open fh,"<$csvfilename" or die("Couldn't open CSV file $csvfilename:$!\\n");


#
# Break up the header and map lowercase normalized header labels to field numbers
#
my $header=<fh>;
@headerfields=ParseCSVLine(lc($header));
my $cnt=0;
my %maphash=();
map {s/\\"//g;$maphash{lc($_)}=$cnt++} @headerfields;


# 
# Read file and output
#
while ($line=<fh>)
 {
  my @vals = ParseCSVLine($line);

  foreach $thisfield (sort keys %maphash) 
   {
    print "$thisfield: $vals[$maphash{$thisfield}]\\n";
   }
  print "\\n";
 }


sub ParseCSVLine
 {
  my $line=shift;
  my $delimiter=shift or ",";
  chomp $line;
  return &parse_line($delimiter, 1, $line);
 }

 

Within the script you can also do things like

print “The GPLinks for this entry are: $fields[$maphash{gplink}]\n”;

And regardless of where in the CSV file the gplink column is (i.e. you have different format CSV files) you will output the column that has the gplink info…

Looks like object type handling but within a completely text based passing mechanism so the info could have come through email or ftp or web page or wherever…

Could it be shorter? Of course, but the point here wasn’t making the smallest code. Just show something off and have it for later re-use.

 

   joe

Rating 3.00 out of 5

8 Responses to “Parsing quoted CSV files with perl”

  1. Brandon says:

    Joe… you know Powershell does this all for you, right?

    import-csv FileName.csv

    🙂

  2. Brandon says:

    oh… to get the format you want

    import-csv filename.csv | select description,dn,gplink,name,objectclass | fl

  3. Brandon says:

    p.s. I LOVE COLBIE!

  4. joe says:

    That command doesn’t seem to be working on my Windows Server 2008 Server Core machine… ;o)

    Ran it on an XP machine and it worked… but what happened to my quotes? I’ve lost information, specifically with the quotes I know that OU=Users,OU=My,DC=Test,DC=loc has an invalid value for gPLink.

    Colbie rocks.

  5. Brandon says:

    If you need the quotes. You can use calculated properties to add them back. Import-Csv strips them (it uses quotes to deal with values that may have “,” in them)

    import-csv joetest.csv | fl @{l=’description’;e={“`”{0}`”” -f $_.description}},
    @{l=’dn’;e={“`”{0}`”” -f $_.dn}},
    @{l=’gplink’;e={“`”{0}`”” -f $_.gplink}},
    @{l=’name’;e={“`”{0}`”” -f $_.name}},
    @{l=’objectclass’;e={“`”{0}`”” -f $_.objectclass}}

  6. joe says:

    I understand what the quotes are for… It either shouldn’t be stripping them or there should be a simple way of saying, yeah, don’t strip the quotes IMO. Flexibilty…

    For example if I wanted to strip the quotes in my script, I change the second param to parse_words from 1 to 0.

  7. Brandon says:

    I agree… I will tell the Product Group.

  8. Brandon says:

    Still easier than perl though 🙂

[joeware – never stop exploring… :) is proudly powered by WordPress.]