XML to flat file (CSV) using awk

NOTE. Since WordPress doesn’t allow proper XML to be shown on its pages, and I can’t even link to simple shell scripts, all sample files and command lines are shown as png images. Sorry about that.
END OF NOTE.

Recently, I wanted to test how large a very large XML file (180 MB in size) would be if it was formatted as a CSV (in this case a semi-colon separated file). To make it so, I wrote a little routine in AWK to read out a specified number of tags and their values, and print them in a CSV in the order they were specified.

A (sample) XML file (with more than a few tags removed) would be;

XML demo file

XML demo file

First, I needed to get hold of the tags of the XML file. All I’m really interested in, which’ll be evident later, is the value tags listed in the “record” element. To extract all of them was pretty easy.

Get XML tags

Get XML tags


imsi
msisdn
imei
pdp
apn
charging_id
duration
tx
rx
local
ggsn
sgsn
charge

I piped the output of the command to a file called elements.txt. Note the order in which the elements are listed. That’s the order in which they’ll appear on each output line later.

Now, here’s the code to run to get all of the XML formatted as CSV.

xmlToCSV.awk

The syntax to call the program is as follows, and the output should be piped to a different output file. Note the “STARTTAG” and “ENDTAG” parameters. They show between what tags data should be extracted. Note that this script can’t really handle XML that contains multiple tags with the same name, or multiple elements with the same name, for that matter. But, that shouldn’t be too hard to add either…


awk -f xmlToCSV.awk -v INPUTFILE=data/inputfile.xml -v STARTTAG=record -v ENDTAG=record -v ELEMFILE=elements.txt

Advertisements

About magvar

Technical guy, worked with computers since I was twelve, complete nerd, but managed to get married and am now the happy father of three kids.
This entry was posted in Uncategorized. Bookmark the permalink.

6 Responses to XML to flat file (CSV) using awk

  1. Jimmy says:

    I tried your script, but seems to hit an error?
    $ awk -f xmlToCSV.awk -v INPUTFILE=sample.xml -v STARTTAG=identifier -v ENDTAG=barring -v ELEMFILE=elements.txt
    awk: xmlToCSV.awk:112: }
    awk: xmlToCSV.awk:112: ^ syntax error

    can’t really figure out what is missing..I am using cygwin

    • magvar says:

      Jimmy, I’ll try the script on Cygwin tomorrow, see if there’s a difference when executing in that environment. Works fine on Ubuntu 14.04

  2. Jimmy says:

    I tried again, copying the code, it executes, but does not generate any meaningful output. Perhaps, my xml structure different from your given sample….

  3. Mriganka T says:

    Hello Magvar! Many thanks for this informative post. I am using this as a guide to help me get my xml converted to csv. I am still working with the tag names. My XML has multiple records, and when I extract the tags they just repeat themselves in a cyclic manner. Just wanted to check with you if you have a solution handy for multiple records in an XML.

    Thank you once again!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s