Batteries included: Download, unzip and parse in 13 lines

Written by fredrik

31 March, 2013

The other day I needed to download some zip files, unpack them, parse the CSV files in them, and return the data as dicts. I did the very same thing a couple of years ago, and although the source is lost, I recall having a Python (2.4?) script of about two screens to do the download – so roughly a hundred lines. Re-implementing the solution now that I know Python and the standard library better, I ended up with 12 lines written in just a few minutes – edited for blogging clarity, it clocks in at 13 lines:

import zipfile, urllib, csv, os
def get_items(url):
  zip, headers = urllib.urlretrieve(url)
  with zipfile.ZipFile(zip) as zf:
    csvfiles = [name for name in zf.namelist() 
                 if name.endswith('.csv')]
    for filename in csvfiles:
      with zf.open(filename) as source:
        reader = csv.DictReader([line.decode('iso-8859-1') 
                                  for line in source])
        for item in reader:
          yield item
  os.unlink(zip) 

As trivial as it is, I think it is a nice example of just how much you can do with very little (coding) effort.
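
One thing the short version glosses over: because get_items is a generator, the final os.unlink only runs once the caller has consumed every row, so breaking out of the loop early leaves the downloaded file behind. A possible refinement, sketched here for Python 3 rather than the Python 2 used above, is to wrap the work in try/finally. The function name csv_members is hypothetical, and the sketch swaps urlretrieve for urlopen plus tempfile so that the temporary file is always one the function created itself.

```python
import os
import shutil
import tempfile
import urllib.request
import zipfile


def csv_members(url):
    """Download a zip archive and yield (name, raw bytes) per .csv member.

    Hypothetical helper: the temporary download is removed even if the
    caller stops iterating early or an error occurs.
    """
    # delete=False so the file can be reopened by name after it is closed.
    tmp = tempfile.NamedTemporaryFile(suffix=".zip", delete=False)
    try:
        # Copy the response body to disk; leaving the block closes both.
        with tmp, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        with zipfile.ZipFile(tmp.name) as zf:
            for name in zf.namelist():
                if name.endswith(".csv"):
                    yield name, zf.read(name)
    finally:
        # Runs on normal exhaustion, on errors, and when the generator
        # is closed before it has been fully consumed.
        os.unlink(tmp.name)
```

The finally clause is what makes the cleanup reliable: closing or garbage-collecting a suspended generator raises GeneratorExit at the yield, which still passes through finally.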

Edit: I created a gist with a cleaned-up version using codecs.getreader. I’ll leave this version as it is, though.
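
The gist is not reproduced here. As a rough idea of what a codecs.getreader approach can look like, here is a minimal sketch that assumes Python 3 and an archive that is already on disk or in memory. The name iter_csv_rows and its encoding parameter are illustrative, not taken from the gist. Instead of decoding every line in a list comprehension, the zip member is wrapped in a stream reader that decodes lazily as csv.DictReader pulls lines from it.

```python
import codecs
import csv
import zipfile


def iter_csv_rows(zip_source, encoding="iso-8859-1"):
    """Yield one dict per row from every .csv member of a zip archive.

    zip_source may be a filesystem path or a seekable binary file object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".csv"):
                continue
            with zf.open(name) as raw:
                # codecs.getreader(encoding) returns a StreamReader class;
                # instantiating it around the binary member gives an
                # iterable of decoded text lines.
                text = codecs.getreader(encoding)(raw)
                for row in csv.DictReader(text):
                    yield row
```

Passing the result of a download (a path from urlretrieve, or an io.BytesIO holding the response body) as zip_source would give roughly the same behaviour as get_items above.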
