Batteries included: Download, unzip and parse in 13 lines

The other day I needed to download some zip files, unpack them, parse the CSV files in them, and return the data as dicts. I did the very same thing a couple of years ago, and although the source is lost, I recall having a Python (2.4?) script of about two screens to do the download - so a hundred lines. When re-implementing the solution now that I know Python and the standard library better, I ended up with 12 lines written in just a few minutes - edited for blogging clarity it clocks in at 13 lines:

    import zipfile, urllib, csv
    def get_items(url):
      zip, headers = urllib.urlretrieve(url)
      with zipfile.ZipFile(zip) as zf:
        csvfiles = [name for name in zf.namelist() 
                     if name.endswith('.csv')]
        for filename in csvfiles:
          with zf.open(filename) as source:
            reader = csv.DictReader([line.decode('iso-8859-1') 
                                      for line in source])
            for item in reader:
              yield item

As trivial as it is, I think it is a nice example of just how much you can do with very little (coding) effort.

/Edit/: I created a gist with a cleaned up version using codecs.getreader. I'll be leaving this version as it is though.
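
For readers on Python 3, here is a rough sketch of what a `codecs.getreader` variant might look like. It is not the gist itself: the function name `iter_csv_rows` and the `encoding` parameter are illustrative placeholders, and it assumes the zip has already been downloaded, so it takes a path or file object rather than a URL.

```python
import codecs
import csv
import io
import zipfile


def iter_csv_rows(zip_source, encoding="iso-8859-1"):
    """Yield one dict per CSV row from every .csv member of a zip archive.

    zip_source may be a filename or a seekable binary file object.
    (Hypothetical helper, named for illustration only.)
    """
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".csv"):
                continue
            with zf.open(name) as raw:
                # codecs.getreader returns a StreamReader class; wrapping the
                # binary member decodes it as csv.DictReader pulls lines.
                text = codecs.getreader(encoding)(raw)
                for row in csv.DictReader(text):
                    yield row


# Small offline demo: build a zip in memory instead of downloading one.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("people.csv", "name,city\nJosé,Málaga\n".encode("iso-8859-1"))
    zf.writestr("readme.txt", b"not a csv, so it is skipped")
demo.seek(0)
demo_rows = list(iter_csv_rows(demo))
```

In Python 3 the download step would be `urllib.request.urlretrieve(url)`, which still returns a `(filename, headers)` tuple, so the returned filename can be passed straight to a function like this one.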
