I was helping my wife out with a quick script to scrape some data from a site that had a bunch of tables in it. Having only done some regex-based scraping before, I found that tables provided a bit of a challenge…

Until I found BeautifulSoup. It’s a Python library that you can throw a blob of HTML at, and it gives you a pretty handy way to traverse the hierarchy and pull a bunch of stuff out. Make sure you use it along with html5lib, which parses markup the way a browser does and so copes better with messy pages.
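For the curious, here is roughly what pulling a table apart looks like. This is only a sketch, not the actual script: the sample HTML and the helper names `make_soup` and `table_rows` are made up for illustration, and it assumes the `bs4` package is installed (with html5lib if you have it, otherwise it falls back to Python's built-in parser).

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up sample markup standing in for the real page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>Apples</td><td>3</td></tr>
  <tr><td>Pears</td><td>5</td></tr>
</table>
"""


def make_soup(html):
    """Parse with html5lib when available, else the built-in parser."""
    try:
        return BeautifulSoup(html, "html5lib")
    except FeatureNotFound:
        # html5lib is not installed; html.parser ships with Python.
        return BeautifulSoup(html, "html.parser")


def table_rows(html):
    """Return the first table in the page as a list of rows of cell text."""
    soup = make_soup(html)
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header cells and data cells are treated alike here.
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


if __name__ == "__main__":
    for row in table_rows(SAMPLE_HTML):
        print(row)
```

Note that `find_all("tr")` searches recursively, so a page with nested tables would need a bit more care than this.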

Also, since when did this easy_install thing exist? Maybe I just haven’t been doing serious Python lately, but this .egg stuff is pretty awesome. Deploying a collection of modules into a local site-packages directory was ridiculously easy. Are all the other Python deployment problems solved now too?
