It depends on what you're trying to scrape. If the content is in something like Flash, then I'd say you're out of luck. Otherwise, I just use Python for any web scraping needs (I'm actually shocked no one has mentioned it in this thread). As far as libraries go, just use BeautifulSoup, plus Requests if you need to do authentication. The time it takes to code these things up is pretty much nil.

Here's a sample that downloads NAVs for ETFs from Bloomberg's website. I just use BeautifulSoup for the scraping and a regex for the date parsing.

    #!/usr/bin/env python
    import BeautifulSoup
    import urllib
    import re
    from pandas import *

    def scrape_ETF_data(s):
        # Build the quote page URL for ticker symbol s
        link = "http://www.bloomberg.com/quote/"+s+":US"
        soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(link))
        # Rows of the stats table on the quote page
        table = soup.find("div", {"class" : "standard_stat"}).findChildren("tr")
        # Pull the NAV date out of the first row (this line is cut off in the paste)
        nav_date = re.search("20\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])", table[0].span$
        # Map each row's header (non-word characters stripped) to its value
        data = {re.sub(r'[^\w]', '', x.th.string) : x.td.contents[-1].strip() for x in table}
        data["Symbol"] = s
        data["NAV_asof"] = nav_date
        return data
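If you want to try the regex part on its own without hitting the site, here's a rough sketch using only the standard library. It reuses the same date pattern and the same key-cleaning substitution as the sample; the helper names `find_date` and `clean_key` and the sample strings are made up for illustration and are not taken from Bloomberg's actual markup. Note that `re.search` returns a match object rather than a string, so this sketch calls `.group(0)` to get the matched text.

```python
import re

# Same date pattern as in the sample: 20YY, then month 01-12, then day 01-31,
# separated by '-', ' ', '/' or '.'.
DATE_RE = re.compile(r"20\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])")


def find_date(text):
    """Return the first date-like substring in text, or None if there is none."""
    match = DATE_RE.search(text)
    return match.group(0) if match else None


def clean_key(label):
    """Strip everything except letters, digits and underscores from a label."""
    return re.sub(r"[^\w]", "", label)


# Hypothetical cell text, invented for this example.
print(find_date("NAV (on 2013-05-17)"))
print(clean_key("52-Week Range:"))
```

Returning None when nothing matches means a page layout change shows up as a missing date instead of an exception deep inside the scraper.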
Just an FYI: if you do this, you'll first get a message asking you to verify that you are a human, and after that your IP will get banned.