How to do web scraping

stock777 · Jul 10, 2013

I've actually used AutoIt to do some pretty heavy lifting.

slickpick · Aug 7, 2013

Depends on what you're trying to scrape. If the content is in something like Flash, then I'd say you're out of luck.

Otherwise, I just use Python for any web scraping needs (I'm actually shocked no one has mentioned it in this thread). As far as libraries go, just use BeautifulSoup, plus Requests if you need to handle authentication. The time it takes to code these things up is pretty much nil.

Here's a sample that downloads NAVs for ETFs from Bloomberg's website. I just use BeautifulSoup for the scraping and a regex for date parsing.

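#!/usr/bin/env python
# Python 2 script; uses BeautifulSoup 3 and urllib.urlopen.
import BeautifulSoup
import urllib
import re

from pandas import *

def scrape_ETF_data(s):
    # Build the quote page URL for a US-listed symbol.
    link = "http://www.bloomberg.com/quote/"+s+":US"
    soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(link))
    # Collect the table rows inside the stats block.
    table = soup.find("div", {"class" : "standard_stat"}).findChildren("tr")
    # Look for a 20YY-MM-DD style date in the first row.
    nav_date = re.search("20\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])", table[0].span$
    # Map each cleaned-up header label to the last item in its value cell.
    data = {re.sub(r'[^\w]', '', x.th.string) : x.td.contents[-1].strip() for x in table}
    data["Symbol"] = s
    data["NAV_asof"] = nav_date
    return data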
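
The two regex steps in the sample can be tried on their own, without hitting any website. The snippet below is only a sketch: the helper names find_date and clean_label and the sample strings are made up for illustration and are not real Bloomberg markup. The date pattern and the label-cleaning pattern are the ones from the script above.

```python
import re

# Same pattern as in the sample: a 20xx year, then month and day,
# separated by "-", " ", "/" or ".".
DATE_RE = re.compile(r"20\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])")


def find_date(text):
    """Return the first date-like substring in text, or None if there is none."""
    match = DATE_RE.search(text)
    return match.group(0) if match else None


def clean_label(label):
    """Drop every character that is not a letter, digit, or underscore."""
    return re.sub(r"[^\w]", "", label)


# Hypothetical inputs, chosen only to exercise the patterns.
print(find_date("NAV (on 2013-08-06)"))   # prints 2013-08-06
print(clean_label("52-Week Range"))       # prints 52WeekRange
```

Note that re.search returns a match object, so calling .group(0) (and checking for None first) is what gives you the date as a plain string.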

Satan's Helper · Aug 7, 2013

Just an FYI: if you do this, your IP address will first get a message asking you to verify that you are human, and after that your IP will get banned....
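
One common way to be gentler on a site is to pause between requests. The sketch below is an assumption-laden illustration, not a guaranteed way to avoid a ban: fetch_all, its parameters, and the 5-second default are hypothetical, and what a given site tolerates depends on its own terms and limits.

```python
import time


def fetch_all(symbols, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(symbol) for each symbol, pausing between consecutive calls.

    fetch is any function that takes a symbol and returns its data.
    sleep is injectable so the pause can be replaced in tests.
    """
    results = {}
    for i, symbol in enumerate(symbols):
        if i > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results[symbol] = fetch(symbol)
    return results
```

For example, fetch_all(["SPY", "QQQ"], scrape_ETF_data) would call a scraper function such as the one posted earlier in the thread once per symbol, with a pause in between.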
