I am trying to scrape a date from each of a series of URLs listed in a CSV file and then output the dates to a new CSV file.

I have the basic Python code working, but I can't figure out how to load the URLs from the CSV (instead of pulling them from an array), scrape each URL, and then write the results to a new CSV. From reading a couple of posts, I think I want to use Python's csv module, but I can't get it working.

Here is my code for the scraping part:

import urllib
import re

exampleurls =["http://www.domain1.com","http://www.domain2.com","http://www.domain3.com"]

i=0
while i<len(exampleurls):
    url = exampleurls[i]
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = r'on [0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]'  # raw string so \. reaches the regex engine unchanged
    pattern = re.compile(regex)
    date = re.findall(pattern,htmltext)
    print date
    i+=1

Any help is much appreciated!

  • OK, but you need to import the module first: import csv. Then you can try writing some code and post it here. Commented Jan 6, 2014 at 2:29
  • Yeah I got that part, sorry if that wasn't clear. I didn't include it because I didn't include the csv code. Commented Jan 6, 2014 at 2:56
  • show us your csv reader / writer attempt Commented Jan 6, 2014 at 3:20

1 Answer

If your csv looks like this:

"http://www.domain1.com","other column","yet another"
"http://www.domain2.com","other column","yet another"
...

Read the URL from the first column of each row like this:

import urllib
import csv

with open('urlFile.csv') as f:
    reader = csv.reader(f)

    for rec in reader:
        htmlfile = urllib.urlopen(rec[0])
        ...
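
To connect the CSV loop above with the regex from the question, here is a minimal sketch. It assumes Python 3 (where `urllib.request.urlopen` replaces the Python 2 `urllib.urlopen` used above) and that pages decode as UTF-8. The names `fetch_html` and `scrape_dates` are hypothetical, chosen for illustration only.

```python
import csv
import re
from urllib.request import urlopen

# Same pattern as in the question: "on " followed by NN.NN.NN
DATE_PATTERN = re.compile(r'on [0-9]{2}\.[0-9]{2}\.[0-9]{2}')


def fetch_html(url):
    """Download a page and decode it to text (hypothetical helper).

    Assumes UTF-8; undecodable bytes are replaced rather than raising.
    """
    with urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


def scrape_dates(csv_file, fetch=fetch_html):
    """Yield (url, dates) pairs, taking the URL from the first column of each row."""
    for rec in csv.reader(csv_file):
        if not rec:  # skip blank lines in the CSV
            continue
        url = rec[0]
        yield url, DATE_PATTERN.findall(fetch(url))
```

Calling `scrape_dates(f)` on a file opened with `open('urlFile.csv', newline='')` would fetch each URL in turn. The `fetch` parameter is there so a different downloader (or a stub for testing) can be swapped in.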

And if your url file just looks like this:

http://www.domain1.com
http://www.domain2.com
...

You could do something even cooler with a list comprehension (strip each line, since every line read from a file ends with a newline):

urls = [x.strip() for x in open('urlFile')]
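
A slightly more defensive variant of the one-liner is sketched below. It closes the file when done and skips blank lines. The function name `read_urls` is hypothetical.

```python
def read_urls(path):
    """Return the non-empty lines of a text file, one URL per line."""
    with open(path) as f:
        # strip() removes the trailing newline; blank lines are dropped
        return [line.strip() for line in f if line.strip()]
```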

EDIT: reply to comment

You can either open an output file in Python and write to it like:

f = open('myurls.csv', 'w')
...
for rec in reader:
    ...
    f.write(urlstring + '\n')  # add a newline so each record gets its own line
f.close()
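
Since the question mentions the csv module, the same idea can be sketched with `csv.writer`, which handles quoting for you. This assumes Python 3 and one output row per URL, with multiple dates joined by semicolons. The header names and the function name `write_dates` are assumptions for illustration, not part of the original answer.

```python
import csv


def write_dates(rows, out_file):
    """Write one CSV row per URL: the URL, then its dates joined by ';'.

    `rows` is any iterable of (url, list_of_dates) pairs.
    """
    writer = csv.writer(out_file)
    writer.writerow(['url', 'dates'])  # header row (assumed layout)
    for url, dates in rows:
        writer.writerow([url, ';'.join(dates)])
```

When writing to a real file, open it with `newline=''` as the csv documentation recommends, for example `open('dates.csv', 'w', newline='')`, where `dates.csv` is a placeholder name. Passing `sys.stdout` as `out_file` also works if you prefer to redirect the output from the shell.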

Or, if you're on Unix/Linux, just use print inside your code and redirect the output in bash:

python your_scraping_script.py > someoutfile.csv

1 Comment

Thanks a lot fivetentaylor! That worked! Do you know how I would then save the dates to a CSV file instead of printing them?
