
I am trying to scrape some tables from a website. The URL has two parameters that change with every table: an id value and an alpha value. Here is an example URL:

http://resources.afaqs.com/index.html?id=123&category=AD+Agencies&alpha=A

I want to iterate through both the id and the alpha values. My code so far:

import csv
import bs4 as bs
import requests


data = ['1','2','3','7','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','W','X','Y','Z']
number = None


while number < 500:
    for i in data:
        url = "http://resources.afaqs.com/index.html?id="
        if number is not None:
            url += str(number) + "&category=AD+Agencies&alpha={}".format(i)
        print(url)

        if number is None:
            number = 1
        else:
            number += 1

This advances the id (from 1 to 499) and the alpha value together, one step each, so every id gets just one alpha value. What I want is: for every id, iterate through all the alpha values from A to Z.
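A simplified model of the two pairings, as a Python 3 sketch. The helper names are hypothetical, and the lockstep version ignores the off-by-one at the start of the loop above, where the first URL is printed before number has a value:

```python
from itertools import cycle


def lockstep_pairs(ids, alphas):
    # One step of the id per alpha value: each id is paired with a single
    # alpha value, and the alpha values start over once they run out.
    return list(zip(ids, cycle(alphas)))


def nested_pairs(ids, alphas):
    # The goal: every alpha value for each id (the id is the outer loop).
    return [(number, alpha) for number in ids for alpha in alphas]


print(lockstep_pairs([1, 2, 3], "AB"))  # [(1, 'A'), (2, 'B'), (3, 'A')]
print(nested_pairs([1, 2], "AB"))       # [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')]
```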

I tried moving the for loop around: putting it before the while loop, putting it just before print(url), and so on. Each of these combinations gives odd results, not the one I want.

Can someone help please?

  • Why are you setting number to None and then checking for it? Why not just set number to 0? Then a simple if not number check is enough, and you can do number += 1 without any extra if statements. Commented Jun 14, 2017 at 12:29
  • I don't understand why your code is not what you want. Commented Jun 14, 2017 at 12:31
  • Why do you have 1, 2, 3, 7 in your data list if you only want A to Z? Commented Jun 14, 2017 at 12:32
  • Comparing number < 500 when number is None raises a TypeError in Python 3, so it looks like you are using Python 2. Commented Jun 14, 2017 at 12:32
  • The alpha values do include 1, 2, 3 and 7.
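Applying the comments' suggestions to the original while loop: start number as an integer instead of None (1 here, since the ids begin at 1), and move number += 1 out of the for loop so the id only advances after every alpha value has been used. A sketch that collects the URLs in a list; the function name and the BASE constant are hypothetical:

```python
BASE = "http://resources.afaqs.com/index.html?id={}&category=AD+Agencies&alpha={}"


def build_urls_with_while(alphas, stop=500):
    """Return one URL per (id, alpha) pair for ids 1 .. stop - 1."""
    urls = []
    number = 1  # an int from the start, so `number < stop` also works in Python 3
    while number < stop:
        for alpha in alphas:  # every alpha value for the current id
            urls.append(BASE.format(number, alpha))
        number += 1  # advance the id only after the inner loop has finished
    return urls


for url in build_urls_with_while(["1", "A"], stop=3):
    print(url)
```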

2 Answers


Don't use the while loop at all, use nested for:

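# `data` is the list of alpha values from the question
url = "http://resources.afaqs.com/index.html?id={}&category=AD+Agencies&alpha={}"
for number in range(1, 500):      # outer loop: ids 1 to 499
    for i in data:                # inner loop: every alpha value for this id
        print(url.format(number, i))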
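Instead of building the query string by hand, Python 3's urllib.parse.urlencode can assemble it from a dict. By default it encodes spaces as +, so "AD Agencies" becomes AD+Agencies, as in the URL above. A sketch; build_url and BASE are hypothetical names. If you fetch the pages with requests, requests.get also accepts such a dict through its params argument and builds the query string itself:

```python
from urllib.parse import urlencode

BASE = "http://resources.afaqs.com/index.html"


def build_url(number, alpha, category="AD Agencies"):
    # urlencode quotes each value with quote_plus, so spaces become '+'
    query = urlencode({"id": number, "category": category, "alpha": alpha})
    return BASE + "?" + query


print(build_url(123, "A"))
# http://resources.afaqs.com/index.html?id=123&category=AD+Agencies&alpha=A
```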

2 Comments

Oh, and if you use only one-character entries in data, you might change it like this: data = '1237ABCDEFGHIJKLMNOPQRSTUVWXYZ'
This feels much more Pythonic. I like it.
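Iterating over a string yields its characters one at a time, so the string works as a drop-in replacement for the list. One thing to check first: the list in the question has no V, while this string does. A quick comparison:

```python
# The alpha values exactly as listed in the question (note: no 'V').
data_list = ['1', '2', '3', '7', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I',
             'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'W',
             'X', 'Y', 'Z']

# The string suggested in the comment above.
data_str = '1237ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Characters in the string that the original list does not contain.
extra = sorted(set(data_str) - set(data_list))
print(extra)  # ['V']

# Without the 'V', iterating the string gives exactly the original list.
print(list(data_str.replace('V', '')) == data_list)  # True
```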

Assuming we need to iterate through the ids and, for each id, through the uppercase Latin letters, we can write:

from string import ascii_uppercase


def get_urls(number_stop):
    url = "http://resources.afaqs.com/index.html?id={}&category=AD+Agencies&alpha={}"
    urls = []
    for number in range(1, number_stop):
        for letter in ascii_uppercase:
            urls.append(url.format(number, letter))
    return urls
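This answer iterates over A to Z only, but the question's alpha values also include 1, 2, 3 and 7. A sketch of the same list-building function with the alpha values passed in; the alphas parameter is an addition, defaulting to ascii_uppercase to match the answer:

```python
from string import ascii_uppercase

URL = "http://resources.afaqs.com/index.html?id={}&category=AD+Agencies&alpha={}"


def get_urls(number_stop, alphas=ascii_uppercase):
    # Outer loop over ids, inner loop over every alpha value for that id.
    return [URL.format(number, alpha)
            for number in range(1, number_stop)
            for alpha in alphas]


urls = get_urls(500, alphas="1237" + ascii_uppercase)
print(len(urls))  # 499 ids * 30 alpha values = 14970
```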

Or, using a generator:

from string import ascii_uppercase


def generate_urls(number_stop):
    url = "http://resources.afaqs.com/index.html?id={}&category=AD+Agencies&alpha={}"
    for number in range(1, number_stop):
        for letter in ascii_uppercase:
            yield url.format(number, letter)
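The generator builds each URL only when it is asked for, so it never holds all 499 * 26 URLs in memory at once. The trade-off is that a generator object can be iterated only once. A short demonstration with itertools.islice (the URL template is moved to a module-level URL constant here):

```python
from itertools import islice
from string import ascii_uppercase

URL = "http://resources.afaqs.com/index.html?id={}&category=AD+Agencies&alpha={}"


def generate_urls(number_stop):
    # Yields URLs one at a time; nothing is built until the caller asks.
    for number in range(1, number_stop):
        for letter in ascii_uppercase:
            yield URL.format(number, letter)


gen = generate_urls(500)
first_two = list(islice(gen, 2))  # builds only the first two URLs
rest = list(gen)                  # picks up where islice stopped
leftover = list(gen)              # empty: the generator is already exhausted

print(len(rest))  # 12972
print(leftover)   # []
```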

Or, finally, using a generator and itertools.product to get rid of the extra loop:

from itertools import product
from string import ascii_uppercase


def generate_urls(number_stop):
    url = "http://resources.afaqs.com/index.html?id={}&category=AD+Agencies&alpha={}"
    for number, letter in product(range(1, number_stop),
                                  ascii_uppercase):
        yield url.format(number, letter)
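itertools.product advances its rightmost iterable fastest, so product(range(1, number_stop), ascii_uppercase) yields pairs in the same order as the nested loops, with the id in the outer position. A small check on toy inputs:

```python
from itertools import product

ids = range(1, 3)
letters = "AB"

# product(...) gives the same pairs, in the same order, as the nested loops.
from_product = list(product(ids, letters))
from_loops = [(number, letter) for number in ids for letter in letters]

print(from_product)                # [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')]
print(from_product == from_loops)  # True

# Swapping the arguments changes the order: the letter becomes the outer loop.
print(list(product(letters, ids)))  # [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
```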
