I'm trying to get the data that is loaded into the chart on this page when you click the max (time range) button. The data is loaded with an Ajax request.

I inspected the request and tried to reproduce it with the Python requests library, but I can only retrieve the 1-year data for this chart.

Here is the code I used:

import requests

r = requests.get("https://www.justetf.com/en/etf-profile.html?0-4.0-tabs-panel-chart-dates-ptl_max&groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart&_=1576272593482")
r.content

I also tried to use Session:

from requests import Session
session = Session()

session.head('http://justetf.com')

response = session.get(
    url='https://www.justetf.com/en/etf-profile.html?0-4.0-tabs-panel-chart-dates-ptl_max&groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart&_=1575929227619',
    data = {"0-4.0-tabs-panel-chart-dates-ptl_max":"",
            "groupField":"none","sortField":"ter",
            "sortOrder":"asc","from":"search",
            "isin":"IE00B3VWN518",
            "tab":"chart",
            "_":"1575929227619"
           },

    headers={
        'Host': 'www.justetf.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
        'Accept': 'application/xml, text/xml, */*; q=0.01',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'Wicket-Ajax': 'true',
        'Wicket-Ajax-BaseURL': 'en/etf-profile.html?0&groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart',
        'Wicket-FocusedElementId': 'id28',
        'X-Requested-With': 'XMLHttpRequest',
        'Connection': 'keep-alive',
        'Referer': 'https://www.justetf.com/en/etf-profile.html?groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart',
        'Cookie': 'locale_=en; _ga=GA1.2.1297456970.1574289342; cookieconsent_status=dismiss; AWSALB=QMWHJxgfcpLXJLqX0i0FgBuLn+mpVHVeLRQ6upH338LdggA4/thXHT2vVWQX7pdBd1r486usZXgpAF8RpDsGJNtf6ei8e5NHTsg0hzVHR9C+Fj89AWuQ7ue+fzV2; JSESSIONID=ABB2A35B91751CA9B2D293F5A04505BE; _gid=GA1.2.1029531470.1575928527; _gat=1',
        'TE': 'Trailer'
    },

    cookies = {"_ga":"GA1.2.1297456970.1574289342","_gid":"GA1.2.1411779365.1574289342","AWSALB":"5v+tPMgooQC0deJBlEGl2wVeUSmwVGJdydie1D6dAZSRAK5eBsmg+DQCdBj8t25YRytC5NIi0TbU3PmDcNMjiyFPTp1xKHgwNjZcDvMRePZjTxthds5DsvelzE2I","JSESSIONID":"310F346AED94D1A345207A3489DCF83D","locale_":"en"}
)

But I get this response:

<ajax-response><redirect><![CDATA[/en/etf-profile.html?0&groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart]]></redirect></ajax-response>

Why am I not getting the same XML response that my browser receives when I click MAX?
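
To tell the two kinds of reply apart programmatically, a minimal sketch along these lines may help. It assumes the body is a Wicket-style `<ajax-response>` document like the one shown above; the helper name `wicket_redirect_target` is hypothetical and not part of requests or Wicket.

```python
import xml.etree.ElementTree as ET


def wicket_redirect_target(xml_text):
    """Return the redirect URL from an <ajax-response>, or None if there is none."""
    root = ET.fromstring(xml_text)
    # Assumption: a redirect is reported as a direct <redirect> child of the root
    redirect = root.find("redirect")
    if redirect is None or redirect.text is None:
        return None
    return redirect.text.strip()


# The response quoted in the question, split across lines for readability
sample = (
    "<ajax-response><redirect><![CDATA[/en/etf-profile.html?0&groupField=none"
    "&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart]]>"
    "</redirect></ajax-response>"
)
print(wicket_redirect_target(sample))
```

A non-`None` result means the server answered with a redirect rather than the chart data.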

  • Are you trying to get this response? <?xml version="1.0" encoding="UTF-8"?><ajax-response></ajax-response> Because that is what I get. Commented Dec 14, 2019 at 2:53
  • @foba the OP is trying to get the XML response of the webpage: .i.ibb.co/JzCHfX0/Capture.png Commented Dec 14, 2019 at 2:54
  • Where are you seeing that? Commented Dec 14, 2019 at 2:59
  • Don't pass cookies, headers, or data in your request. Call it like this: session.get(url), and you will get a proper HTML response. Commented Dec 14, 2019 at 13:28
  • @foba, as αԋɱҽԃ αмєяιcαη said, I'm trying to get the XML response that is returned when clicking max. Commented Dec 15, 2019 at 9:39

1 Answer

Okay, below is my solution for obtaining the data you seek:

import requests

url = "https://www.justetf.com/en/etf-profile.html"

querystring = {
  # Modify this string to get the timeline you want
  # Currently it is set to "max" as you can see
  "0-1.0-tabs-panel-chart-dates-ptl_max":"",
  "groupField":"none",
  "sortField":"ter",
  "sortOrder":"asc",
  "from":"search",
  "isin":"IE00B3VWN518",
  "tab":"chart",
  "_":"1576627890798"}

# Not all of these headers may be necessary
headers = {
    'authority': "www.justetf.com",
    'accept': "application/xml, text/xml, */*; q=0.01",
    'x-requested-with': "XMLHttpRequest",
    'wicket-ajax-baseurl': "en/etf-profile.html?0&amp;groupField=none&amp;sortField=ter&amp;sortOrder=asc&amp;from=search&amp;isin=IE00B3VWN518&amp;tab=chart",
    'wicket-ajax': "true",
    'wicket-focusedelementid': "id27",
    'Connection': "keep-alive",
}

session = requests.Session()

# The first request won't return what we want, but it sets the cookies
response = session.get(url, params=querystring)

# Now that the cookies are set, the second request returns the data we want
response = session.get(url, headers=headers, params=querystring)

print(response.text)
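
The two-step pattern above could also be wrapped in small helpers, roughly as sketched here. The function names `build_params` and `fetch_chart_xml` are hypothetical, and the sketch assumes that the `_` parameter is just a millisecond timestamp used for cache busting, which the site does not confirm. The session is passed in so the logic can be tried without touching the network.

```python
import time


def build_params(isin, listener="0-1.0-tabs-panel-chart-dates-ptl_max"):
    """Build the query parameters from the answer with a fresh timestamp."""
    return {
        listener: "",
        "groupField": "none",
        "sortField": "ter",
        "sortOrder": "asc",
        "from": "search",
        "isin": isin,
        "tab": "chart",
        # Assumption: "_" is a millisecond cache-busting timestamp
        "_": str(int(time.time() * 1000)),
    }


def fetch_chart_xml(session, url, params, ajax_headers):
    """Send a warm-up request for cookies, then repeat it with the Ajax headers."""
    session.get(url, params=params)  # first call only collects cookies
    response = session.get(url, params=params, headers=ajax_headers)
    response.raise_for_status()
    return response.text


# Usage (needs network access and the url/headers defined above):
#   import requests
#   xml_text = fetch_chart_xml(requests.Session(), url,
#                              build_params("IE00B3VWN518"), headers)
```

Note that the `wicket-ajax-baseurl` header also contains the ISIN, so it would presumably need to change together with the `isin` parameter.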

As a bonus, I have included a link to a repl.it where I actually parse the data and get each individual data point. You can find this here.

Let me know if that helps!

1 Comment

Though I am a bit confused by the fact that you have to set the cookies with the first session.get() and then get the response with the second one. Should I always do this for Ajax requests? Could you point me to a post or article where this is explained? I haven't found good resources about scraping; do you have any recommendations?
