Parse JSON output from API file into CSV

Question

I am trying to convert the JSON output of an API request to CSV so I can store the results in our database. Here is my current code for reference:

import pyodbc
import csv
#import urllib2
import json
import collections
import requests
#import pprint
#import functools

print ("Connecting via ODBC")

conn = pyodbc.connect('DSN=DSN', autocommit=True)

print ("Connected!\n")

cur = conn.cursor() 

sql = """SELECT DATA"""

cur.execute(sql)

#df = pandas.read_sql_query(sql, conn)

#df.to_csv('TEST.csv')

#print('CSV sheet is ready to go!')

rows = cur.fetchall()

obs_list = []

for row in rows:

    d = collections.OrderedDict()
    d['addressee'] = row.NAME
    d['street'] = row.ADDRESS
    d['city'] = row.CITY
    d['state'] = row.STATE
    d['zipcode'] = row.ZIP
    obs_list.append(d)

obs_file = 'TEST.json'
with open(obs_file, 'w') as file:
    json.dump(obs_list, file)


print('Run through API')


url = 'https://api.smartystreets.com/street-address?'

headers = {'content-type': 'application/json'}

with open('test1.json', 'r') as run:

    dict_run = run.readlines()

    dict_ready = (''.join(dict_run))

r = requests.post(url , data=dict_ready, headers=headers)

ss_output = r.text

output = 'output.json'

with open(output,'w') as of:

    json.dump(ss_output, of)

print('I think it works')

f = open('output.json')

data = json.load(f)

data_1 = data['analysis']

data_2 = data['metadata']

data_3 = data['components']

entity_data = open('TEST.csv','w')

csvwriter = csv.writer(entity_data)

count = 0

count2 = 0

count3 = 0

for ent in data_1:

    if count == 0:

        header = ent.keys()

        csvwriter.writerow(header)

        count += 1

    csvwriter.writerow(ent.values())

for ent_2 in data_2:

    if count2 == 0:

        header2 = ent_2.keys()

        csvwriter.writerow(header2)

        count2 += 1

    csvwriter.writerow(ent_2.values())

for ent_3 in data_3:

    if count3 == 0:

        header3 = ent_3.keys()

        csvwriter.writerow(header3)

        count3 += 1

    csvwriter.writerow(ent_3.values())

entity_data.close()

Sample output from API:

[
    {
        "input_index": 0,
        "candidate_index": 0,
        "delivery_line_1": "1 Santa Claus Ln",
        "last_line": "North Pole AK 99705-9901",
        "delivery_point_barcode": "997059901010",
        "components": {
            "primary_number": "1",
            "street_name": "Santa Claus",
            "street_suffix": "Ln",
            "city_name": "North Pole",
            "state_abbreviation": "AK",
            "zipcode": "99705",
            "plus4_code": "9901",
            "delivery_point": "01",
            "delivery_point_check_digit": "0"
        },
        "metadata": {
            "record_type": "S",
            "zip_type": "Standard",
            "county_fips": "02090",
            "county_name": "Fairbanks North Star",
            "carrier_route": "C004",
            "congressional_district": "AL",
            "rdi": "Commercial",
            "elot_sequence": "0001",
            "elot_sort": "A",
            "latitude": 64.75233,
            "longitude": -147.35297,
            "precision": "Zip8",
            "time_zone": "Alaska",
            "utc_offset": -9,
            "dst": true
        },
        "analysis": {
            "dpv_match_code": "Y",
            "dpv_footnotes": "AABB",
            "dpv_cmra": "N",
            "dpv_vacant": "N",
            "active": "Y",
            "footnotes": "L#"
        }
    },

    {
        "input_index": 1,
        "candidate_index": 0,
        "delivery_line_1": "Loop land 1",
        "last_line": "North Pole AK 99705-9901",
        "delivery_point_barcode": "997059901010",
        "components": {
            "primary_number": "1",
            "street_name": "Lala land",
            "street_suffix": "Ln",
            "city_name": "North Pole",
            "state_abbreviation": "AK",
            "zipcode": "99705",
            "plus4_code": "9901",
            "delivery_point": "01",
            "delivery_point_check_digit": "0"
        },
        "metadata": {
            "record_type": "S",
            "zip_type": "Standard",
            "county_fips": "02090",
            "county_name": "Fairbanks North Star",
            "carrier_route": "C004",
            "congressional_district": "AL",
            "rdi": "Commercial",
            "elot_sequence": "0001",
            "elot_sort": "A",
            "latitude": 64.75233,
            "longitude": -147.35297,
            "precision": "Zip8",
            "time_zone": "Alaska",
            "utc_offset": -9,
            "dst": true
        },
        "analysis": {
            "dpv_match_code": "Y",
            "dpv_footnotes": "AABB",
            "dpv_cmra": "N",
            "dpv_vacant": "N",
            "active": "Y",
            "footnotes": "L#"
        }
    }
]

After storing the API output, the trouble is parsing the returned output (the sample above) into CSV. The code I'm using to try to do this:

f = open('output.json')

data = json.load(f)

data_1 = data['analysis']

data_2 = data['metadata']

data_3 = data['components']

entity_data = open('TEST.csv','w')

csvwriter = csv.writer(entity_data)

count = 0

count2 = 0

count3 = 0

for ent in data_1:

    if count == 0:

        header = ent.keys()

        csvwriter.writerow(header)

        count += 1

    csvwriter.writerow(ent.values())

for ent_2 in data_2:

    if count2 == 0:

        header2 = ent_2.keys()

        csvwriter.writerow(header2)

        count2 += 1

    csvwriter.writerow(ent_2.values())

for ent_3 in data_3:

    if count3 == 0:

        header3 = ent_3.keys()

        csvwriter.writerow(header3)

        count3 += 1

    csvwriter.writerow(ent_3.values())

entity_data.close()

This returns the following error: TypeError: string indices must be integers. As someone kindly pointed out in the comments, it appears I am iterating over keys instead of over the individual dictionaries, and this is where I get stuck because I'm not sure what to do. From my understanding, the JSON looks like it is split into three different arrays, each holding JSON objects, but that does not appear to be the case according to the structure. I apologize for the length of the code, but I wanted to give some context for what I am trying to accomplish.

What's the line you get that TypeError at? Could you refine and shrink your samples (strip out code and data that are not connected to the error) so it's easier to read?
– Pavel Gurkov, Commented Sep 13, 2016 at 21:34
When you write "for ent in data_1:" it sounds like you want to iterate through some dictionaries, but in this case you are actually iterating over dictionary keys. So when you call ent.keys(), ent is a string, and that won't work.
– Bemmu, Commented Sep 13, 2016 at 21:40
Somehow the way you are asking this makes it difficult to answer. I would suggest finding the first point you don't understand and submitting that as a separate question.
– Bemmu, Commented Sep 13, 2016 at 21:41
Can you post at least two full entries for your JSON string? Currently it's difficult to debug. Please also post the desired output (in CSV format) for those two rows.
– MaxU - stand with Ukraine, Commented Sep 13, 2016 at 21:46
My apologies everyone, and thank you for the comments. I've been struggling with this all day so I guess my patience is a little low xD. Basically, for the desired CSV output I just want each key to be a header column and the values to populate the column.
– Ninjaboy12, Commented Sep 13, 2016 at 22:27

Parfait · Accepted Answer · 2016-09-14 01:59:40Z

Consider pandas's json_normalize() function to flatten the nested items into a tabular DataFrame:

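import json

import pandas as pd

# output.json must hold the raw API response (a JSON list of objects)
with open('output.json') as f:
    data = json.load(f)

# pd.json_normalize is the top-level name since pandas 1.0;
# older versions used: from pandas.io.json import json_normalize
df = pd.json_normalize(data)

df.to_csv('Output.csv')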

Note that components, metadata, and analysis become period-separated prefixes on the corresponding column names (for example, components.zipcode). If the prefixes are not needed, consider renaming the columns.
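As a rough sketch of that renaming step, the snippet below flattens two trimmed-down records shaped like the sample output and then keeps only the part of each column name after the last period. The `records` list and the `flat` variable are made up for illustration, and the approach assumes the stripped names do not collide (the sample output in the question has no such collisions).

```python
import pandas as pd

# Two trimmed-down records shaped like the sample API output (illustrative data).
records = [
    {"input_index": 0,
     "components": {"street_name": "Santa Claus", "zipcode": "99705"},
     "analysis": {"dpv_match_code": "Y"}},
    {"input_index": 1,
     "components": {"street_name": "Lala land", "zipcode": "99705"},
     "analysis": {"dpv_match_code": "Y"}},
]

df = pd.json_normalize(records)
print(list(df.columns))  # nested keys carry a "components." or "analysis." prefix

# Keep only the part after the last period in each column name.
flat = df.rename(columns=lambda name: name.rsplit(".", 1)[-1])
print(list(flat.columns))
```

If two sections ever shared a key name, this would produce duplicate column names, so in that case it may be safer to keep the prefixes or rename columns explicitly.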

robyschek · Accepted Answer · 2016-09-13 23:15:11Z

You are saving the response text (r.text, stored in ss_output) with json.dump. That text is already a string of JSON, so when you read the file back with json.load you get the same one long string instead of a list. Try writing the text as is:

output = 'output.json'
with open(output,'w') as of:
    of.write(ss_output)
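A small sketch of the round trip may make this clearer. The `response_text` value below is a made-up stand-in for r.text, and json.dumps/json.loads are used in place of the file-based json.dump/json.load so the example needs no files:

```python
import json

# Stand-in for r.text: the response body is already JSON text (illustrative value).
response_text = '[{"input_index": 0, "analysis": {"active": "Y"}}]'

# Serialising the text a second time wraps it in a JSON string literal,
# so reading it back gives a str rather than a list.
double_encoded = json.dumps(response_text)
round_tripped = json.loads(double_encoded)
print(type(round_tripped).__name__)

# Parsing the original text once gives the expected list of dicts.
parsed = json.loads(response_text)
print(type(parsed).__name__)
print(parsed[0]["analysis"]["active"])
```

Indexing `round_tripped` with a key such as 'analysis' is an attempt to index a string with a string, which Python rejects with a TypeError.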

That is the cause of the TypeError: string indices must be integers you mention. The rest of your code has other issues:

The data in the JSON is a list of dicts, so to get, say, data_1 you need a list comprehension like this: data_1 = [x['analysis'] for x in data]
You write three types of rows into the same CSV file: components, metadata and analysis. That's really odd.

You probably have to rewrite the second half of the code: open three CSV writers, one per data type, then iterate over the data items and write their fields to the corresponding writer.
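One possible shape for that rewrite is sketched below. The helper name `write_sections`, the `SECTIONS` tuple, and the in-memory buffers are all illustrative choices rather than part of the original code; the sketch also assumes every record has the same keys within a section, taking the header from the first record.

```python
import csv
import io

SECTIONS = ("components", "metadata", "analysis")

def write_sections(records, streams):
    """Write one CSV per section; `streams` maps a section name to a writable text file."""
    writers = {}
    for record in records:
        for section in SECTIONS:
            row = record.get(section, {})
            if section not in writers:
                # The first record seen determines the header for this section.
                writers[section] = csv.DictWriter(
                    streams[section], fieldnames=list(row),
                    restval="", extrasaction="ignore")
                writers[section].writeheader()
            writers[section].writerow(row)

# Trimmed-down records shaped like the sample API output (illustrative data).
records = [
    {"components": {"zipcode": "99705"}, "metadata": {"dst": True}, "analysis": {"active": "Y"}},
    {"components": {"zipcode": "99705"}, "metadata": {"dst": True}, "analysis": {"active": "N"}},
]

# In-memory buffers stand in for files opened with open(name, 'w', newline='').
buffers = {name: io.StringIO(newline="") for name in SECTIONS}
write_sections(records, buffers)
print(buffers["analysis"].getvalue())
```

With extrasaction="ignore", keys that appear only in later records are dropped silently, so if the API can return varying fields it may be better to collect all keys first.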
