
I am currently trying to convert the JSON output of an API request to CSV format so I can store the results in our database. Here is my current code for reference:

import pyodbc
import csv
#import urllib2
import json
import collections
import requests
#import pprint
#import functools

print ("Connecting via ODBC")

conn = pyodbc.connect('DSN=DSN', autocommit=True)

print ("Connected!\n")

cur = conn.cursor() 

sql = """SELECT DATA"""

cur.execute(sql)

#df = pandas.read_sql_query(sql, conn)

#df.to_csv('TEST.csv')

#print('CSV sheet is ready to go!')

rows = cur.fetchall()

obs_list = []

for row in rows:

    d = collections.OrderedDict()
    d['addressee'] = row.NAME
    d['street'] = row.ADDRESS
    d['city'] = row.CITY
    d['state'] = row.STATE
    d['zipcode'] = row.ZIP
    obs_list.append(d)

obs_file = 'TEST.json'
with open(obs_file, 'w') as file:
    json.dump(obs_list, file)


print('Run through API')


url = 'https://api.smartystreets.com/street-address?'

headers = {'content-type': 'application/json'}

with open('test1.json', 'r') as run:

    dict_run = run.readlines()

    dict_ready = (''.join(dict_run))

r = requests.post(url , data=dict_ready, headers=headers)

ss_output = r.text

output = 'output.json'

with open(output,'w') as of:

    json.dump(ss_output, of)

print('I think it works')

f = open('output.json')

data = json.load(f)

data_1 = data['analysis']

data_2 = data['metadata']

data_3 = data['components']

entity_data = open('TEST.csv','w')

csvwriter = csv.writer(entity_data)

count = 0

count2 = 0

count3 = 0

for ent in data_1:

    if count == 0:

        header = ent.keys()

        csvwriter.writerow(header)

        count += 1

    csvwriter.writerow(ent.values())

for ent_2 in data_2:

    if count2 == 0:

        header2 = ent_2.keys()

        csvwriter.writerow(header2)

        count2 += 1

    csvwriter.writerow(ent_2.values())

for ent_3 in data_3:

    if count3 == 0:

        header3 = ent_3.keys()

        csvwriter.writerow(header3)

        count3 += 1

    csvwriter.writerow(ent_3.values())

entity_data.close()

Sample output from API:

[
    {
        "input_index": 0,
        "candidate_index": 0,
        "delivery_line_1": "1 Santa Claus Ln",
        "last_line": "North Pole AK 99705-9901",
        "delivery_point_barcode": "997059901010",
        "components": {
            "primary_number": "1",
            "street_name": "Santa Claus",
            "street_suffix": "Ln",
            "city_name": "North Pole",
            "state_abbreviation": "AK",
            "zipcode": "99705",
            "plus4_code": "9901",
            "delivery_point": "01",
            "delivery_point_check_digit": "0"
        },
        "metadata": {
            "record_type": "S",
            "zip_type": "Standard",
            "county_fips": "02090",
            "county_name": "Fairbanks North Star",
            "carrier_route": "C004",
            "congressional_district": "AL",
            "rdi": "Commercial",
            "elot_sequence": "0001",
            "elot_sort": "A",
            "latitude": 64.75233,
            "longitude": -147.35297,
            "precision": "Zip8",
            "time_zone": "Alaska",
            "utc_offset": -9,
            "dst": true
        },
        "analysis": {
            "dpv_match_code": "Y",
            "dpv_footnotes": "AABB",
            "dpv_cmra": "N",
            "dpv_vacant": "N",
            "active": "Y",
            "footnotes": "L#"
        }
    },

    {
        "input_index": 1,
        "candidate_index": 0,
        "delivery_line_1": "Loop land 1",
        "last_line": "North Pole AK 99705-9901",
        "delivery_point_barcode": "997059901010",
        "components": {
            "primary_number": "1",
            "street_name": "Lala land",
            "street_suffix": "Ln",
            "city_name": "North Pole",
            "state_abbreviation": "AK",
            "zipcode": "99705",
            "plus4_code": "9901",
            "delivery_point": "01",
            "delivery_point_check_digit": "0"
        },
        "metadata": {
            "record_type": "S",
            "zip_type": "Standard",
            "county_fips": "02090",
            "county_name": "Fairbanks North Star",
            "carrier_route": "C004",
            "congressional_district": "AL",
            "rdi": "Commercial",
            "elot_sequence": "0001",
            "elot_sort": "A",
            "latitude": 64.75233,
            "longitude": -147.35297,
            "precision": "Zip8",
            "time_zone": "Alaska",
            "utc_offset": -9,
            "dst": true
        },
        "analysis": {
            "dpv_match_code": "Y",
            "dpv_footnotes": "AABB",
            "dpv_cmra": "N",
            "dpv_vacant": "N",
            "active": "Y",
            "footnotes": "L#"
        }
    }
]

After storing the API output, the trouble is parsing the returned output (the sample output above) into CSV format. The code I'm using to try to do this:

f = open('output.json')

data = json.load(f)

data_1 = data['analysis']

data_2 = data['metadata']

data_3 = data['components']

entity_data = open('TEST.csv','w')

csvwriter = csv.writer(entity_data)

count = 0

count2 = 0

count3 = 0

for ent in data_1:

    if count == 0:

        header = ent.keys()

        csvwriter.writerow(header)

        count += 1

    csvwriter.writerow(ent.values())

for ent_2 in data_2:

    if count2 == 0:

        header2 = ent_2.keys()

        csvwriter.writerow(header2)

        count2 += 1

    csvwriter.writerow(ent_2.values())

for ent_3 in data_3:

    if count3 == 0:

        header3 = ent_3.keys()

        csvwriter.writerow(header3)

        count3 += 1

    csvwriter.writerow(ent_3.values())

entity_data.close()

This returns the following error: TypeError: string indices must be integers. As someone kindly pointed out in the comments, it appears I am iterating over keys instead of over the individual dictionaries, and this is where I get stuck, because I'm not sure what to do instead. From my understanding, the JSON looks like it is split into 3 different arrays with a JSON object for each, but that does not appear to be the case according to the structure. I apologize for the length of the code, but I wanted to give some context for what I am trying to accomplish.
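To make the structure concrete, here is a minimal sketch (not part of the original post) that loads a trimmed copy of the sample output above, with only a few fields kept, and checks what json.load actually returns:

```python
import json

# A trimmed copy of the sample output above: two records, a few fields each.
sample = """[
  {"input_index": 0,
   "components": {"zipcode": "99705"},
   "metadata": {"county_name": "Fairbanks North Star"},
   "analysis": {"dpv_match_code": "Y", "active": "Y"}},
  {"input_index": 1,
   "components": {"zipcode": "99705"},
   "metadata": {"county_name": "Fairbanks North Star"},
   "analysis": {"dpv_match_code": "Y", "active": "Y"}}
]"""

data = json.loads(sample)

# The top level is one list with a dict per address, not three separate arrays.
print(type(data).__name__, len(data))  # list 2
print(sorted(data[0]))  # ['analysis', 'components', 'input_index', 'metadata']

# Looping over a single dict yields its keys, which are plain strings,
# so calling .keys() on the loop variable cannot work.
for ent in data[0]["analysis"]:
    print(type(ent).__name__, ent)  # str dpv_match_code, then str active

# Looping over the list yields one dict per address instead.
for ent in data:
    print(ent["input_index"], ent["analysis"])
```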

  • What's the line you get that TypeError at? Could you refine and shrink your samples (strip out code and data that are not connected to the error) so it's easier to read? Commented Sep 13, 2016 at 21:34
  • When you write "for ent in data_1:" it sounds like you want to iterate through some dictionaries, but in this case you are actually iterating over dictionary keys. Then when you call ent.keys(), ent is actually a string, so that can't work. Commented Sep 13, 2016 at 21:40
  • Somehow the way you are asking this makes it difficult to answer. I would suggest finding the first point you don't understand and submitting that as a separate question. Commented Sep 13, 2016 at 21:41
  • Can you post at least two full entries for your JSON string? Because currently it's difficult to debug... Please also post a desired output (in CSV format) for those two rows Commented Sep 13, 2016 at 21:46
  • My apologies everyone, and thank you for the comments. I've been struggling with this all day, so I guess my patience is a little low xD. Basically, for the desired CSV output I just want each key to be a header column and the values to populate the columns. Commented Sep 13, 2016 at 22:27

2 Answers


Consider pandas' json_normalize() function to flatten the nested items into a tabular DataFrame:

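import json

import pandas as pd

with open('output.json') as f:
    data = json.load(f)

# Newer pandas exposes json_normalize at the top level; the old
# pandas.io.json.json_normalize import was deprecated in pandas 1.0.
df = pd.json_normalize(data)

# index=False leaves out pandas' 0..n-1 row index column
df.to_csv('Output.csv', index=False)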

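Do note that components, metadata, and analysis become period-separated prefixes on the corresponding column names (e.g. components.zipcode). If you don't need them, consider renaming the columns, for example with df.rename(columns=lambda c: c.split('.')[-1]), provided no two sections share a field name.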

[Screenshot: JSON to CSV Output]
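If pandas isn't available, a similar flattening can be approximated with only the standard library. This is a hand-rolled sketch, not json_normalize itself; the helper names flatten and records_to_csv are hypothetical, and it only mimics the default dotted column names:

```python
import csv
import io
import json


def flatten(record, prefix="", sep="."):
    """Recursively flatten nested dicts into one level with dotted keys.

    Non-dict values (including lists) are copied as-is.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def records_to_csv(records, out):
    """Write a list of (possibly nested) dicts as CSV, one row per record."""
    rows = [flatten(r) for r in records]
    # Union of keys in first-seen order, so records missing a field still fit
    # (DictWriter fills missing fields with an empty string by default).
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


data = json.loads('[{"input_index": 0, "components": {"zipcode": "99705"},'
                  ' "analysis": {"dpv_match_code": "Y"}}]')
buf = io.StringIO()
records_to_csv(data, buf)
print(buf.getvalue())
```

For a real file, open it with newline='' as the csv module documentation recommends, e.g. with open('Output.csv', 'w', newline='') as out: records_to_csv(data, out).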


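You are saving the response text (r.text, stored in ss_output) with json.dump(). r.text is already a JSON-formatted string, so json.dump() encodes it a second time as one quoted string; when you read it back with json.load() you get that same long string instead of a list. Try writing the text as is: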

output = 'output.json'
with open(output,'w') as of:
    of.write(ss_output)
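A small sketch of why the original round trip breaks, using a short stand-in string for r.text (the literal below is illustrative, not real API output):

```python
import json

# Stand-in for r.text: the response body, already JSON-formatted text.
ss_output = '[{"analysis": {"dpv_match_code": "Y"}}]'

# json.dumps on a str encodes it again, as one quoted JSON string literal.
double_encoded = json.dumps(ss_output)
print(double_encoded)

# Reading it back gives the original str, not a list of dicts ...
data = json.loads(double_encoded)
print(type(data).__name__)  # str

# ... so indexing it with a key fails just like in the question.
message = None
try:
    data['analysis']
except TypeError as exc:
    message = str(exc)
print(message)

# Parsing the text once gives the expected list of dicts.
records = json.loads(ss_output)
print(records[0]['analysis']['dpv_match_code'])  # Y
```

With requests, r.json() also parses the body exactly once, so json.dump(r.json(), of) would be an equivalent way to write a proper file.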

Double-encoding the response like this is the cause of the TypeError: string indices must be integers that you mention. The rest of your code has multiple issues, though:

  1. The data in the JSON is a list of dicts, so to get, say, data_1 you need a list comprehension like this: data_1 = [x['analysis'] for x in data]

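  2. You write three types of rows into the same CSV file: components, metadata, and analysis. That's really odd, since each section has different columns, so the file ends up with three unrelated header rows and row blocks stacked on top of each other.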

You probably need to rewrite the second half of the code: open three CSV writers, one per data type, then iterate over the data items and write each item's fields to the corresponding writer.
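One way to sketch that rewrite, assuming every record has the same keys in each section as the first record (csv.DictWriter raises ValueError on unexpected keys). The function name write_sections, the TEST_<section>.csv file names, and keeping input_index as a join column are illustrative choices, not from the original code:

```python
import csv
from contextlib import ExitStack

SECTIONS = ("components", "metadata", "analysis")


def write_sections(data, prefix="TEST"):
    """Write each nested section of the parsed API output to its own CSV file.

    data: list of dicts, one per address, as returned by json.load().
    Returns the list of file paths written (empty if data is empty).
    """
    if not data:
        return []
    paths = [f"{prefix}_{section}.csv" for section in SECTIONS]
    with ExitStack() as stack:
        writers = {}
        for section, path in zip(SECTIONS, paths):
            # newline="" is what the csv module docs recommend for output files.
            out = stack.enter_context(open(path, "w", newline=""))
            # input_index is kept so rows in the three files can be matched up.
            fields = ["input_index", *data[0][section].keys()]
            writers[section] = csv.DictWriter(out, fieldnames=fields)
            writers[section].writeheader()
        # One pass over the records, sending each section to its own writer.
        for item in data:
            for section in SECTIONS:
                row = {"input_index": item["input_index"], **item[section]}
                writers[section].writerow(row)
    return paths
```

Usage would then be something like with open('output.json') as f: write_sections(json.load(f)), once output.json holds the raw response text.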
