I am new to Python and am having trouble with something that seems conceptually very simple. I've read a number of SO posts but still can't solve my problem(s).
I have a function that converts Amazon reviews to JSON format. Each review becomes a single JSON object. I would like to compile all reviews into a single pandas DataFrame, with the JSON keys as columns and one review per row.
There are a large number of reviews, each formatted like so:
{
    "product/productId": "B00006HAXW",
    "product/title": "Winnie the Pooh",
    "product/price": "unknown",
    "review/userId": "A1RSDE90N6RSZF",
    "review/profileName": "piglet",
    "review/helpfulness": "9/9",
    "review/score": "5.0",
    "review/time": "1042502400",
    "review/summary": "Love this book",
    "review/text": "Exciting stories about highly intelligent creatures, very inspiring!"
}
How can I compile all reviews into a pandas DataFrame? I'm having two separate problems:
How do I collect all reviews into one object? Currently, the output is generated like so:
for e in parse("reviews.txt.gz"):
    print json.dumps(e)
I tried creating an empty list and using append:
for e in parse("reviews.txt.gz"):
    revs = []
    revs = revs.append(json.dumps(e))
but that does not work - print revs prints out
None
None
None
When I use pd.read_json on a single review formatted as above, it returns "If using all scalar values, you must pass an index". Does this mean my data is not in valid JSON format?
You are re-creating revs = [] on every loop iteration, then re-assigning revs to the output of a list.append call (which is None; list.append modifies the original list in place). Additionally, you likely don't need the json.dumps(e) call; you want a list of Python objects, not JSON objects.
Is parse("reviews.txt.gz") working? Is that what produces the example JSON you posted?
parse("file").
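The suggestion in the comments about list.append can be sketched roughly as follows. This is only an illustration under stated assumptions: the real parse() from the question is not shown, so a hypothetical stand-in is defined here that yields one dict per review, and the second sample review is made up for the example.

```python
import pandas as pd


def parse(path):
    """Hypothetical stand-in for the question's parse(); yields one dict per review.

    The path argument is ignored here; the real function presumably reads the file.
    """
    sample = [
        {"product/productId": "B00006HAXW", "review/score": "5.0",
         "review/summary": "Love this book"},
        # Made-up second review, only so the frame has more than one row.
        {"product/productId": "B00006HAXW", "review/score": "4.0",
         "review/summary": "Nice"},
    ]
    for review in sample:
        yield review


# Create the list once, before the loop. append() mutates the list and
# returns None, so its result must not be assigned back to revs.
revs = []
for e in parse("reviews.txt.gz"):
    revs.append(e)  # keep the dict itself; json.dumps(e) would store a string

# A list of dicts becomes one row per review and one column per key.
df = pd.DataFrame(revs)

# A single review is a dict of scalar values. Wrapping it in a list gives
# pandas a row to index, which avoids the "must pass an index" error.
single = pd.DataFrame([revs[0]])

print(df)
```

The "If using all scalar values, you must pass an index" message does not by itself indicate invalid JSON. pandas raises it when every value is a scalar and it has nothing to infer the row index from, so a one-element list (as above) is one way around it.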