I have the following huge input file (from the Stack Exchange dataset):
<row Id="659890" PostTypeId="2" ParentId="655986" CreationDate="2009-03-18T20:06:33.720" />
<row Id="659891" PostTypeId="2" ParentId="659089" CreationDate="2009-03-18T20:07:44.843" />
Usually, I process a file by reading it line by line:
with open("file.txt", "r") as f:
    for line in f:
        print(line)
In this case, however, I would like to process the file post by post, i.e. one <row> element at a time. How can I do this?
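Since every <row> in the sample sits on its own line, one possible sketch keeps the line-by-line loop and parses each row line as a standalone XML element with the standard library's xml.etree.ElementTree. This assumes the one-row-per-line layout holds for the whole file; the helper name iter_rows and the io.StringIO stand-in for open("file.txt") are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def iter_rows(lines):
    """Hypothetical helper: yield one parsed <row> element per row line.

    Assumes each <row .../> is complete on a single line; any other line
    (XML declaration, root open/close tags, blank lines) is skipped.
    """
    for line in lines:
        line = line.strip()
        if line.startswith("<row"):
            yield ET.fromstring(line)

# Stand-in for open("file.txt"), built from the two sample rows.
sample = io.StringIO(
    "<posts>\n"
    '<row Id="659890" PostTypeId="2" ParentId="655986" CreationDate="2009-03-18T20:06:33.720" />\n'
    '<row Id="659891" PostTypeId="2" ParentId="659089" CreationDate="2009-03-18T20:07:44.843" />\n'
    "</posts>\n"
)

ids = [row.get("Id") for row in iter_rows(sample)]
print(ids)
```

This breaks if a row ever spans several lines, so it is only as robust as the file's layout.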
I also want to extract the value of PostTypeId and store it in a variable, and do the same for the other attributes (Id, ParentId, CreationDate).
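Once a row is parsed into an ElementTree element, its attributes can be read individually with get() or all at once through the attrib dictionary. A minimal sketch using the first sample row; note that attribute values always come back as strings, so the int() conversion is a choice made here, not something the parser does.

```python
import xml.etree.ElementTree as ET

row = ET.fromstring(
    '<row Id="659890" PostTypeId="2" ParentId="655986" '
    'CreationDate="2009-03-18T20:06:33.720" />'
)

post_type_id = int(row.get("PostTypeId"))  # values are strings; convert as needed
parent_id = row.get("ParentId")             # get() returns None if the attribute is missing
fields = dict(row.attrib)                   # every attribute as a plain dict

print(post_type_id, parent_id, fields["CreationDate"])
```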
So my question is: what is the most efficient way to do this, given that the dataset can be very large?
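For a very large file, a common option that does not depend on line layout is incremental parsing with xml.etree.ElementTree.iterparse, clearing elements after use so the tree never holds more than a few rows in memory. The sketch below assumes the real file is well-formed XML with a single root element wrapping the rows (shown here as a hypothetical <posts> root); the function name count_post_types and the counting task are illustrative only.

```python
import io
import xml.etree.ElementTree as ET

def count_post_types(source):
    """Hypothetical example: stream <row> elements and tally PostTypeId values."""
    counts = {}
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            post_type_id = elem.get("PostTypeId")
            counts[post_type_id] = counts.get(post_type_id, 0) + 1
            # Drop already-processed rows from the root so memory use stays flat.
            root.clear()
    return counts

# Stand-in for open("file.txt", "rb"), assuming a <posts> root element.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="659890" PostTypeId="2" ParentId="655986" CreationDate="2009-03-18T20:06:33.720" />
  <row Id="659891" PostTypeId="2" ParentId="659089" CreationDate="2009-03-18T20:07:44.843" />
</posts>
"""

counts = count_post_types(io.BytesIO(SAMPLE))
print(counts)
```

The root.clear() call is what keeps this streaming; without it, iterparse still builds the full tree as it goes.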