Python: Store CSV data in list or array

Question

I have a CSV file storing the number of times each student has attempted each question, in the format below:

UserID Q1 Q2 Q3 Q4
20     1  2  3  1
21     0  1  2  1

I am trying to write a Python program that stores the data in an array, attempts_count.

import numpy

attempts_count = numpy.zeros(shape=(2000,200,200))
with open('Question_Attempts_Worksheet_1.csv' , 'r') as csvfile:
    csvfile.readline()  # skip the first line (column titles)
    for line in csvfile:
        csv_row = line.split()
        user_id = csv_row[0]
        for question_counter in range(0,4):
            attempts_count[user_id][1][question_counter] += csv_row[question_counter + 1]

I expect to obtain attempts_count[20][1][0]=1, attempts_count[20][1][2]=3, etc.

However, I got an error message saying

"IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices".

How should I fix this problem?

Have you considered importing the csv module? It parses your CSV file automatically and lets you access individual rows and columns without manually splitting each line on the separator.
– Terry, commented Mar 5, 2016 at 9:30

Ian · Accepted Answer · 2016-03-05 10:06:45Z

The best way to solve this is with the csv module, since the file is in CSV format. This is how it can be done with csv:

import csv
import numpy

attempts_count = numpy.zeros(shape=(2000,200,200))
with open('Question_Attempts_Worksheet_1.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader, None)  # skip the header row
    for row in reader:
        for question_counter in range(0,4):
            # row values are strings, so convert both the ID and the count to int
            attempts_count[int(row[0])][1][question_counter] += int(row[question_counter + 1])
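To check the csv.reader version without the original file, here is a sketch that feeds the question's sample rows through io.StringIO. It assumes the file really is comma-separated (the question displays it with spaces), and it uses a much smaller, hypothetical array shape of (100, 2, 4) so it runs instantly; SAMPLE is just the question's table rewritten with commas.

```python
import csv
import io

import numpy

# Hypothetical sample: the question's table, assumed to be comma-separated.
SAMPLE = "UserID,Q1,Q2,Q3,Q4\n20,1,2,3,1\n21,0,1,2,1\n"

# Hypothetical reduced shape: 100 user IDs x 2 slots x 4 questions.
attempts_count = numpy.zeros(shape=(100, 2, 4))

reader = csv.reader(io.StringIO(SAMPLE), delimiter=',')
next(reader, None)  # skip the header row
for row in reader:
    user_id = int(row[0])  # csv.reader yields strings, so convert the ID
    for question_counter in range(0, 4):
        attempts_count[user_id][1][question_counter] += int(row[question_counter + 1])

print(attempts_count[20][1].tolist())  # [1.0, 2.0, 3.0, 1.0]
```

With the real file, replace io.StringIO(SAMPLE) with the opened csvfile and restore the original shape.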

However, if you want to keep your own approach, there are at least three problems in the code.

The first problem is user_id: since you read it from the CSV file, it is a string rather than an integer, and NumPy does not accept a string as an index. Convert it to int before using it:

user_id = int(csv_row[0])  # here, convert the ID to an integer
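A quick way to see why the string ID triggers the error from the question: NumPy rejects a str index outright, while the same lookup succeeds once the value goes through int(). The small (100, 2, 4) shape is a hypothetical stand-in for the real array.

```python
import numpy

attempts_count = numpy.zeros(shape=(100, 2, 4))  # hypothetical small shape
user_id = "20"  # parsing a CSV line yields a str, not an int

error_raised = False
try:
    attempts_count[user_id]  # a str is not a valid NumPy index
except IndexError as exc:
    error_raised = True
    print("IndexError:", exc)

# After converting to int, the same lookup returns the (2, 4) sub-array.
row = attempts_count[int(user_id)]
print(row.shape)  # (2, 4)
```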

The second problem is that you don't split each CSV row on the comma separator: with no argument, line.split() splits on whitespace, whereas the values in a CSV row are separated by commas. Pass ',' as the separator to split:

csv_row = line.split(',') # put , as separator here
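The difference between the two calls is easy to see on a single line. The line below is a made-up example in the comma-separated form this answer assumes.

```python
line = "20,1,2,3,1\n"  # one hypothetical data line from a comma-separated file

# With no argument, str.split() splits on runs of whitespace, so the
# comma-separated line comes back as a single field.
whitespace_fields = line.split()
print(whitespace_fields)  # ['20,1,2,3,1']

# Passing ',' splits on commas; the last field keeps its trailing newline.
comma_fields = line.split(",")
print(comma_fields)  # ['20', '1', '2', '3', '1\n']

# int() ignores surrounding whitespace, so the trailing '\n' is harmless.
print(int(comma_fields[-1]))  # 1
```

Note that split(',') does not understand quoting, so a quoted field containing a comma would be broken apart; the csv.reader version handles that case.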

Lastly, the third problem is similar to the first. Since you want csv_row[question_counter + 1] to be added to attempts_count, it has to be converted to a number as well:

attempts_count[user_id][1][question_counter] += int(csv_row[question_counter + 1])
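The same kind of check shows why the counts need int() too: adding a str to a float element of a NumPy array raises a TypeError (the exact subclass and message vary between NumPy versions), while the converted value adds normally. The shape and the field value here are hypothetical.

```python
import numpy

attempts_count = numpy.zeros(shape=(100, 2, 4))  # hypothetical small shape
field = "3"  # a count straight from line.split(','), still a str

type_error_raised = False
try:
    attempts_count[20][1][2] += field  # float64 + str raises a TypeError
except TypeError:
    type_error_raised = True

attempts_count[20][1][2] += int(field)  # convert first, then accumulate
print(attempts_count[20][1][2])  # 3.0
```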

The complete code should look like this:

import numpy

attempts_count = numpy.zeros(shape=(2000,200,200))
with open('Question_Attempts_Worksheet_1.csv', 'r') as csvfile:
    csvfile.readline()  # skip the first line (column titles)
    for line in csvfile:
        csv_row = line.split(',')  # use ',' as the separator
        user_id = int(csv_row[0])  # convert the ID to an integer
        for question_counter in range(0,4):
            attempts_count[user_id][1][question_counter] += int(csv_row[question_counter + 1])
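If you want to try the complete version without the real file, one option is to wrap the loop in a function that takes any open file object and feed it sample text through io.StringIO. The function name load_attempts, the SAMPLE data, and the reduced default shape are hypothetical; pass shape=(2000, 200, 200) to match the code above.

```python
import io

import numpy


def load_attempts(csvfile, shape=(100, 2, 4)):
    """Hypothetical helper: the manual-split loop above, applied to csvfile."""
    attempts_count = numpy.zeros(shape=shape)
    csvfile.readline()  # skip the header line (column titles)
    for line in csvfile:
        csv_row = line.split(',')  # comma is the field separator
        user_id = int(csv_row[0])  # str -> int so it can be used as an index
        for question_counter in range(0, 4):
            attempts_count[user_id][1][question_counter] += int(csv_row[question_counter + 1])
    return attempts_count


SAMPLE = "UserID,Q1,Q2,Q3,Q4\n20,1,2,3,1\n21,0,1,2,1\n"
counts = load_attempts(io.StringIO(SAMPLE))
print(counts[20][1][0], counts[20][1][2])  # 1.0 3.0
```

Taking a file object rather than a filename keeps parsing separate from opening the file, so the same function works unchanged with open('Question_Attempts_Worksheet_1.csv', 'r').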
