Is there an easier way to load an Excel file directly into a NumPy array?

I have looked at the numpy.genfromtxt function in the NumPy documentation, but it doesn't load Excel files directly.

array = np.genfromtxt("Stats.xlsx")
ValueError: Some errors were detected !
Line #3 (got 2 columns instead of 1)
Line #5 (got 5 columns instead of 1)
......

Right now I am using openpyxl.reader.excel to read the Excel file and then appending the values to NumPy 2D arrays. This seems inefficient. Ideally, I would like to have the Excel file loaded directly into a NumPy 2D array.

2 Answers

18

Honestly, if you're working with heterogeneous data (as spreadsheets are likely to contain), using a pandas.DataFrame is a better choice than using numpy directly.

While pandas is in some sense just a wrapper around numpy, it handles heterogeneous data very nicely. (As well as a ton of other things... For "spreadsheet-like" data, it's the gold standard in the Python world.)

If you decide to go that route, just use pandas.read_excel.
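
As a rough sketch of that route: the helper below reads one sheet and returns its values as a 2D array. The name excel_to_array and the choice of header=None (treat the first row as data, not column names) are assumptions for illustration, not part of the original answer. Note that pandas needs an Excel engine such as openpyxl installed to read .xlsx files.

```python
import pandas as pd


def excel_to_array(path, sheet_name=0, header=None):
    """Read one sheet of an Excel file and return its cells as a 2D NumPy array.

    header=None is an assumption: it keeps the first row as data.
    Pass header=0 if the first row holds column names.
    """
    frame = pd.read_excel(path, sheet_name=sheet_name, header=header)
    # Mixed column types produce an object-dtype array; purely numeric
    # sheets produce a numeric dtype.
    return frame.to_numpy()


# Hypothetical usage with the file from the question:
# array = excel_to_array("Stats.xlsx")
```

If the sheet mixes text and numbers, the resulting array will most likely have dtype object, which is one reason to stay with the DataFrame for heterogeneous data.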

4 Comments

I would just add that to convert a data frame to a Numpy 2D array you can just use np.asarray(your_data_frame_here).
No, Sir. Pandas is not necessarily better. Pandas is incredibly slow, especially when loading even moderately large data files.
I agree, I really do not want to be forced into dealing with Pandas dataframes. It is just unnecessary baggage and complication, and one more unnecessary dependency, for what should be simple numerical data. How about a simple, direct solution without pandas, please?
@AnthonyGatlin then what do you suggest as an alternative?
0

We can do it using the xlrd library. We don't need to import all of pandas.

Here is a utility function, taken from Link:

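import numpy
import xlrd  # note: xlrd 2.0+ reads only legacy .xls files, not .xlsx

def read_excel(excel_path, sheet_no=0):
    book = xlrd.open_workbook(excel_path)
    sheet = book.sheet_by_index(sheet_no)
    # Collect the cell values of each row, then stack the rows into a 2D array
    return numpy.array([sheet.row_values(i) for i in range(sheet.nrows)])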

Hope this helps others who want to avoid pandas when reading Excel files.
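
Since recent xlrd releases (2.0 and later) no longer open .xlsx files, here is a comparable sketch using openpyxl, which the question already mentions. The function names rows_to_array and read_xlsx are made up for illustration, and the sketch assumes every row in the sheet has the same number of cells.

```python
import numpy as np


def rows_to_array(rows):
    """Turn an iterable of equal-length row tuples into a 2D NumPy array."""
    return np.array([list(row) for row in rows])


def read_xlsx(path, sheet_index=0):
    """Read one worksheet of an .xlsx file into a 2D NumPy array."""
    # Imported here so rows_to_array stays usable without openpyxl installed.
    from openpyxl import load_workbook

    # read_only streams rows instead of loading the whole workbook;
    # data_only returns cached formula results rather than formula strings.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        sheet = workbook.worksheets[sheet_index]
        # values_only yields plain tuples of cell values, not Cell objects
        return rows_to_array(sheet.iter_rows(values_only=True))
    finally:
        workbook.close()


# Hypothetical usage with the file from the question:
# array = read_xlsx("Stats.xlsx")
```

Empty cells come back as None, so a sheet with gaps will likely yield an object-dtype array rather than a numeric one.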

For me, this alternative was 1 second slower than pandas.read_excel(...).to_numpy() for an Excel file with 14k records.
