
I'm running some simulations that were going too slowly, so I profiled my code and found that over 90 percent of the time was being spent converting a (2D) numpy array to a string, as in:

import numpy as np

arr = np.ones(25000).reshape(5000, 5)
s = '\n'.join('\t'.join([str(x) for x in row]) for row in arr)

I tried a bunch of different solutions (using map, converting the array with astype(str), casting to a list), but most gave only marginal improvement.

Eventually I gave up on trying to convert the array to a string and saved it to a file on its own using np.save(fn, arr), which gave a 2000x(!) speedup. Is there a way to write the array as a text file with similar performance?
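For reference, the binary round-trip looks roughly like this (the file name is illustrative):

import numpy as np

arr = np.ones(25000).reshape(5000, 5)

# Binary write: dumps the raw array buffer, with no per-element string formatting.
np.save('results.npy', arr)

# Reading it back is equally cheap.
loaded = np.load('results.npy')
assert np.array_equal(arr, loaded)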


2 Answers


Converting a numpy array to human-readable form should never dominate the run time of your simulation. In fact, it shouldn't even contribute to it significantly.

You should solve this problem on a different level. Ask yourself: how often do you really need to write the array to a file in human-readable form? Does it need to happen so often or so regularly that it significantly affects the run time of your code? Would it be sufficient to do it only once, when a final result is available?

When you take this approach, you probably do not need to optimize your current writing method at all. To put some numbers on it: suppose your simulation takes about one hour (without writing the result to disk). Then you will probably agree that it is fine if your code spends another 10 seconds writing the result to disk in human-readable form, and it hardly matters whether that takes 1 second, 10 seconds, or 100 seconds.

If for some reason you really need to regularly write your intermediate results to disk for later processing -- minimize the frequency, and use a binary data format.
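As a minimal sketch of that pattern (run_simulation is a hypothetical stand-in for one simulation run; names are illustrative): accumulate intermediate results in memory and do a single binary write at the end, instead of a text conversion inside the loop.

import numpy as np

def run_simulation(i):
    # Hypothetical stand-in for one simulation run.
    return np.full((5000, 5), float(i))

# Accumulate in memory; no string conversion inside the loop.
results = [run_simulation(i) for i in range(100)]

# One binary write at the end.
np.save('all_results.npy', np.stack(results))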


6 Comments

Yep, that's what I ended up doing: each simulation only took about 1.5 milliseconds, and then the conversion to a string took about 500 milliseconds.
So -- is your problem solved? If it is not: how many of these short simulations do you need to perform? What is the output file for? For humans or for machines? How large are these output files? Is I/O a limiting factor?
Yeah, the problem is solved. I was just wondering if there is actually a way to write the numpy array to a string on the same order of performance as np.save(fn, arr).
And to answer the other questions: I need 1 million simulations; the output file is for machines (the output here is read in by another analysis script). I was writing a string header into each file to make sure that the data and the parameters that generated it could not get separated. To fix it, I moved the header into its own file in the same folder as the output.
I see. Just so you know, professional HPC software uses the NetCDF file format or the Hierarchical Data Format (HDF) for these kinds of things. Storing such data in ASCII (or any human-readable form) requires CPU-costly conversion, takes much more space on disk, and slows processing down significantly.
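To illustrate (a sketch assuming the h5py package; the file, dataset, and attribute names are made up): HDF5 lets you attach the generating parameters directly to the dataset as attributes, so the data and its parameters cannot get separated.

import numpy as np
import h5py

arr = np.ones(25000).reshape(5000, 5)

with h5py.File('results.h5', 'w') as f:
    dset = f.create_dataset('simulation', data=arr)
    # Attributes travel with the dataset, so the parameters that
    # generated the data stay attached to it.
    dset.attrs['n_runs'] = 1000000
    dset.attrs['seed'] = 42  # illustrative parameter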

Try using np.savetxt("file", arr). See the documentation here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html
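A minimal usage sketch (file name illustrative): the delimiter and fmt arguments control the text layout, and savetxt can also embed a header line, which may help with the header/data separation issue mentioned in the comments above. A simple format such as '%g' can also be noticeably cheaper than the default '%.18e'.

import numpy as np

arr = np.ones(25000).reshape(5000, 5)

# Tab-separated text output; fmt controls how each element is rendered.
# The header string is written as a first line prefixed with '#'.
np.savetxt('results.txt', arr, delimiter='\t', fmt='%g',
           header='simulation parameters go here')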

1 Comment

I don't quite get the downvotes here, since the OP doesn't mention having tried this. IMO a hand-rolled Python loop is never going to beat savetxt, which is purpose-built for the job, so this does answer the question. That said, Jan's answer is best: don't optimise this bit, or work out a way to use binary data.
