I understand that serialization of data means converting a data structure or object state to a form which can be stored in a file or buffer, can be transmitted, and can be reconstructed later (https://www.tutorialspoint.com/object_oriented_python/object_oriented_python_serialization.htm). Based on this definition, converting a numpy array to .npy format should count as serialization of the numpy array object. However, I could not find this stated anywhere when I searched the internet. Most of the related links discuss how the pickle format serializes data in Python. My question is: is converting a numpy array to .npz format an example of serializing a Python data object? If not, what are the reasons?
2 Answers
Well, according to Wikipedia:
In computing, serialization (or serialisation) is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and reconstructed later (possibly in a different computer environment).
And according to Numpy Doc:
Binary serialization
NPY format A simple format for saving numpy arrays to disk with the full information about them.
The .npy format is the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. The format stores all of the shape and dtype information necessary to reconstruct the array correctly even on another machine with a different architecture.
The .npz format is the standard format for persisting multiple NumPy arrays on disk. A .npz file is a zip file containing multiple .npy files, one for each array.
So, putting these definitions together, you can answer your own question: yes, it is a form of serialization. Storing and reading in this format is also fast.
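To make the "stored and reconstructed later" part concrete, here is a minimal sketch of a round trip through a .npy file. The file name example.npy and the temporary directory are just assumptions for illustration.

```python
import os
import tempfile

import numpy as np

# A small array with a non-default dtype and a 2-D shape.
arr = np.arange(6, dtype=np.int32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    # "example.npy" is an arbitrary name chosen for this sketch.
    path = os.path.join(tmpdir, "example.npy")
    np.save(path, arr)        # serialize: array -> bytes on disk
    restored = np.load(path)  # deserialize: bytes on disk -> array

# Values, shape and dtype all survive the round trip.
print(np.array_equal(arr, restored), restored.shape, restored.dtype)
```

The shape and dtype come back without being passed to np.load, because the .npy header carries them.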
Comments
np.save(filename, arr) writes the array to a file. Since a file is a linear structure, this is a form of serialization. But often 'serialization' refers to creating a string that can be sent to a database or over some 'pipeline'. I think you can save to a string buffer, but it takes a bit of trickery.
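As a rough sketch of the in-memory case: np.save accepts a file-like object, so an io.BytesIO buffer can stand in for the file, and its contents are plain bytes that could then be sent somewhere else. The variable names here are only for illustration.

```python
import io

import numpy as np

arr = np.linspace(0.0, 1.0, 5)

# Write the .npy representation into an in-memory buffer instead of a file.
buffer = io.BytesIO()
np.save(buffer, arr)
payload = buffer.getvalue()  # plain bytes, suitable for a socket or a BLOB column

# On the receiving side, wrap the bytes in a buffer again and load them.
restored = np.load(io.BytesIO(payload))

print(type(payload), np.array_equal(arr, restored))
```

The payload begins with the .npy magic string b"\x93NUMPY", followed by a header that describes the dtype and shape.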
But in Python most objects have a pickle method, which creates a string that can be written to a file. In that sense pickle is a two-step process: serialize, and then write to file. The pickle for a numpy array is actually a save-compatible form. (Conversely, np.save of a non-array object uses that object's pickle.)
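The two steps can be seen with the standard pickle module. This is only a sketch of the round trip; it does not show how the pickled bytes relate internally to the .npy layout.

```python
import pickle

import numpy as np

arr = np.array([[1.5, 2.5], [3.5, 4.5]])

# Step 1: serialize the array to a bytes object in memory.
pickled = pickle.dumps(arr)

# Step 2 (writing the bytes to a file or sending them) is left out here.
# Reconstruct the array from the bytes:
restored = pickle.loads(pickled)

print(type(pickled), np.array_equal(arr, restored))
```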
savez writes a zip archive, containing one npy file for each array. It may in addition be compressed. There are OS tools for transferring zip archives to other computers.
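A short sketch of this, using an in-memory buffer so that no file is left behind. The array names a and b are arbitrary keyword names picked for the example.

```python
import io
import zipfile

import numpy as np

a = np.arange(3)
b = np.eye(2)

# Write both arrays into one .npz archive held in memory.
buffer = io.BytesIO()
np.savez(buffer, a=a, b=b)  # np.savez_compressed would also compress the members

# The result is an ordinary zip archive with one .npy member per array.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    members = sorted(archive.namelist())

# np.load recognizes the archive and gives dict-like access by array name.
buffer.seek(0)
with np.load(buffer) as data:
    restored_a = data["a"]
    restored_b = data["b"]

print(members)
```

The member names are the keyword names passed to np.savez with .npy appended.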
1 Comment
So np.save() results in serialization of the array, though reluctantly, and np.savez is also a serialization process, since it also writes the numpy arrays to a file. Am I correct? I am still not sure about these statements after reading your answer.