I'm writing a Python extension module in C++ using PyObject and arrayobject. My question builds on "How to create fixed-width ndarray of strings", which shows how to create a fixed-width NumPy array of strings such as list = {"Rx", "Rx", "Rx", "RxTx", "Tx", "Tx", "Tx", "RxTx", "Rx", "Tx"}. In my case, however, the strings have different, unpredictable lengths, like this:

list = {"DataDate", "ukey", "OrderRef", "ticktime", "sign", "side", "orderType", "orderSize", "limitPrice", "Status"}

The list is a vector of strings: std::vector<std::string>. If I find the longest item and use the solution from "How to create fixed-width ndarray of strings":

static PyObject* string_vector_to_nparray(const std::vector<std::string>& vec, size_t itemsize)
{
    if( !vec.empty() )
    {
        size_t mem_size = vec.size()*itemsize;
        void * mem = PyDataMem_NEW(mem_size);
        size_t cur_index=0;
        for(const auto& val : vec){
            for(size_t i=0;i<itemsize;i++){
                char ch = i < val.size() ? val[i] : 0; // fill with NUL if string too short
                reinterpret_cast<char*>(mem)[cur_index] = ch;
                cur_index++;
            }
        }
        npy_intp dims = static_cast<npy_intp>(vec.size());         
        PyObject* PyArray = PyArray_New(&PyArray_Type, 1, &dims, NPY_STRING, NULL, mem, 4, NPY_ARRAY_OWNDATA, NULL);   
        return PyArray;     
    } 
    else 
    {
        npy_intp dims[1] = {0};
        return (PyObject*) PyArray_ZEROS(1, dims, PyArray_FLOAT, 0);
    }
}

std::vector<std::string> col_list;
col_list.push_back("...");
col_list.push_back("...");
...
// Find the longest string; its length is used as the item size.
auto it = std::max_element(std::begin(col_list), std::end(col_list),
    [](const std::string& lhs, const std::string& rhs){return lhs.size() < rhs.size();});
auto num = it->size(); // here is your max size
std::cout << "Longest: [" << *it << "] of size: " << num << std::endl;

size_t itemsize = num;
PyObject *PyArray  =  string_vector_to_nparray(col_list, itemsize);

return PyArray;

the exported array looks like this in Python:

np.array([b'Data', b'Date', b'\x00\x00uk', b'ey', b'', b'Orde', b'rRef', b'\x00\x00ti', b'ckti', b'me'], dtype='|S4')

How can I create a non-fixed-width NumPy array of strings from an existing string vector?

  • You'll have to find the length of the longest string, and create the NumPy array with elements of that size, then copy the items from the C++ vector to the NumPy array. In the answer that you linked to, this corresponds to computing itemsize as the length of the longest string in the vector. Commented Apr 21, 2022 at 2:56
  • @WarrenWeckesser It gave array([b'Data', b'Date', b'\x00\x00uk', b'ey', b'', b'Orde', b'rRef',b'\x00\x00ti', b'ckti', b'me'], dtype='|S4') Commented Apr 21, 2022 at 16:27
  • @WarrenWeckesser I updated my question with more code Commented Apr 21, 2022 at 16:32
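
The approach from the first comment (find the longest string, use that length as the element size, then copy every string into a NUL-padded slot) can be sketched in plain C++ without the NumPy headers. The helper names max_item_size, pack_fixed_width and read_item are hypothetical and exist only for this illustration; read_item simply mimics how a fixed-width bytes element is displayed, with trailing NULs stripped.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: length of the longest string, i.e. the item size to use.
std::size_t max_item_size(const std::vector<std::string>& vec)
{
    std::size_t longest = 0;
    for (const auto& s : vec)
        longest = std::max(longest, s.size());
    return longest;
}

// Hypothetical helper: pack all strings into one contiguous buffer,
// itemsize bytes per element, padded with NUL (the layout of an 'S<itemsize>' array).
std::vector<char> pack_fixed_width(const std::vector<std::string>& vec,
                                   std::size_t itemsize)
{
    std::vector<char> buf(vec.size() * itemsize, '\0');
    for (std::size_t i = 0; i < vec.size(); ++i) {
        // Truncate strings longer than itemsize; shorter ones keep their NUL padding.
        const std::size_t n = std::min(vec[i].size(), itemsize);
        std::copy_n(vec[i].begin(), n,
                    buf.begin() + static_cast<std::ptrdiff_t>(i * itemsize));
    }
    return buf;
}

// Hypothetical helper: read element `index` assuming each element is `itemsize`
// bytes wide. Only trailing NULs are stripped, so leading NULs stay visible.
std::string read_item(const std::vector<char>& buf,
                      std::size_t itemsize, std::size_t index)
{
    const char* start = buf.data() + index * itemsize;
    std::size_t len = itemsize;
    while (len > 0 && start[len - 1] == '\0')
        --len;
    return std::string(start, len);
}
```

For the ten column names in the question, max_item_size gives 10 and the packed buffer is 100 bytes. Reading that buffer with an element width of 10 returns the original strings, while reading it with a width of 4 gives "Data", "Date", "\0\0uk", "ey", "" and so on, which is the pattern shown in the question. That points at the literal 4 passed as the itemsize argument of PyArray_New: the buffer is laid out for the longest string but the array is told each element is 4 bytes. Passing itemsize in that position would be the matching change, though this sketch does not exercise the NumPy call itself.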
