I have a pandas DataFrame with a cost column that I am trying to format. The cost values are pulled from different sources, so I need to strip the '$' from the string values and standardize everything to a single numeric format. The column also contains some NaN values.

Here's some sample data:

$2.75 
nan
4.150000
25.00
$4.50

I have the following code that I am using to standardize the format of values in the column.

for i in range(len(EmpComm['Cost(USD)'])):

    if (pd.isnull(EmpComm['Cost(USD)'][i])):
        print(EmpComm['Cost(USD)'][i], i)
        #EmpComm['Cost(USD)'] = EmpComm['Cost(USD)'].iloc[i].fillna(0, inplace=True)

    if type(EmpComm['Cost(USD)'].iloc[i]) == str:
       #print('string', i)
       EmpComm['Cost(USD)'] = EmpComm['Cost(USD)'].iloc[i].replace('$','')

Output:

0      2.75
1      2.75
2      2.75
3      2.75
4      2.75
5      2.75

All values are replaced with 2.75. It seems to be running the second if statement for every value in the column, as they are all stored as strings.

My question is: How would you format it?

1 Answer

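Your loop overwrites the whole column because assigning a single value to EmpComm['Cost(USD)'] broadcasts it to every row: the first iteration sets every row to '2.75', and later iterations leave it unchanged. In general, you should avoid manual for loops with Pandas and use vectorised functionality where possible. Here you can strip the '$' with the .str accessor and then use pd.to_numeric to convert the values in your series: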

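import numpy as np
import pandas as pd

s = pd.Series(['$2.75', np.nan, 4.150000, 25.00, '$4.50'])

# cast everything to str, then remove the literal '$' (not a regex)
strs = s.astype(str).str.replace('$', '', regex=False)
# convert to float; anything unparseable becomes NaN, then fill NaN with 0
res = pd.to_numeric(strs, errors='coerce').fillna(0)

print(res)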

0     2.75
1     0.00
2     4.15
3    25.00
4     4.50
dtype: float64
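
To apply the same idea to a DataFrame column rather than a standalone series, something like the following sketch should work. The EmpComm frame here is a hypothetical stand-in built from the sample data in the question, and clean_cost is an illustrative helper name, not part of Pandas. The extra .str.strip() is an assumption on my part, added in case some values carry stray whitespace.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's EmpComm frame, built from the sample data.
EmpComm = pd.DataFrame({'Cost(USD)': ['$2.75', np.nan, 4.150000, 25.00, '$4.50']})


def clean_cost(col):
    """Strip a literal '$' and convert to float; missing or unparseable values become 0."""
    # astype(str) turns NaN into the text 'nan', which to_numeric maps back to NaN
    as_text = col.astype(str).str.replace('$', '', regex=False).str.strip()
    return pd.to_numeric(as_text, errors='coerce').fillna(0)


# Assign the whole cleaned series back in one step instead of looping row by row.
EmpComm['Cost(USD)'] = clean_cost(EmpComm['Cost(USD)'])
print(EmpComm)
```

If you would rather keep missing costs as NaN instead of 0, drop the .fillna(0) call.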
1 Comment

Makes sense. Loops are not as efficient when working with pandas, so vectorized seems to be the way to go. Thanks.
