1

I have data in many columns like the following in pandas dataframe:

col1|   col2|   ...|   col99    |col100
MBs|    Gigabytes|...|  MBs|    |MBs
Megabytes|   GBs|...|Megabytes  |Gigabytes
GB  |   Megabytes|  ...|Gigabytes|Gigabytes
GBs |   GB     |...   |MBs  |Gigabytes
Gigabytes|Megabytes|...|Gigabytes   |Megabytes

I have also a dictionary which maps similar values. For example,

mapping = {'Megabytes':'MB', 'Gigabytes':'GB', 'MBs':'MB','GBs':'GB', 'GB':'GB',}

I want to replace each value in the column with mapped values in the dict. Currently I am trying to do something like this but getting an error. Expected output should be

col1|col2|...|col99|col100
MB| GB|...| MB| |MB
MB|GB|...|MB|GB
GB |MB|...|GB|GB
GB|GB|...|MB|GB
GB|MB|...|GB|MB

# My current implementation
df = df.apply(lambda x: x.astype(str).replace('GBs', 'GB').replace('MBs', 'MB').replace('Megabytes', 'MB').replace('Gigabytes', 'GB'))

Can someone guide me a correct and faster way of doing this ?

1
  • Are all your columns are of object (string) dtype? Commented Jul 26, 2017 at 19:47

2 Answers 2

3

pd.DataFrame.replace can take a dictionary of dictionaries where the first level of keys specify the column to apply the value when replacing.

We can use a dictionary comprehension to filter only those columns that are of dtype == object

df.replace({c: mapping for c in df if df[c].dtype == object})

  col1 col2 col99 col100
0   MB   GB    MB     MB
1   MB   GB    MB     GB
2   GB   MB    GB     GB
3   GB   GB    MB     GB
4   GB   MB    GB     MB
Sign up to request clarification or add additional context in comments.

Comments

2

Try this:

df.loc[:, df.dtypes=='object'] = df.select_dtypes(['object']).replace(mapping, regex=True)

This will apply mapping only to string columns


If all your columns are of string (object) dtype:

df = df.replace(mapping, regex=True)

or as @JohnGalt has proposed in the comment:

df = df.applymap(lambda x: mapping[x])

5 Comments

I'd use replace too. df.applymap(lambda x: mapping[x]) is another alternative, assuming all values have mapping.
@JohnGalt, yes, we can use .applymap() too :) Actually in the last Pandas versions it got much faster - it was slower in the past...
I swear if I went back and retagged the appropriate questions, you'd have a gold badge for replace
df.replace({c: mapping for c in df if df[c].dtype == object})
@piRSquared, this is nice! Please turn it into an answer

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.