Replacing array values in a Pandas DataFrame via iteration

Question

I am working with a Pandas DataFrame that has a column whose entries are arrays, as in the following example:

    user_id    tags
0      1       [a,b,c]
1      2       [a,b,d]
2      3       [b,c]
...
n      n       [a,d]

I have some tag ids that correlate to the simplified tags in a JSON object and am trying to replace the entries with their non-simplified variants with the following method:

for user_tags in dataset['tags']:
    for tag in user_tags:
        for full_tag in UUIDtags['tags_full']:
            if full_tag['id'] == tag:
                tag = full_tag['name']

Here, id is the simplified tag and name is the corresponding full tag name in the JSON object.

However, this does not change the value upon execution; is there a Pandas method that I am missing to replace these values? I am afraid that I will replace the entire array rather than replace the individual entries.

Thank you!

EDIT: An example of what the JSON object (UUIDtags) contains.

{
    "tags_full": [{
        "id": "a",
        "name": "Alpha"
    }, {
        "id": "b",
        "name": "Beta"
....

Can you post what other data full_t and UUIDtags look like? It's hard to test ideas with access to only half the info...
– spies006, Commented May 26, 2017 at 20:01
Sorry! I fixed up some of the inconsistencies in my question, thanks.
– kalle, Commented May 26, 2017 at 20:11

spies006 · Accepted Answer · 2017-05-26 21:20:20Z


Create sample data.

>>> import pandas as pd
>>> df = pd.DataFrame({'tags': [['a', 'b', 'c'], ['a', 'b', 'd'], ['b', 'c']],
...                    'user_id': [1, 2, 3]})

>>> df
        tags  user_id
0  [a, b, c]        1
1  [a, b, d]        2
2     [b, c]        3

Generate a replacement dictionary with the letters as keys and the full tag names as values.

>>> replace_dict = {'a': 'Alpha', 'b': 'Beta', 'c': 'Charlie', 'd': 'Delta'}

Now iterate over the rows, and over the tags in each row, replacing each tag with its corresponding value in replace_dict.

>>> for row in range(len(df)):
...     for tag in range(len(df.loc[row, 'tags'])):
...         df.loc[row, 'tags'][tag] = replace_dict[df.loc[row, 'tags'][tag]]
...

Here is the result.

>>> df
                     tags  user_id
0  [Alpha, Beta, Charlie]        1
1    [Alpha, Beta, Delta]        2
2         [Beta, Charlie]        3
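The loop in the question only rebinds the local name tag, which leaves the list stored in the DataFrame untouched. As an alternative to indexing into each list, here is a minimal sketch that builds a new list per row with Series.apply. The .get(tag, tag) fallback, which leaves unknown tags unchanged, is my own assumption and not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    'tags': [['a', 'b', 'c'], ['a', 'b', 'd'], ['b', 'c']],
    'user_id': [1, 2, 3],
})
replace_dict = {'a': 'Alpha', 'b': 'Beta', 'c': 'Charlie', 'd': 'Delta'}

# Build a new list for every row instead of mutating the old one.
# .get(tag, tag) is an assumed fallback: tags missing from the
# dictionary are kept as they are rather than raising KeyError.
df['tags'] = df['tags'].apply(
    lambda tags: [replace_dict.get(tag, tag) for tag in tags]
)
print(df)
```

Because this never looks rows up by label, it does not depend on what the DataFrame's index looks like.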

Side note: replace_dict above was built ad hoc from the letters that appear in my sample data. To generate such a replacement dictionary for your full data, you could do the following.

For example, suppose UUIDtags is your full JSON object:

>>> UUIDtags = {'tags_full': [{'id':'a', 'name':'Alpha'}, {'id':'b', 'name':'Beta'}]}

We can generate a replacement dict like this:

>>> uuidtags_dict = {}
>>> for tag in UUIDtags['tags_full']:
...     uuidtags_dict[tag['id']] = tag['name']
... 
>>> uuidtags_dict
{'a': 'Alpha', 'b': 'Beta'}

This way of building the replacement dictionary scales to your entire JSON object, provided every entry has the id and name keys shown in the sample from your edit.
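The same mapping can also be written as a dict comprehension. This is only a more compact sketch of the loop above, using the two sample entries from the question:

```python
UUIDtags = {'tags_full': [{'id': 'a', 'name': 'Alpha'},
                          {'id': 'b', 'name': 'Beta'}]}

# Map each simplified id to its full name in one expression
uuidtags_dict = {tag['id']: tag['name'] for tag in UUIDtags['tags_full']}
print(uuidtags_dict)  # {'a': 'Alpha', 'b': 'Beta'}
```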

edited May 26, 2017 at 21:20

answered May 26, 2017 at 21:04


3 Comments

kalle Over a year ago

Thanks for the detailed response! I think I follow your approach, but I get the error message KeyError: 'the label [7] is not in the [index]' upon execution; I am trying to debug it now.

spies006 Over a year ago

@Kam you probably need to reset the index on dataset ... dataset.reset_index(inplace=True)

kalle Over a year ago

You are correct, I thought I had that in there! Thanks - works like a charm now!
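To illustrate the exchange above: the row loop looks rows up by the labels 0, 1, 2, ..., so it fails when the index holds other labels. Below is a small sketch with a made-up index of [3, 7], which is only an assumption about how such a gap might arise (for example after filtering rows). Note that reset_index without drop=True keeps the old index as an extra column; drop=True, used here, discards it.

```python
import pandas as pd

dataset = pd.DataFrame(
    {'tags': [['a', 'b'], ['b']], 'user_id': [1, 2]},
    index=[3, 7],  # hypothetical non-default index
)
replace_dict = {'a': 'Alpha', 'b': 'Beta'}

# range(len(dataset)) yields 0 and 1, but neither is a label here,
# so dataset.loc[0, 'tags'] would raise KeyError before the reset.
dataset = dataset.reset_index(drop=True)  # index is now 0, 1

for row in range(len(dataset)):
    tags = dataset.loc[row, 'tags']  # the list object stored in this cell
    for i in range(len(tags)):
        tags[i] = replace_dict[tags[i]]  # mutate the list in place

print(dataset)
```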
