
I am working with a Pandas DataFrame that has a column whose entries are arrays (lists), such as the following example:

    user_id    tags
0      1       [a,b,c]
1      2       [a,b,d]
2      3       [b,c]
...
n      n       [a,d]

I have a JSON object that maps each simplified tag id to its full tag name, and I am trying to replace the entries with their full names using the following method:

for user_tags in dataset['tags']:
    for tag in user_tags:
        for full_tag in UUIDtags['tags_full']:
            if full_tag['id'] == tag:
                tag = full_tag['name']

In the JSON object, `id` holds the simplified tag and `name` holds the corresponding full tag name.

However, this does not change the values when it runs. Is there a Pandas method I am missing for replacing them? I am worried that I will end up replacing the entire array rather than the individual entries.

Thank you!

EDIT: An example of what the JSON object (UUIDtags) contains.

{
    "tags_full": [{
        "id": "a",
        "name": "Alpha"
    }, {
        "id": "b",
        "name": "Beta"
....
  • Can you post what other data full_t and UUIDtags look like? It's hard to test ideas with access to only half the info... Commented May 26, 2017 at 20:01
  • Sorry! I fixed up some of the inconsistencies in my question, thanks. Commented May 26, 2017 at 20:11

1 Answer

Create sample data.

>>> import pandas as pd
>>> df = pd.DataFrame({'tags': [['a', 'b', 'c'], ['a', 'b', 'd'], ['b', 'c']],
...                    'user_id': [1, 2, 3]})

>>> df
        tags  user_id
0  [a, b, c]        1
1  [a, b, d]        2
2     [b, c]        3

Generate a replacement dictionary with letters as the keys and full tag as values.

>>> replace_dict = {'a': 'Alpha', 'b': 'Beta', 'c': 'Charlie', 'd': 'Delta'}

Okay, back to the solution: iterate over the rows, and over the letters in each row, replacing each letter with the corresponding value in replace_dict.

>>> for row in range(len(df)):
...     for tag in range(len(df.loc[row, 'tags'])):
...         df.loc[row, 'tags'][tag] = replace_dict[df.loc[row, 'tags'][tag]]
... 

Here is the result.

>>> df
                     tags  user_id
0  [Alpha, Beta, Charlie]        1
1    [Alpha, Beta, Delta]        2
2         [Beta, Charlie]        3
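The loop above changes the DataFrame where the loop in the question did not because it assigns through an index (`df.loc[row, 'tags'][tag] = ...`), which writes into the list stored in the cell. In the question, `tag = ...` only rebinds the loop variable `tag` to a new string and never touches the list. A minimal plain-Python sketch of the difference, using made-up sample values:

```python
# Made-up sample: one user's tag list and a letter -> name mapping.
user_tags = ['a', 'b']
names = {'a': 'Alpha', 'b': 'Beta'}

# Rebinding the loop variable only changes what `tag` refers to;
# the list itself is never modified.
for tag in user_tags:
    tag = names[tag]
print(user_tags)  # ['a', 'b']

# Assigning through an index writes into the list in place.
for i, tag in enumerate(user_tags):
    user_tags[i] = names[tag]
print(user_tags)  # ['Alpha', 'Beta']
```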

Side note: I created replace_dict by hand, based on the letters that appear in my sample data. To generate such a replacement dictionary for your full data, you could do the following.

For example, let's suppose UUIDtags is your full JSON object:

>>> UUIDtags = {'tags_full': [{'id':'a', 'name':'Alpha'}, {'id':'b', 'name':'Beta'}]}

We can generate a replacement dict like this:

>>> uuidtags_dict = {}
>>> for tag in UUIDtags['tags_full']:
...     uuidtags_dict[tag['id']] = tag['name']
... 
>>> uuidtags_dict
{'a': 'Alpha', 'b': 'Beta'}
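Equivalently, the same mapping can be built in one expression with a dict comprehension. This is a sketch assuming, as in the sample from the edit, that every entry in `tags_full` has both an `'id'` and a `'name'` key:

```python
UUIDtags = {'tags_full': [{'id': 'a', 'name': 'Alpha'}, {'id': 'b', 'name': 'Beta'}]}

# Map each simplified id to its full name.
uuidtags_dict = {tag['id']: tag['name'] for tag in UUIDtags['tags_full']}
print(uuidtags_dict)  # {'a': 'Alpha', 'b': 'Beta'}
```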

Building the dictionary this way scales to your entire JSON object, assuming it has the same structure as the sample you provided in your edit.
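On the question of a Pandas method: one alternative to indexing cell by cell is `Series.apply` with a list comprehension, which builds a new list for each row and assigns the whole column back. This is a sketch of a different technique from the loop above, not the answer's method. It does not depend on the index labels being 0..n-1 (the sample index `[5, 7, 9]` is made up to show that). Using `dict.get(t, t)` leaves any tag missing from the mapping unchanged instead of raising a `KeyError`; whether that is the right behaviour for unknown tags is an assumption.

```python
import pandas as pd

# Sample mapping in the same shape as the JSON object from the question.
UUIDtags = {'tags_full': [{'id': 'a', 'name': 'Alpha'}, {'id': 'b', 'name': 'Beta'},
                          {'id': 'c', 'name': 'Charlie'}, {'id': 'd', 'name': 'Delta'}]}
uuidtags_dict = {tag['id']: tag['name'] for tag in UUIDtags['tags_full']}

# Sample data with a non-default index, as can happen after filtering rows.
df = pd.DataFrame({'tags': [['a', 'b', 'c'], ['a', 'b', 'd'], ['b', 'c']],
                   'user_id': [1, 2, 3]}, index=[5, 7, 9])

# Build a new list per row; ids missing from the mapping are kept as-is.
df['tags'] = df['tags'].apply(lambda tags: [uuidtags_dict.get(t, t) for t in tags])
print(df)
```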


3 Comments

Thanks for the detailed response! I think I understand your approach quite thoroughly, but I get the error message KeyError: 'the label [7] is not in the [index]' upon execution; I am trying to debug the error now.
@Kam you probably need to reset the index on dataset ... dataset.reset_index(drop=True, inplace=True) (drop=True keeps the old labels from being added as a new column)
You are correct, I thought I had that in there! Thanks - works like a charm now!
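The `KeyError` in these comments comes from the answer's loop passing `range(len(df))` to `.loc`, which looks rows up by label, so it only works when the labels are exactly 0..n-1. After filtering, for example, the remaining rows keep their original labels. A small sketch reproducing this with made-up data, then applying the suggested `reset_index` fix (iterating over `df.index` instead of `range(len(df))` would be another way around it):

```python
import pandas as pd

replace_dict = {'a': 'Alpha', 'b': 'Beta', 'c': 'Charlie', 'd': 'Delta'}

# After filtering, the remaining rows keep their original labels (0 and 2).
df = pd.DataFrame({'tags': [['a', 'b'], ['c'], ['d']], 'user_id': [1, 2, 3]})
df = df[df['user_id'] != 2]
print(list(df.index))  # [0, 2]

# range(len(df)) yields 0 and 1, but label 1 no longer exists.
try:
    for row in range(len(df)):
        df.loc[row, 'tags']
except KeyError as err:
    print('KeyError:', err)

# Renumber the rows 0..n-1; drop=True discards the old labels
# instead of adding them as an 'index' column.
df = df.reset_index(drop=True)

# The answer's loop now finds every label.
for row in range(len(df)):
    for tag in range(len(df.loc[row, 'tags'])):
        df.loc[row, 'tags'][tag] = replace_dict[df.loc[row, 'tags'][tag]]
print(df)
```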
