
I'm trying to make multiple dataframes that are subsets of existing dataframes.

I have df_list which is actually a list of datasets:

df_list = [df1B, df2B, df3B, df4B, df5B, df6B, df7B, df8B, df9B, df10B, df11B, df12B, df13B, df14B, df15B, df16B, df17B, df18B, df19B, df20B, df21B, df22B, df23B, df24B, df25B, df26B, df27B, df28B, df30B, df31B, df32B, df33B, df34B, df35B]

If I want to make a subset of a single data set I do this and it works:

df2B = df2B.groupby(['Location']).get_group(36)

It takes all rows where Location is 36, but when I try to do the same for all the datasets in a for loop, it doesn't work:

for df in df_list:
    df = df.groupby(['Location']).get_group(36)

But this does not create the subset for each dataset. It doesn't show any error message, but it doesn't do anything else either :(

Should I just write the same line 35 times ??? I hope I have a better option.

  • After the loop, you want the name df1B to point to the subset? When you make a subset of df1B you want to be able to refer to that subset with the name df1B? Commented May 29, 2019 at 2:56
  • With that name or any other new one :) Commented May 29, 2019 at 3:03
  • Perhaps you should describe what you expect the final result to be. Commented May 29, 2019 at 3:04
  • The final result should be a list of datasets, each a subset of the corresponding dataset in df_list. Commented May 29, 2019 at 3:08

3 Answers

1

If I understand correctly, you can use a list comprehension for this:

subset_df_list = [df.groupby('Location').get_group(36) for df in df_list]

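As an aside, your for loop doesn't work because each iteration only rebinds the loop variable df to the subset; neither df_list nor the original names (df1B, df2B, ...) are changed, and the subset is discarded on the next iteration. You probably want this, which is also the equivalent of the above comprehension: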

subset_df_list = []

for df in df_list:
    subset_df = df.groupby('Location').get_group(36)
    subset_df_list.append(subset_df)
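
To make the difference concrete, here is a minimal sketch with two small made-up frames (df_a, df_b and the value column are hypothetical stand-ins for the real datasets). It shows that the original loop leaves df_list untouched, while building a new list keeps every subset. The boolean-mask variant at the end is an alternative, not part of the original answer: it selects the same rows, and where get_group raises KeyError for a dataset with no Location 36, the mask just yields an empty frame.

```python
import pandas as pd

# Hypothetical sample data standing in for df1B, df2B, ...
df_a = pd.DataFrame({"Location": [36, 36, 12], "value": [1, 2, 3]})
df_b = pd.DataFrame({"Location": [12, 36, 36], "value": [4, 5, 6]})
df_list = [df_a, df_b]

# Rebinding the loop variable only changes what the name `df` refers to;
# the objects stored in df_list are not modified.
for df in df_list:
    df = df.groupby("Location").get_group(36)

rows_after_loop = [len(frame) for frame in df_list]  # still the full frames

# Building a new list keeps every subset.
subset_df_list = [frame.groupby("Location").get_group(36) for frame in df_list]
rows_in_subsets = [len(frame) for frame in subset_df_list]

# Alternative: a boolean mask selects the same rows and gives an empty
# frame when a dataset contains no rows with Location 36.
masked_list = [frame[frame["Location"] == 36] for frame in df_list]
```

Each subset is then available by position, for example subset_df_list[0] for the subset of the first dataset.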

2 Comments

Thank you very much! Both options worked for me, and now I understand the concept better. So when I want to call any of the subsets, I just call them like this: subset_df_list[i]?
@DianaVega Yes, that is correct, except that you're not calling, but accessing. Code blocks (with 3 ```) don't work in comments, by the way.
0
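import numpy as np
import pandas as pd
df = [pd.DataFrame({'Location': np.random.randint(0,5,size=(100))}) for i in range(10)]  # ten sample frames
df = list(map(lambda x: x.groupby('Location').get_group(1), df))  # keep only the rows where Location == 1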


0

You're assigning to your loop variable, which is then thrown away on the next iteration. DataFrame.append isn't in-place and doesn't have an inplace parameter, so its result has to be assigned back. Instead:

import pandas as pd

df1 = pd.DataFrame({'gr': [1,1,2,2], 'v': [1,2,3,2]})
df2 = pd.DataFrame({'gr': [1,1,2,2], 'v': [6,5,4,3]})
df_combined = pd.DataFrame({'gr': [], 'v':[]})
df_combined
# Empty DataFrame
# Columns: [gr, v]
# Index: []
for df in [df1, df2]:
    # DataFrame.append returns a new frame, so assign the result back.
    # Note: DataFrame.append was removed in pandas 2.0; this needs an older version.
    df_combined = df_combined.append(df.groupby('gr').get_group(1))
df_combined
#     gr    v
# 0  1.0  1.0
# 1  1.0  2.0
# 0  1.0  6.0
# 1  1.0  5.0

Unless you want a list of DataFrames, which it suddenly seems like you do. (I was thrown by df.append(): for a list, append adds to the end in place; for a DataFrame, it does not.) In the list case, you want:

# setup as before
combined_dfs = []
for df in [df1, df2]:
    # list.append mutates the list in place, so no assignment is needed
    combined_dfs.append(df.groupby('gr').get_group(1))

It's a funny way to use DataFrames, but there ya go! :D
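
For the single-combined-DataFrame case earlier in this answer, DataFrame.append is no longer available from pandas 2.0 onward. A minimal sketch of the same result using pd.concat, reusing the df1 and df2 frames from above (the parts name is just illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"gr": [1, 1, 2, 2], "v": [1, 2, 3, 2]})
df2 = pd.DataFrame({"gr": [1, 1, 2, 2], "v": [6, 5, 4, 3]})

# Collect the group-1 rows of each frame, then concatenate once.
parts = [df.groupby("gr").get_group(1) for df in [df1, df2]]
df_combined = pd.concat(parts)
```

Concatenating once at the end also avoids copying the growing frame on every iteration. The original row labels are kept, so passing ignore_index=True to pd.concat may be preferable if a fresh 0..n-1 index is wanted.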

2 Comments

The append in my answer is list.append, not pd.DataFrame.append, and what the OP wants is a list of DataFrames, not one single DataFrame.
I just figured that out and edited just before you commented, @gmds. Sorry!
