Collecting data in a loop using Python's pandas DataFrame

Question

I am trying to extract data from a CSV file using Python's pandas module. The experiment data has 6 columns (let's say a, b, c, d, e, f) and I have a list of model directories. Not every model has all 6 'species' (columns), so I need to split the data specifically for each model. Here is my code:

    import os
    import pandas

    def read_experimental_data(self, experiment_path):
        path, fle = os.path.split(experiment_path)
        os.chdir(path)
        data_df = pandas.read_csv(experiment_path)
        # print(data_df)
        experiment_species = data_df.keys()  # (a, b, c, d, e, f)
        # print(experiment_species)
        for i in self.all_models_dirs:  # iterate through a list of model directories
            path, fle = os.path.split(i)
            model_specific_data = pandas.DataFrame()
            # gives all the species (columns) in this particular model
            species_dct = self.get_model_species(i + '.xml')
            # print(species_dct)
            # keep only the species that are included in model dir i
            for l in species_dct.keys():
                for m in experiment_species:
                    if l == m:
                        # how do I collate these pandas Series into a single DataFrame?
                        print(data_df[m])

The above code gives me the correct data, but I'm having trouble collecting it in a usable format. I've tried merging and concatenating the Series, but with no luck. Does anybody know how to do this?

Thanks

Gabriel · Accepted Answer · 2015-11-09 12:24:38Z


You can create a new DataFrame from data_df by indexing it with a list of the columns you want:

    import pandas as pd
    df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
    df_filtered = df[['a', 'c']]  # a DataFrame containing only columns 'a' and 'c'

Or, as an example using some of your variable names:

    import pandas as pd
    data_df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6],
                            'd': [7, 8], 'e': [9, 10], 'f': [11, 12]})
    experiment_species = data_df.keys()
    species_dct = ['b', 'd', 'e', 'x', 'y', 'z']
    # keep only the species that are present in both the experiment and the model
    good_columns = list(set(experiment_species).intersection(species_dct))
    df_filtered = data_df[good_columns]
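
To tie this back to the loop over model directories in the question, here is a minimal sketch that builds one filtered DataFrame per model and stores them in a dict. The names `model_species` and `model_data` are hypothetical, and the species lists are made up for illustration; in the question they would come from `self.get_model_species(...)`. Because a set intersection does not guarantee column order, this variant uses a list comprehension over `data_df.columns` instead, which keeps the columns in their original order.

```python
import pandas as pd

# Toy stand-in for the experiment CSV (columns a..f).
data_df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6],
                        'd': [7, 8], 'e': [9, 10], 'f': [11, 12]})

# Hypothetical mapping of model name -> species in that model.
# In the question this would be built from self.get_model_species(...).keys().
model_species = {
    'model_1': ['b', 'd', 'e', 'x'],
    'model_2': ['a', 'f', 'y', 'z'],
}

model_data = {}
for model, species in model_species.items():
    wanted = set(species)
    # Walk data_df.columns so the original column order is preserved,
    # and species missing from the experiment data are simply skipped.
    good_columns = [col for col in data_df.columns if col in wanted]
    model_data[model] = data_df[good_columns]

print(model_data['model_1'])
```

Each value in `model_data` is an ordinary DataFrame, so `model_data['model_1']` can be used directly wherever the per-model data is needed.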

edited Nov 9, 2015 at 12:24

answered Nov 9, 2015 at 12:14

Gabriel
