I'm trying to replace French addresses in a dataframe. I'm using a list and a regular expression:
def adresses(df):
    liste_adresses = ['allée', 'Allée', 'rue', 'Rue', 'avenue', 'Avenue', 'av', 'AV', 'boulevard', 'Boulevard', 'bd', 'Bd', 'carreau', 'Carreau', 'carrefour', 'Carrefour', 'place', 'Place', 'voie', 'Voie', 'villa', 'Villa', 'route', 'Route', 'quai', 'Quai']
    for i in liste_adresses:
        df['C'] = df['C'].str.replace(r'[0-9]+(,|\s+)i\s+\w+\s+(\w+)?(\s+)?(\w+)?(\s+)?([0-9]{5})?(\s+)?\w+?([0-9]{5})?','<address>')
    return df
My dataframe:
A        B          C
French   house      I live in 15 rue Louis Philippe 75001 Neuilly
English  house      my address: 101-102 bd Charles de Gaulle 75001 Paris
French   apartment  my name is Liam
French   house      Hello George!
English  apartment  This is wrong: 4, rue Ledion Paris 75014 and I'm not happy with it
Nothing changes in my output.
Expected output:
A        B          C
French   house      I live in <address>
English  house      my address: <address>
French   apartment  my name is Liam
French   house      Hello George!
English  apartment  This is wrong: <address> and I'm not happy with it
The variable i, which holds the elements of liste_adresses, is embedded as plain text in the regex you define, '[0-9]+(,|\s+)i\s+\...', so the pattern looks for the letter i and not for its value (for example 'allée'). It should be more like '[0-9]+(,|\s+)' + i + '\s+\...'. With that change something happens, although it is still not the expected output.

Does C always end with the address? By this I mean, could there be more characters after it, as in "This is wrong: 4, rue Ledion Paris 75014 and I'm not happy with it"?

Can the cities be listed the way liste_adresses is, or do you have too many cities in your data?
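
A minimal sketch of the point about i, in case it helps: instead of looping, the entries of liste_adresses can be joined into one alternation so that their values, not the letter i, end up in the pattern. The name ADDRESS_PATTERN and the overall shape of the pattern are my own assumptions, not something from the question. It also assumes that every address contains a five-digit postal code and that a city written after the postal code starts with an uppercase ASCII letter. regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import re

liste_adresses = ['allée', 'Allée', 'rue', 'Rue', 'avenue', 'Avenue', 'av', 'AV',
                  'boulevard', 'Boulevard', 'bd', 'Bd', 'carreau', 'Carreau',
                  'carrefour', 'Carrefour', 'place', 'Place', 'voie', 'Voie',
                  'villa', 'Villa', 'route', 'Route', 'quai', 'Quai']

# Join the street types into one alternation, longest first so that
# 'avenue' is tried before 'av'. re.escape keeps each entry literal.
types = '|'.join(re.escape(t) for t in sorted(liste_adresses, key=len, reverse=True))

# Hypothetical pattern (an assumption, see the text above):
ADDRESS_PATTERN = (
    r'\d+(?:-\d+)?,?\s+'       # house number, optional range ("101-102"), optional comma
    r'(?:' + types + r')\b'    # one of the street types, as a whole word
    r'.*?\b\d{5}\b'            # shortest run of text up to a five-digit postal code
    r'(?:\s+[A-Z]\w+)?'        # optional capitalized word (city) after the postal code
)

def adresses(df):
    # df is expected to be a pandas DataFrame with a string column 'C'.
    df['C'] = df['C'].str.replace(ADDRESS_PATTERN, '<address>', regex=True)
    return df
```

Calling adresses(df) on the dataframe from the question should leave the rows without a house number untouched. Addresses without a postal code are not matched by this sketch, and a city starting with an accented capital after the postal code would be left in place.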