Convert XML to csv file with Python

Question

I'm trying to convert an xml file to csv. The file looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tns:FlxPtn xmlns:ts2c="http://interop.covea.fr/Covea-Flx-TypesS2C-009" xmlns:tns="http://interop.covea.fr/Covea-App-PolRntVie-024" xmlns:rf2c="http://interop.covea.fr/Covea-Referentiel" xmlns:cov="http://interop.covea.fr/Covea-FlxPtn-002" xmlns:fs2c="http://interop.covea.fr/Covea-Flx-EneFncS2C-007" xsi:schemaLocation="http://interop.covea.fr/Covea-Flx-TypesS2C-009 Covea-Flx-TypesS2C-009.xsd http://interop.covea.fr/Covea-App-PolRntVie-024 S2C_XSD_VIGERIRVES_V10.0_024_MMA_124.xsd http://interop.covea.fr/Covea-App-PolRntVie-024 S2C_XSD_VIGERIRVES_V10.0_024.xsd http://interop.covea.fr/Covea-Referentiel Covea-Referentiel.xsd http://interop.covea.fr/Covea-FlxPtn-002 Covea-FlxPtn-002.xsd http://interop.covea.fr/Covea-Flx-EneFncS2C-007 Covea-Flx-EneFncS2C-007.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cov:DonEneTch>
    <cov:IdVrsEne>002</cov:IdVrsEne>
    <cov:IdFlx>1V400220191231VIGERIRVESMMA1</cov:IdFlx>
    <cov:TsCraFlx>2020-01-13 10.02.13.000000</cov:TsCraFlx>
    <cov:IdEmtFlx>MMA</cov:IdEmtFlx>
    <cov:IdRctFlx>MMA</cov:IdRctFlx>
    <cov:TyFlx>NOTIF</cov:TyFlx>
    <cov:TyTrtFlx>1</cov:TyTrtFlx>
    <cov:AquFlx>0</cov:AquFlx>
    <cov:EvrExu>PRODUCTION</cov:EvrExu>
    <cov:IdApnEmt>MMA</cov:IdApnEmt>
    <cov:AcnApl>VIGERIRVES</cov:AcnApl>
    <cov:IdVrsFlx>124</cov:IdVrsFlx>
    <cov:IdTrtEmt>Rentes Vie</cov:IdTrtEmt>
    <cov:IdUtl>TTBATCH</cov:IdUtl>
    <cov:VrsCbl></cov:VrsCbl>
    <cov:ChpLbr>1V400220191231VIGERIRVESMMA1377900</cov:ChpLbr>
  </cov:DonEneTch>
  <tns:DonMet>
    <tns:DonEneFnc>
      <fs2c:CodSocJur>1V4002</fs2c:CodSocJur>
      <fs2c:DatArr>20191231</fs2c:DatArr>
      <fs2c:TypFiColl>VIGERIRVES</fs2c:TypFiColl>
      <fs2c:TimStmCreFic>2020-01-13 10.02.13.000000</fs2c:TimStmCreFic>
      <fs2c:CodEns>MMA</fs2c:CodEns>
    </tns:DonEneFnc>

    <tns:PolRntVie>
      <tns:NumEnr>20191290</tns:NumEnr>
      <tns:NumPol>050000111901</tns:NumPol>
      <tns:PMVie>997.75</tns:PMVie>
    </tns:PolRntVie>
    <tns:PolRntVie>
      <tns:NumEnr>20191291</tns:NumEnr>
      <tns:NumPol>050000112002</tns:NumPol>
      <tns:PMVie>4385.15</tns:PMVie>
    </tns:PolRntVie>

What I would like to extract is the last part informations "NumEnr", "NumPol", "PMVie".

I tried to adapt some code samples as below but I'm not familiar enough to xml to make it work.

import pandas as pd
from xml.etree import ElementTree as et
import csv

libname = "C:/Users/a61787/Documents/"

tree = ET.parse(libname+'rente.xml')
with open(libname+'b.csv','w',newline='',encoding='utf8') as sitescope_data:
    csvwriter = csv.writer(sitescope_data)
    col_names = 'Numenr NumPol PMVie'.split()
    csvwriter.writerow(col_names)
    for event in tree.findall('tns:DonMet/tns:PolRntVie'):
        event_data = ['' if (e:=event.find(col)) is None else e.text for col in col_names]
        csvwriter.writerow(event_data)

dataframe = pd.read_csv('b.csv',encoding='utf8')
print(dataframe.shape)

Which give me a empty DF with only the col names.

Finally what i want to obtain is the following table :

Numenr      NumPol          PMVie
20191290    050000111901    997.75
20191291    050000112002    4385.15

If you have any idea to do that, I would be very grateful

Martin Evans · Accepted Answer · 2021-02-05 08:24:54Z

1

You could do this using beautifulsoup as follows:

from bs4 import BeautifulSoup
import csv

with open('rente.xml') as f_input:
    soup = BeautifulSoup(f_input, "lxml")

with open('b.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Numenr', 'NumPol', 'PMVie'])
    
    for tns in soup.find_all("tns:polrntvie"):
        csv_output.writerow(tns.find(entry).text for entry in ['tns:numenr', 'tns:numpol', 'tns:pmvie'])

This would give you a b.csv file containing:

Numenr,NumPol,PMVie
20191290,050000111901,997.75
20191291,050000112002,4385.15

If items are missing:

from bs4 import BeautifulSoup
import csv

with open('rente.xml') as f_input:
    soup = BeautifulSoup(f_input, "lxml")

with open('b.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Numenr', 'NumPol', 'PMVie'])
    
    for tns in soup.find_all("tns:polrntvie"):
        row = []
        
        for entry in ['tns:numenr', 'tns:numpol', 'tns:pmvie']:
            try:
                row.append(tns.find(entry).text)
            except AttributeError:
                row.append('')
                
        csv_output.writerow(row)

edited Feb 5, 2021 at 8:24

answered Feb 4, 2021 at 16:52

Martin Evans

46.9k17 gold badges88 silver badges104 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Bebio Over a year ago

If I could abuse...I tried to duplicate the code on larger file and have the following error : AttributeError: 'NoneType' object has no attribute 'text'. Have you any idea of the pb behind ?

Martin Evans Over a year ago

It probably means not all entries have all items. It would need to test for missing entries.

Bebio Over a year ago

Thanks, I will search for this.

Martin Evans Over a year ago

I have added a version that can cope with missing entries for you, hopefully that should work on your larger file.

Bebio Over a year ago

Thanks for your time. It works on small example with missing entries. But there is probably other issues with my xml file which contains more than 600K entries and the code bring me back only five rows.

Collectives™ on Stack Overflow

Convert XML to csv file with Python

1 Answer 1

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related