1

I'm trying to convert an xml file to csv. The file looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tns:FlxPtn xmlns:ts2c="http://interop.covea.fr/Covea-Flx-TypesS2C-009" xmlns:tns="http://interop.covea.fr/Covea-App-PolRntVie-024" xmlns:rf2c="http://interop.covea.fr/Covea-Referentiel" xmlns:cov="http://interop.covea.fr/Covea-FlxPtn-002" xmlns:fs2c="http://interop.covea.fr/Covea-Flx-EneFncS2C-007" xsi:schemaLocation="http://interop.covea.fr/Covea-Flx-TypesS2C-009 Covea-Flx-TypesS2C-009.xsd http://interop.covea.fr/Covea-App-PolRntVie-024 S2C_XSD_VIGERIRVES_V10.0_024_MMA_124.xsd http://interop.covea.fr/Covea-App-PolRntVie-024 S2C_XSD_VIGERIRVES_V10.0_024.xsd http://interop.covea.fr/Covea-Referentiel Covea-Referentiel.xsd http://interop.covea.fr/Covea-FlxPtn-002 Covea-FlxPtn-002.xsd http://interop.covea.fr/Covea-Flx-EneFncS2C-007 Covea-Flx-EneFncS2C-007.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cov:DonEneTch>
    <cov:IdVrsEne>002</cov:IdVrsEne>
    <cov:IdFlx>1V400220191231VIGERIRVESMMA1</cov:IdFlx>
    <cov:TsCraFlx>2020-01-13 10.02.13.000000</cov:TsCraFlx>
    <cov:IdEmtFlx>MMA</cov:IdEmtFlx>
    <cov:IdRctFlx>MMA</cov:IdRctFlx>
    <cov:TyFlx>NOTIF</cov:TyFlx>
    <cov:TyTrtFlx>1</cov:TyTrtFlx>
    <cov:AquFlx>0</cov:AquFlx>
    <cov:EvrExu>PRODUCTION</cov:EvrExu>
    <cov:IdApnEmt>MMA</cov:IdApnEmt>
    <cov:AcnApl>VIGERIRVES</cov:AcnApl>
    <cov:IdVrsFlx>124</cov:IdVrsFlx>
    <cov:IdTrtEmt>Rentes Vie</cov:IdTrtEmt>
    <cov:IdUtl>TTBATCH</cov:IdUtl>
    <cov:VrsCbl></cov:VrsCbl>
    <cov:ChpLbr>1V400220191231VIGERIRVESMMA1377900</cov:ChpLbr>
  </cov:DonEneTch>
  <tns:DonMet>
    <tns:DonEneFnc>
      <fs2c:CodSocJur>1V4002</fs2c:CodSocJur>
      <fs2c:DatArr>20191231</fs2c:DatArr>
      <fs2c:TypFiColl>VIGERIRVES</fs2c:TypFiColl>
      <fs2c:TimStmCreFic>2020-01-13 10.02.13.000000</fs2c:TimStmCreFic>
      <fs2c:CodEns>MMA</fs2c:CodEns>
    </tns:DonEneFnc>

    <tns:PolRntVie>
      <tns:NumEnr>20191290</tns:NumEnr>
      <tns:NumPol>050000111901</tns:NumPol>
      <tns:PMVie>997.75</tns:PMVie>
    </tns:PolRntVie>
    <tns:PolRntVie>
      <tns:NumEnr>20191291</tns:NumEnr>
      <tns:NumPol>050000112002</tns:NumPol>
      <tns:PMVie>4385.15</tns:PMVie>
    </tns:PolRntVie>

What I would like to extract is the last part informations "NumEnr", "NumPol", "PMVie".

I tried to adapt some code samples as below but I'm not familiar enough to xml to make it work.

import pandas as pd
from xml.etree import ElementTree as et
import csv

libname = "C:/Users/a61787/Documents/"

tree = ET.parse(libname+'rente.xml')
with open(libname+'b.csv','w',newline='',encoding='utf8') as sitescope_data:
    csvwriter = csv.writer(sitescope_data)
    col_names = 'Numenr NumPol PMVie'.split()
    csvwriter.writerow(col_names)
    for event in tree.findall('tns:DonMet/tns:PolRntVie'):
        event_data = ['' if (e:=event.find(col)) is None else e.text for col in col_names]
        csvwriter.writerow(event_data)

dataframe = pd.read_csv('b.csv',encoding='utf8')
print(dataframe.shape)

Which give me a empty DF with only the col names.

Finally what i want to obtain is the following table :

Numenr      NumPol          PMVie
20191290    050000111901    997.75
20191291    050000112002    4385.15

If you have any idea to do that, I would be very grateful

1 Answer 1

1

You could do this using beautifulsoup as follows:

from bs4 import BeautifulSoup
import csv

with open('rente.xml') as f_input:
    soup = BeautifulSoup(f_input, "lxml")

with open('b.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Numenr', 'NumPol', 'PMVie'])
    
    for tns in soup.find_all("tns:polrntvie"):
        csv_output.writerow(tns.find(entry).text for entry in ['tns:numenr', 'tns:numpol', 'tns:pmvie'])

This would give you a b.csv file containing:

Numenr,NumPol,PMVie
20191290,050000111901,997.75
20191291,050000112002,4385.15

If items are missing:

from bs4 import BeautifulSoup
import csv

with open('rente.xml') as f_input:
    soup = BeautifulSoup(f_input, "lxml")

with open('b.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Numenr', 'NumPol', 'PMVie'])
    
    for tns in soup.find_all("tns:polrntvie"):
        row = []
        
        for entry in ['tns:numenr', 'tns:numpol', 'tns:pmvie']:
            try:
                row.append(tns.find(entry).text)
            except AttributeError:
                row.append('')
                
        csv_output.writerow(row)
Sign up to request clarification or add additional context in comments.

5 Comments

If I could abuse...I tried to duplicate the code on larger file and have the following error : AttributeError: 'NoneType' object has no attribute 'text'. Have you any idea of the pb behind ?
It probably means not all entries have all items. It would need to test for missing entries.
Thanks, I will search for this.
I have added a version that can cope with missing entries for you, hopefully that should work on your larger file.
Thanks for your time. It works on small example with missing entries. But there is probably other issues with my xml file which contains more than 600K entries and the code bring me back only five rows.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.