2

Using pandas 18.1...

I'm trying to iterate through a folder of CSVs to read each CSV and send it to an Oracle database table. There is a non-ascii character lurking in one of my many CSVs (more like reveling in my anguish). I keep getting this error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xab' in position 8: 
ordinal not in range(128)

Here's the code:

import pandas as pd
import pandas.io.sql as psql
from sqlalchemy import create_engine
import cx_Oracle as cx

engine = create_engine('oracle+cx_oracle://schema:'+pwd+'@server:port/service_name'
,encoding='latin1')

name='table'
path=r'path_to_folder'
filelist = os.listdir(path) 

for file in filelist:
    df = pd.read_csv(pathc+'\\'+file,encoding='latin1',index_col=0)
    df=df.astype('unicode')
    df['date'] = pd.to_datetime(df['date'])
    df['date'] = pd.to_datetime(df['Contract_EffDt'],format='%YYYY-%mm-%dd')
    df.to_sql(name, engine, if_exists = 'append')

I've tried the following:

  1. encoding=utf-8 (in the engine, if I do that in read_csv, it throws an error)
  2. Adding ?encoding=utf8 after "service_name" in the engine
  3. Using df=df.astype('unicode') (and not)

What I want to do: Replace the unreadable character with something else and, most importantly, proceed with sending data to Oracle.

Note:

The data file I'm using are from the cms.gov site. Here's a zip file with an example. I'm using the "contracts_info" file.

Thanks in advance!

2 Answers 2

5

You need to set the NLS_LANG environment variable like this:

os.environ['NLS_LANG']= 'AMERICAN_AMERICA.AL32UTF8'

Then the error won't occur.

Sign up to request clarification or add additional context in comments.

Comments

2

I encoded string fields to utf-8 individually and this may have helped (a new error occurred, but I assume it is not related to this):

dfc['Organization Type'] = dfc['Organization Type'].str.encode('utf-8')

New error:

DatabaseError: (cx_Oracle.DatabaseError) ORA-00904: "Contract_ID": invalid identifier

This was because "Contract_ID" was not set as the index. Once I did that, all went well (except for being slower than molasses, which begins my next adventure).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.