
I work with huge CSV data files and plan to run a few checks before inserting the data line by line into MySQL using Python. Because the files are so large, opening them manually takes a very long time, so my aim is to load them without inspecting them by hand; Python should do the analysis for me. I have started writing the code but got stuck while inserting the data. I'm sure this is a basic issue, but as I'm fairly new to Python I can't figure it out. The demo data:

id,first_name,last_name,email,boole,coin
1,Emilio,Pettie,[email protected],true,1Lj8Z4Em68hwqRAUXZKW7C7h2KgH5cGpTe
2,Raynard,Fairholme,[email protected],true,1AEwLuECKYD1Bb6EGaBQC1TJS1mtvHBmy3
3,Zonda,Bampkin,[email protected],false,14AHvnRjXExdgfqZBnWUyVi7aWZR8SFBoL
4,Thurstan,Sherville,[email protected],true,19iiiJ53zxmJnbmW7gKH2hoMwpiaqkit8E
5,Jonathan,Jewkes,[email protected],false,18E22TTK68ukQVLWK6oZNfFbzP2uHqaW7o
6,Dolores,Carmichael,[email protected],false,15BBePy5J3WY1QQLTjA79iYQMjDRubv2BD
7,Kleon,Wesker,[email protected],false,1NfYtAuq6M3cXGhDJuDBnCjdEBRSKsfRVJ
8,Laureen,Writtle,[email protected],true,14UgbrWz9wi2UptALs2dFeQRdUiMaLee57
9,Gypsy,Coombes,[email protected],true,1Hn3JBtjytwbBMVJgM7ixAi1sXf56KFM3R
10,Kevina,Boulger,[email protected],false,1GABbcoRTVsX1qzD8uiGtsPtuD1kvzokK1

The code:

import string
import csv
import mysql.connector
mydb=mysql.connector.connect(host="localhost",user="root",password="password",autocommit=True)
mycursor = mydb.cursor()
sql_str=''
sql_str1=''
mycursor.execute("drop table if exists  rd.data")
with open(r"C:\Users\rcsid\Documents\Office Programs\Working prog\MOCK_DATA.csv") as csvfile:
    csv_reader = csv.DictReader(csvfile)
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            sql_str=f'create table rd.data ( {" varchar(50), ".join(row)} varchar(50))'
            mycursor.execute(sql_str)
        sql_str1=f'insert into rd.data values ( {", ".join(row)})'
        print(sql_str1)
        mycursor.execute(sql_str1)
        line_count += 1

I was able to create the table with the header columns, but I am unable to load the data. The print(sql_str1) output is:

insert into rd.data values ( id, first_name, last_name, email, boole, coin)
insert into rd.data values ( id, first_name, last_name, email, boole, coin)
insert into rd.data values ( id, first_name, last_name, email, boole, coin)
insert into rd.data values ( id, first_name, last_name, email, boole, coin)

And the data being inserted is null for all the values. Can you please let me know how to capture the data in the CSV? I know this is probably a basic syntax issue. I also know the syntax cur.execute('INSERT INTO table (columns) VALUES(%s, ....)', row), but I don't want to use it because I'd need to open the file to check the header part.

  • Hi Pranav, thanks for the reply. It solved my issue. I understand your concern. I am actually building a Python loader for all users, who can just load the data without getting into the data. Furthermore, I'll be adding a data-shifted check (any line with an extra comma) to this code; a sketch of such a check follows these comments. Therefore, as I don't want the end users to update the Python code on their own, I want to make sure most of the work is done by the code itself. Hope I was able to explain. Commented Sep 28, 2020 at 16:06
  • I was able to create the table with the above code: sql_str=f'create table rd.data ( {" varchar(50), ".join(row)} varchar(50))'. I had prepared the SQL statement in a string and ran the query. I'll look into the issue you mentioned in your reply. Commented Sep 28, 2020 at 16:07
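
For reference, a minimal sketch of the data-shifted check mentioned in the first comment (an assumption about what the check should do, not the poster's code): csv.DictReader collects any surplus fields of a row under its restkey, which is None by default, and fills missing fields with None, so both kinds of shift can be detected before anything reaches MySQL.

import csv

with open(r"C:\Users\rcsid\Documents\Office Programs\Working prog\MOCK_DATA.csv") as csvfile:
    reader = csv.DictReader(csvfile)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if None in row:                    # extra comma: surplus values land under the None key
            print(f"line {line_no}: extra field(s) {row[None]}")
        elif None in row.values():         # too few commas: missing fields are filled with None
            print(f"line {line_no}: missing field(s)")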

3 Answers


I can't think of a good reason not to use parameterized queries. You already know the column names from the CSV header (and you presumably always know them, because otherwise how would you create the table?), so why not do it the recommended way? You already open the file when you do with open... and read it row by row with the DictReader. Even though this isn't public-facing code, your database can break if there's an SQL-injection-like element in your CSV.

DictReader reads each row in as a dictionary. When you iterate over a dictionary, you get its keys, not its values. Also, remember that you want to insert these into varchar columns, so you need to enclose each value in single quotes (').

You need to do

col_vals = ", ".join([f"'{v}'" for v in row.values()])
sql_str1 = f'insert into rd.data values ({col_vals})'

I would strongly suggest you do it using parameters like so:

col_names = ",".join(row) # 'id,first_name,last_name,email,boole,coin'
params = ",".join("%s" for x in row) # '%s,%s,%s,%s,%s,%s'
query = f'insert into rd.data ({col_names}) values ({params})'
mycursor.execute(query, tuple(row.values()))  # values bound as parameters, not interpolated
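
Putting it together, a minimal end-to-end sketch of this approach (the connection settings and CSV path are copied from the question and are assumptions here): the header still comes from DictReader, so nothing has to be inspected by hand.

import csv
import mysql.connector

mydb = mysql.connector.connect(host="localhost", user="root",
                               password="password", autocommit=True)
mycursor = mydb.cursor()

with open(r"C:\Users\rcsid\Documents\Office Programs\Working prog\MOCK_DATA.csv") as csvfile:
    reader = csv.DictReader(csvfile)
    cols = reader.fieldnames                      # header row, read for you
    mycursor.execute(f'create table rd.data ( {" varchar(50), ".join(cols)} varchar(50))')
    query = (f'insert into rd.data ({", ".join(cols)}) '
             f'values ({", ".join(["%s"] * len(cols))})')
    for row in reader:                            # DictReader has already consumed the header
        mycursor.execute(query, tuple(row.values()))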



Because you use csv.DictReader, you can easily retrieve the columns from the dictionary keys; this method also skips the header row for you. Additionally, consider executemany with parameterization, which needs only two cursor calls:

with open(r"C:\Path\To\MOCK_DATA.csv") as csvfile:
    csv_reader = csv.DictReader(csvfile)
    data = [row for row in csv_reader]    # read every row into memory

    # build the CREATE TABLE statement from the header (the keys of the first row)
    sql1 = f'CREATE TABLE rd.data ( {" VARCHAR(50), ".join(data[0].keys())} VARCHAR(50))'
    mycursor.execute(sql1)
    mydb.commit()

    # one INSERT statement with a %s placeholder per column
    sql2 = "INSERT INTO rd.data (`{cols}`) VALUES ({prms})"
    sql2 = sql2.format(cols="`, `".join(data[0].keys()),
                       prms=", ".join(['%s'] * len(data[0])))

    mycursor.executemany(sql2, [list(d.values()) for d in data])
    mydb.commit()

Online Demo (using SQLite but should align with MySQL)
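
One caveat for the huge files mentioned in the question: data = [row for row in csv_reader] holds the whole file in memory. A batched variant (a sketch; the batch size of 10000 is an arbitrary choice, and mydb/mycursor are the connection objects from the question) keeps the two-statement structure while streaming the file:

import csv
from itertools import islice

with open(r"C:\Path\To\MOCK_DATA.csv") as csvfile:
    reader = csv.DictReader(csvfile)
    cols = reader.fieldnames
    mycursor.execute(f'CREATE TABLE rd.data ( {" VARCHAR(50), ".join(cols)} VARCHAR(50))')
    sql = (f'INSERT INTO rd.data (`{"`, `".join(cols)}`) '
           f'VALUES ({", ".join(["%s"] * len(cols))})')
    while True:
        batch = [list(row.values()) for row in islice(reader, 10000)]
        if not batch:                  # islice yields nothing once the file is exhausted
            break
        mycursor.executemany(sql, batch)
    mydb.commit()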



I was able to resolve the issue with the following code:

mycursor.execute("drop table if exists  rd.data_with_header")
#r"C:\Users\rcsid\Documents\Office Programs\Working prog\MOCK_DATA.csv"
#re.sub('[^a-zA-Z0-9]\n\.', '_', row)
reader = csv.DictReader(open(r"C:\Users\rcsid\Documents\Office Programs\Working prog\MOCK_DATA.csv",encoding='utf-8',errors='ignore'), delimiter=',')
rowHeaders = reader.fieldnames
print(rowHeaders)
for i in rowHeaders:
    field_name.append(re.sub('[^A-Za-z0-9]+', '_', i))
print(field_name)
print(f'''create table rd.data_with_header ( {" varchar(100), ".join(field_name)} varchar(100))''')
sql_str=f'''create table rd.data_with_header ( {" varchar(100), ".join(field_name)} varchar(100))'''
mycursor.execute(sql_str)
for row in reader:
    sql_str1=f'''insert into rd.data values ('{"',' ".join(row.values())}')'''
    print(sql_str1)
    mycursor.execute(sql_str1)
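
A note on the insert loop above: because the values are interpolated straight into the statement, a value containing a quote or a comma will still break it. A parameterized variant of just that loop (a sketch, reusing reader, field_name and mycursor from the code above) avoids this while keeping the header-driven behaviour:

insert_sql = (f'insert into rd.data_with_header '
              f'values ({", ".join(["%s"] * len(field_name))})')
for row in reader:
    mycursor.execute(insert_sql, tuple(row.values()))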
