Upload csv files to Postgresql (AWS RDS) using python’s Psycopg2

In this tutorial I’m going to explain how to upload large cvs files from a remote location (client) to a remote postgresql server. This process is a great solution for your data science projects (ETL specifically).

PostgreSQL — Python
  1. Create table to run our test

I suggest creating your field with varchar(n) to avoid type errors while trying to upload your CSV file.

2. Create a demo CSV file

Is important:

  • not having extra columns (null values are fine)
  • not having extra blank rows at the end
  • Having complex data types set in your database that do not match the incoming data. (For this I suggest (If possible) setting your fields to varchar and then doing the necessary transformations)

Example of my demo file:

Plain CSV file from a books dataset (Contains 11k+ rows)

Notice that my CVS file contains a header. (it’s ok with out it as well)

3. Preparing our code

Here you can see my clean “insert_demo” table under the business_intelligence schema.

installing and importing libraries

Next I use pandas just to verify my CSV’s shape

Here I just verified my CSV’s size. You can notice we have 11,127 rows and 3 columns

Creating our insert function

A function is not necessary but I prefer to keep everything as functions for portability and team work.

The copy function receives the following parameters:

  • cursor: c
  • file_path: “./exampleData/books.csv”
  • table_name: “insert_demo”
  • cols_tuple: tuple_demo = (“colname1”,”colname2", “colname3”,…)

Creating our variables for file path, table name and columns.

3. Function for establishing connection

Here we establish our connection using psycopg2’s connect function. We should pass to the function the following parameters:

  • user, password, host, port, database

Once we get the connection object we extract our cursor object which will help us to execute our commands.

Once we get our cursor, we can call our copy_query function and pass it down along with our file path, table name and columns tuple.

After calling our function which will execute psycopg2’s copy_from function. Which is the method that loads data from a remote location into postgres.

It’s important to commit our connection in order to save changes into the database permanently.

Close connection and add a exception handler

Make sure to close cursor and connection

Results:

SUCCESS!!
SUCCESS!!

AND DONE :)

Feedback is appreciated!

Happy Coding!

  • Alejandro

Software engineer (CS Grad). I love web technologies, Coffee and value creation.