Upload csv files to Postgresql (AWS RDS) using python’s Psycopg2
In this tutorial I’m going to explain how to upload large cvs files from a remote location (client) to a remote postgresql server. This process is a great solution for your data science projects (ETL specifically).
- Create table to run our test
I suggest creating your field with varchar(n) to avoid type errors while trying to upload your CSV file.
2. Create a demo CSV file
Is important:
- not having extra columns (null values are fine)
- not having extra blank rows at the end
- Having complex data types set in your database that do not match the incoming data. (For this I suggest (If possible) setting your fields to varchar and then doing the necessary transformations)
Example of my demo file:
Notice that my CVS file contains a header. (it’s ok with out it as well)
3. Preparing our code
Here you can see my clean “insert_demo” table under the business_intelligence schema.
installing and importing libraries
Next I use pandas just to verify my CSV’s shape
Creating our insert function
A function is not necessary but I prefer to keep everything as functions for portability and team work.
The copy function receives the following parameters:
- cursor: c
- file_path: “./exampleData/books.csv”
- table_name: “insert_demo”
- cols_tuple: tuple_demo = (“colname1”,”colname2", “colname3”,…)
Creating our variables for file path, table name and columns.
3. Function for establishing connection
Here we establish our connection using psycopg2’s connect function. We should pass to the function the following parameters:
- user, password, host, port, database
Once we get the connection object we extract our cursor object which will help us to execute our commands.
Once we get our cursor, we can call our copy_query function and pass it down along with our file path, table name and columns tuple.
After calling our function which will execute psycopg2’s copy_from function. Which is the method that loads data from a remote location into postgres.
It’s important to commit our connection in order to save changes into the database permanently.
Close connection and add a exception handler
Make sure to close cursor and connection
Results:
AND DONE :)
Feedback is appreciated!
Happy Coding!
- Alejandro