Magdalena Konkiewicz

How to successfully add large data sets to Google Drive and use them in Google Colab

Image by Speedy McVroom from Pixabay


In this post, I will explain how to add large data sets to Google Drive so they can be accessed from Google Colab for processing and modeling.

Uploading a single file is easy with Google Drive's drag-and-drop interface, but it becomes unreliable with a large number of files. Dragging a whole folder containing 1 GB of files simply fails and freezes Google Drive. The alternative is to drag a zipped folder. That upload usually succeeds and does not take long (a couple of minutes for a 1 GB file), but the problem comes when unzipping the file inside Google Drive itself, which results in random files going missing.

After three days of trying to upload photo data for CNN training (in order to use the free GPU), I settled on the alternative below. I will describe, step by step, how to successfully upload big data sets so they can be processed by Google Colab, taking advantage of the VMs provided by the service.


1. Zip the folder with the files. In my case, I had a folder called 'train' with 70,257 .jpg files taking up around 1 GB.
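If you prefer to script this step rather than use your OS archiver, the standard library can produce the same zip. A minimal sketch, assuming the folder sits in your current directory (the name 'train' is just my example):

```python
import shutil

def zip_folder(folder, archive_name=None):
    """Zip `folder` (relative to the current directory) so that unzipping
    recreates a folder of the same name. Returns the path to the archive."""
    archive_name = archive_name or folder
    return shutil.make_archive(archive_name, 'zip', root_dir='.', base_dir=folder)
```

Calling `zip_folder('train')` produces `train.zip` next to the folder, ready to drag into Google Drive.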

2. Upload the zipped file using the Google Drive interface. I uploaded mine to an empty directory called 'data'.

3. Open a new Google Colab notebook and mount Google Drive to be able to access the zip file. If you have not used Colab before, you can follow this article that explains how to set it up.

The command below will start the mounting process.

from google.colab import drive
drive.mount('/content/gdrive')

You will be asked to authorize access to Google Drive.

Follow the instructions to authorize by copy-pasting the verification code, and the drive will be mounted.

4. Now extract the files to the local Colab environment with the following command.

!unzip gdrive/My\ Drive/data/train.zip

Note that unzip needs the path to the zip file itself (here assumed to be named train.zip), and that my 'data' folder is located in the Google Drive root directory. You will need to modify the path according to where your file is located and what you named it.

You should see the files unzipping.

It takes less than a minute to unzip 1 GB, so you should not have to wait long. Once you are comfortable that the command works, you can use the variation below, which suppresses the output.

!unzip gdrive/My\ Drive/data/train.zip > /dev/null

Once the cell has executed, you can see that the files have appeared in a local 'train' folder. You can find it in the file browser on the left-hand side of the Colab interface.
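Given that silently missing files are exactly the problem that motivated this post, it is worth counting the extracted files before you start training. A small sketch (the folder name and extension are my example values):

```python
import os

def count_files(folder, ext='.jpg'):
    """Count files with the given extension, to check the unzip was complete."""
    return sum(1 for name in os.listdir(folder) if name.endswith(ext))
```

In my case, `count_files('train')` should return 70257, the number of images that went into the zip.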

5. You can now use the files directly. Just access them from the new 'train' folder. In my case, I can display the first image using the following code.

import tensorflow as tf
img = tf.keras.preprocessing.image.load_img('train/abs_000.jpg')
display(img)  # load_img returns a PIL image; display() renders it in the notebook

From here you can use your data for anything you wish. In my case, it was training a CNN using the free GPU.
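For a classification CNN you usually need labels as well as pixels. If, as with my 'abs_000.jpg' example, the class is encoded in the filename, a tiny helper can recover it. This is a sketch that assumes a '<label>_<index>.jpg' naming convention, which may not match your own data:

```python
def label_from_filename(name):
    """Extract the class label from a filename such as 'abs_000.jpg' -> 'abs'.
    Assumes the '<label>_<index>.<ext>' naming convention."""
    return name.split('_')[0]
```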


The process that I described above worked best for me, so please feel free to copy it.

On the other hand, it would be much better to unzip the files in Google Drive itself, so they simply stay there. That, however, was 'mission impossible' for me for three days in a row. Every time I unzipped the files and saved them in Google Drive, photos ended up missing.

I tried to do it programmatically with Colab as well as with Zip Extractor connected to Google Drive. Both methods resulted in missing files with no warning. I also tried two different Google accounts, and the problem persisted.
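If you do want to keep trying the unzip-inside-Drive route, Python's zipfile module at least lets you detect silent losses by comparing the archive listing against what actually landed on disk. A sketch of that check (the paths are placeholders):

```python
import os
import zipfile

def extract_and_verify(zip_path, dest='.'):
    """Extract an archive and return the list of members that did not
    make it onto disk; an empty list means nothing was silently dropped."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        missing = [name for name in zf.namelist()
                   if not os.path.exists(os.path.join(dest, name))]
    return missing
```

Running this with `dest` pointing at a mounted Drive folder would have flagged the missing photos instead of failing silently.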

After losing three days to trial and error, I came up with the process described above, and it works smoothly every time I rerun the code.

I hope this helps others who are trying to load large data sets into Google Drive in order to process them in Colab. If you have found a better method, or were able to reliably unzip large files in Google Drive, I would be curious how it was done.

Happy coding!




Nisreen Alaas
Jun 26, 2023

Thanks, this helped me a lot and saved my time.


Bob Wenzlau
Nov 04, 2022

What would the path be for a shared drive where the shared drive is "main"? The mount command completed. I used !unzip gdrive/Shared\ Drive/main/ but get this error. Thank you

unzip: cannot find or open gdrive/Shared Drive/main/, gdrive/Shared Drive/main/ or gdrive/Shared Drive/main/


Nov 02, 2022

Thank you


Adil Latif Habibi
Oct 11, 2022

Oooh, thanks so much, very helpful. This is the article I wanted. I scraped images from a search engine and mixed them with my own on my local drive: an animal dataset. So I have 1 GB with about 50,000 images in 6 folders. I zipped this dataset and used

from google.colab import files
uploaded = files.upload()

I uploaded this dataset right from my local drive and the result was only 9% uploaded in about 30 minutes. So I cancelled the process in Colab, then planned to upload the dataset to Google Drive first and load it from there, as you did, but I'm not sure it will be an easy process, maybe it will take muc…


Sharon Boban
Jul 28, 2022

Thanks a lot! This was very helpful! Appreciate your efforts in putting this together. :)
