Rewriting files in Google Cloud Storage


Note: although this code is in Python, the same idea should apply to the JavaScript, Go, and other client libraries.

I wrote the following to copy a file from one Google Cloud Storage bucket to another:

src_blob = src_bucket.blob(file_name)
dest_blob = src_bucket.copy_blob(src_blob, dest_bucket, new_name=new_name)

But for the bigger files (around 120 MB), I got the following error:

Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.

I noticed that copy_blob has a timeout parameter, so why not try that?

src_blob = src_bucket.blob(file_name)
dest_blob = src_bucket.copy_blob(src_blob, dest_bucket, new_name=new_name, timeout=180)

And… same error:

Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.

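Note that it still says 30 seconds, so the timeout parameter made no difference. That parameter only controls how long the client waits for the HTTP response; the 30-second limit on the copy is enforced by the server.

The rewrite docs at that link only describe the raw JSON API, not the Python client I was using. After some digging and Stack Overflow reading, I came up with this snippet, which uses the Python client's Blob.rewrite method: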

src_blob = src_bucket.blob(file_name)
dest_blob = dest_bucket.blob(new_name)

# Large or cross-location copies may need several rewrite() calls. Each call
# returns a token to pass into the next one; the token is None when done.
rewrite_token = None

while True:
    rewrite_token, bytes_rewritten, bytes_to_rewrite = dest_blob.rewrite(
        src_blob, token=rewrite_token
    )
    print(f"\t{new_name}: Progress so far: {bytes_rewritten}/{bytes_to_rewrite} bytes.")
    if not rewrite_token:
        break

That prints the progress after each rewrite call, and with my 120 MB files a single call was enough. Overall, I found this faster than copy_blob, even for the small files.
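If you need this in more than one place, the loop can be wrapped in a small helper. This is a sketch, not the original code: rewrite_with_progress and FakeBlob are hypothetical names. The helper only assumes that the destination object has a rewrite(source, token=...) method returning (token, bytes_rewritten, total_bytes), which matches how the loop above uses Blob.rewrite. FakeBlob is an in-memory stand-in whose "token" is just a byte offset, so the loop can be exercised without a GCS project; real rewrite tokens are opaque strings.

```python
from typing import Callable, Optional, Tuple


def rewrite_with_progress(
    src_blob,
    dest_blob,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Tuple[int, int, int]:
    """Call dest_blob.rewrite() until it stops returning a token.

    Returns (number_of_calls, bytes_rewritten, total_bytes) from the last call.
    """
    token = None
    calls = 0
    while True:
        # Assumed signature: rewrite(source, token=...) -> (token, done, total)
        token, done_bytes, total_bytes = dest_blob.rewrite(src_blob, token=token)
        calls += 1
        if on_progress is not None:
            on_progress(done_bytes, total_bytes)
        if not token:  # no token means the copy is complete
            return calls, done_bytes, total_bytes


class FakeBlob:
    """Hypothetical in-memory stand-in for a GCS blob, for local testing only."""

    def __init__(self, size: int = 0, bytes_per_call: int = 1):
        self.size = size                      # object size when used as a source
        self.bytes_per_call = bytes_per_call  # bytes "copied" per rewrite() call

    def rewrite(self, source, token=None):
        # The fake token is the byte offset reached so far, as a string.
        offset = int(token) if token else 0
        offset = min(offset + self.bytes_per_call, source.size)
        next_token = str(offset) if offset < source.size else None
        return next_token, offset, source.size


if __name__ == "__main__":
    src = FakeBlob(size=250)
    dest = FakeBlob(bytes_per_call=100)
    rewrite_with_progress(
        src, dest, lambda done, total: print(f"\tProgress so far: {done}/{total} bytes.")
    )
```

With real objects, src_blob and dest_blob would come from storage.Client().bucket(...).blob(...) as in the snippet above, and on_progress could be the same print call. Taking the progress reporter as a callback keeps the copy logic separate from logging.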

About the Author


Mike Hostetler

Principal Technologist

Mike has almost 20 years of experience in technology. He started in networking and Unix administration, and grew into technical support and QA testing. But he has always done some development on the side and decided a few years ago to pursue it full-time. His history of working with users gives Mike a unique perspective on writing software.

