Rewriting files in Google Cloud Storage

Note: the code here is Python, but the same idea should apply in JavaScript, Go, etc.

I wrote the following to copy a file from one Google Cloud Storage bucket to another:

src_blob = src_bucket.blob(file_name)
dest_blob = src_bucket.copy_blob(src_blob, dest_bucket, new_name=new_name)

But for the bigger files (around 120 MB), I got the following error:

Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.

I noted that copy_blob has a timeout parameter, so why not try that?

src_blob = src_bucket.blob(file_name)
dest_blob = src_bucket.copy_blob(src_blob, dest_bucket, new_name=new_name, timeout=180)

And… same error:

Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.

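Note that it still says 30 seconds, so my timeout parameter made no difference, presumably because it only governs how long the client waits for a response, while the 30-second limit is enforced by the service. Looking at the rewrite docs at that link, I noticed they only cover the raw JSON API, not the Python library I was using. After some digging and StackOverflow reading, I came up with this snippet: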

  src_blob = src_bucket.blob(file_name)
  dest_blob = dest_bucket.blob(new_name)

  # No token on the first call; rewrite() returns one while the copy is unfinished.
  rewrite_token = None

  while True:
      rewrite_token, bytes_rewritten, bytes_to_rewrite = dest_blob.rewrite(
          src_blob, token=rewrite_token
      )
      print(
          f"\t{new_name}: Progress so far: {bytes_rewritten}/{bytes_to_rewrite} bytes."
      )
      # A missing token means the rewrite is complete.
      if not rewrite_token:
          break

That prints the progress after each rewrite call. With my 120 MB files, a single call was enough. Overall, I found this faster than copy_blob, even for the small files.
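
If you need this in more than one place, the loop can be wrapped in a small helper. This is only a sketch: the function name rewrite_until_done and the on_progress callback are my own hypothetical names, not part of the google-cloud-storage library. It assumes only that dest_blob has a rewrite(source, token=...) method returning a (token, bytes_rewritten, total_bytes) tuple, as in the snippet above.

```python
from typing import Callable, Optional


def rewrite_until_done(
    src_blob,
    dest_blob,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> int:
    """Call dest_blob.rewrite(src_blob, token=...) until no token comes back.

    Hypothetical helper, not a library function. Returns the number of
    bytes rewritten as reported by the final call.
    """
    token = None  # no token on the first call
    while True:
        token, bytes_rewritten, total_bytes = dest_blob.rewrite(
            src_blob, token=token
        )
        if on_progress is not None:
            # Report progress after every call, finished or not.
            on_progress(bytes_rewritten, total_bytes)
        if not token:
            # No token means the service has finished the rewrite.
            return bytes_rewritten
```

With real blobs, the call would look something like rewrite_until_done(src_bucket.blob(file_name), dest_bucket.blob(new_name)), with on_progress left out if you don't care about the progress output.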

About the Author

Mike Hostetler

Principal Technologist

Mike has almost 20 years of experience in technology. He started in networking and Unix administration, and grew into technical support and QA testing. But he has always done some development on the side and decided a few years ago to pursue it full-time. His history of working with users gives Mike a unique perspective on writing software.