Rewriting files in Google Cloud Storage

Rewriting Files in GCP

Note: even though this code is in Python, this should be the same idea in JavaScript, Go, etc.

I wrote the following to copy a file from one Google Cloud Storage bucket to another:

src_blob = src_bucket.blob(file_name)
dest_blob = src_bucket.copy_blob(src_blob, dest_bucket, new_name=new_name)

But for the bigger files (around 120MB or so) I got the following:

Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.

I noticed that copy_blob has a timeout parameter, so why not try that?

src_blob = src_bucket.blob(file_name)
dest_blob = src_bucket.copy_blob(src_blob, dest_bucket, new_name=new_name, timeout=180)

And… same error:

Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.

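Note that it still says 30 seconds, so my timeout parameter made no difference. As far as I can tell, timeout only sets how long the client waits for the HTTP response, while the 30-second limit in the message is enforced by the service. The rewrite docs linked in the error cover only the raw JSON API, not the Python client I was using. After some digging and Stack Overflow reading, I came up with this snippet: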

  src_blob = src_bucket.blob(file_name)
  # Destination object, named the same way as in the copy_blob call above.
  dest_blob = dest_bucket.blob(new_name)

  # No token on the first call; rewrite() returns one until the copy is finished.
  rewrite_token = None

  while True:
      rewrite_token, bytes_rewritten, bytes_to_rewrite = dest_blob.rewrite(
          src_blob, token=rewrite_token
      )
      print(f"\t{new_name}: Progress so far: {bytes_rewritten}/{bytes_to_rewrite} bytes.")
      if not rewrite_token:
          break

That prints one line per rewrite call. With my 120MB files there was only one call, so the loop ran just once per file. Overall I found this faster than copy_blob, even for the small files.
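If you need this in more than one place, the loop can be wrapped in a small helper. What follows is only a sketch: rewrite_blob and on_progress are names I made up for illustration, not part of the google-cloud-storage library. The only thing it assumes about the library is that dest_blob.rewrite(src_blob, token=...) returns a (token, bytes_rewritten, total_bytes) tuple, as in the snippet above.

```python
from typing import Callable, Optional


def rewrite_blob(
    src_blob,
    dest_blob,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> int:
    """Copy src_blob onto dest_blob via repeated rewrite calls.

    Hypothetical helper: both arguments are expected to be blob objects
    whose rewrite() behaves like the one used in the snippet above.
    Returns the number of bytes rewritten.
    """
    token = None  # no token on the first call
    while True:
        token, bytes_rewritten, total_bytes = dest_blob.rewrite(src_blob, token=token)
        if on_progress is not None:
            # Report progress after every call, including the last one.
            on_progress(bytes_rewritten, total_bytes)
        if not token:
            # An empty token means the service has finished the copy.
            return bytes_rewritten
```

With real buckets you would call it as rewrite_blob(src_bucket.blob(file_name), dest_bucket.blob(new_name)), passing something like on_progress=lambda done, total: print(f"{done}/{total} bytes") if you want the same progress output as before.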

About the Author

Mike Hostetler

Principal Technologist

Mike has almost 20 years of experience in technology. He started in networking and Unix administration, and grew into technical support and QA testing. But he has always done some development on the side and decided a few years ago to pursue it full-time. His history of working with users gives Mike a unique perspective on writing software.
