Rewriting files in Google Cloud Storage

Note: although this code is in Python, the same idea should apply in JavaScript, Go, etc.

I wrote the following to copy a file from one Google Cloud Storage bucket to another:

src_blob = src_bucket.blob(file_name)
dest_blob = src_bucket.copy_blob(src_blob, dest_bucket, new_name=new_name)

But for the bigger files (around 120 MB), I got the following error:

Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.

I noted that copy_blob has a timeout parameter, so why not try that?

src_blob = src_bucket.blob(file_name)
dest_blob = src_bucket.copy_blob(src_blob, dest_bucket, new_name=new_name, timeout=180)

And… same error:

Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.

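Note that it still says 30 seconds, so my timeout parameter made no difference. As far as I can tell, timeout only controls how long the client waits for a response, while the 30-second limit is enforced on the server side. Looking at the rewrite docs at the link, I noted that they cover only the raw JSON API, not the Python client I was using. After some digging and Stack Overflow reading, I came up with this snippet: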

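src_blob = src_bucket.blob(file_name)
# Name the destination new_name, matching the copy_blob call above
dest_blob = dest_bucket.blob(new_name)

# No token on the first call; each later call resumes from the returned token
rewrite_token = None

while True:
    rewrite_token, bytes_rewritten, bytes_to_rewrite = dest_blob.rewrite(
        src_blob, token=rewrite_token
    )
    print(
        f"\t{new_name}: Progress so far: {bytes_rewritten}/{bytes_to_rewrite} bytes."
    )
    # rewrite returns an empty token once the whole object has been copied
    if not rewrite_token:
        break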

That prints progress after each rewrite call. With my 120 MB files, a single call was enough. Overall I found this faster than copy_blob, even for the small files.
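If you need this in more than one place, the loop can be wrapped in a small helper. This is only a sketch: rewrite_blob is a hypothetical name of my own, and it assumes src_bucket and dest_bucket are google-cloud-storage Bucket objects like the ones used above. It also returns the number of rewrite calls, which makes it easy to see whether a large file needed more than one.

```python
def rewrite_blob(src_bucket, dest_bucket, file_name, new_name=None):
    """Copy file_name from src_bucket to dest_bucket via repeated rewrite calls.

    Hypothetical helper (not part of the client library). Both buckets are
    assumed to behave like google.cloud.storage Bucket objects.
    Returns (dest_blob, number_of_rewrite_calls).
    """
    src_blob = src_bucket.blob(file_name)
    # Fall back to the source name when no new name is given
    dest_blob = dest_bucket.blob(new_name or file_name)

    token = None  # no token on the first call
    calls = 0
    while True:
        # Each call copies a chunk and returns a token to resume from
        token, bytes_rewritten, total_bytes = dest_blob.rewrite(
            src_blob, token=token
        )
        calls += 1
        print(f"{dest_blob.name}: {bytes_rewritten}/{total_bytes} bytes")
        # An empty token means the rewrite is complete
        if not token:
            return dest_blob, calls
```

Calling rewrite_blob(src_bucket, dest_bucket, file_name, new_name=new_name) would then replace the inline loop.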

About the Author

Object Partners profile.