AWS Glue is a managed service that simplifies ETL work. In this blog I’m going to cover creating a crawler, creating an ETL job, and setting up a development endpoint. Since Glue ...
During a recent effort building out the back-end datastore of a large analytics platform built on Spark and Kudu, we needed to provide low-latency query performance to our front-end application. We leveraged Spark ...
Apache Cassandra, a scalable and highly available platform, is a good choice for high-volume event management applications, such as large deployments of sensors. Applications include telematics data for large fleets, smart meter telemetry in electric, ...
This blog describes a Spark Streaming application that consumes event data from a Kafka topic to provide continuous, near real-time processing and analysis of the event stream. The demonstration application, written in Java 8 ...