OPI Blog
Learn from experts in their fields

Browsing Tags: Spark

Up and Running with AWS Glue
AWS Glue is a managed service that can simplify ETL work. In this post I cover creating a crawler, creating an ETL job, and setting up a development endpoint. Since Glue ...
Mar 7, 2017
Spark Job Server for Persistent Contexts and Low Latency Jobs
During a recent effort building out the back-end datastore of a large analytics platform built on Spark and Kudu, we needed to provide low-latency query performance to our front-end application. We leveraged Spark ...
Nov 30, 2016
From Cassandra to S3, with Spark
Apache Cassandra, a scalable, high-availability platform, is a good choice for high-volume event management applications, such as large deployments of sensors. Applications include telematics data for large fleets, smart meter telemetry in electric, ...
Oct 13, 2016
Analyzing Kafka data streams with Spark
This post describes a Spark Streaming application that consumes event data from a Kafka topic to provide continuous, near-real-time processing and analysis of the event data stream. The demonstration application, written in Java 8 ...