Databricks and Spark Crash Course. Delta and More!

I’ve been working on a Databricks and Delta tutorial for all of you. I published it as a notebook, and you can grab it here. We will load some sample data from the NYC taxi dataset available in Databricks and store it as a table. We will then use Python to do some manipulation (Extract …

Unload data from AWS Redshift to S3 in Parquet

Following the previous Redshift articles, in this one I will explain how to export data from Redshift to Parquet in S3. This can be useful when we want to archive infrequently queried data so that it can be queried more cheaply with Spectrum, store it in an S3 archive, or export it to another storage solution like Glacier. The …

Automating a Big Data (CDH) cluster installation

It’s been a while since I last wrote a post, but this one should be useful for anyone facing the task of installing a CDH cluster from scratch. A few prerequisites need to be configured before starting the installation; otherwise it can crash. To address some of these …

Cloudera Manager installation in Google Cloud

Installation of Cloudera Manager and a small CDH cluster lab in Google Cloud. As preparation for the CCA Administration certification, we need a working cluster for our practice tests. I’m going to start by showing you how to install a CDH cluster in Google Cloud. To start with, we need to proceed with the …