Blog

Ansible playbook to configure Azure Red Hat VMs

In today's post I am going to share an Ansible playbook to configure a newly launched VM. This playbook does the following: change the admin password; create a Linux group; add a user to several groups; create a new user with a specific salted password (see point 3 for generating the salted hash); find all the mounted disks …

Databricks and Spark Crash Course. Delta and More!

I’ve been working on a Databricks and Delta tutorial for all of you. I published it as a notebook and you can grab it here. We will load some sample data from the NYC taxi dataset available in Databricks and store it as a table. We will then use Python to do some manipulation (Extract …

Unload data from AWS Redshift to S3 in Parquet

Following on from the previous Redshift articles, in this one I will explain how to export data from Redshift to Parquet in S3. This can be useful when we want to archive infrequently queried data so that it can be queried more cheaply with Spectrum, stored in an S3 archive, or exported to another storage solution such as Glacier. The …

How to run CentOS 8 (or any Linux) inside Windows 10

For some time now, Windows has shipped with something called WSL, or Windows Subsystem for Linux. This allows Windows users to run Linux images without having to physically install Linux, run Cygwin, or run VMs or Docker containers with Linux. So it’s quite practical. And the good part is that from within the Linux shell you …

Automating a Big Data (CDH) cluster installation

It’s been a while since I last wrote a post, but this one is going to be useful for people who face the task of installing a CDH cluster from scratch. There are a few prerequisites that need to be configured before starting the installation; otherwise the installation can/will crash. To address some of these …

Cloudera Manager installation in Google Cloud

Installation of Cloudera Manager and a small CDH cluster lab in Google Cloud. As preparation for the CCA Administration certification, we need a working cluster for our practice tests. I’m going to start by showing you how to install a CDH cluster in Google Cloud. To start with, we need to proceed with the …