Blog

Databricks and Spark Crash Course. Delta and More!

I’ve been working on a Databricks and Delta tutorial for all of you. I published it as a notebook, and you can grab it here. We will load some sample data from the NYC taxi dataset available in Databricks and store it as a table. We will then use Python to do some manipulation (Extract …

Unload data from AWS Redshift to S3 in Parquet

Following the previous Redshift articles, in this one I will explain how to export data from Redshift to Parquet in S3. This can be useful when we want to archive infrequently queried data so that it can be queried more cheaply with Spectrum, to store it in an S3 archive, or to export it to another storage solution such as Glacier. The …

How to run CentOS 8 (or any Linux) inside Windows 10

For some time now, Windows has shipped with something called WSL, or Windows Subsystem for Linux. This allows Windows users to run Linux images without having to physically install Linux, run Cygwin, or run VMs or Docker containers with Linux. So it’s quite practical. And the good part is that from within the Linux shell you …

Automating a Big Data (CDH) cluster installation

It’s been a while since I last wrote a post, but this one is going to be useful for people who face the task of installing a CDH cluster from scratch. There are a few prerequisites that need to be configured before starting the installation; otherwise the installation can (and will) crash. To address some of these …

Cloudera Manager installation in Google Cloud

Installation of Cloudera Manager and a small CDH cluster lab in Google Cloud. As preparation for the CCA Administration certification, we need a working cluster for our practice tests. I’m going to start by showing you how to install a CDH cluster in Google Cloud. To start with, we need to proceed with the …

Business Intelligence Tools for Small Companies: A Guide to Free and Low-Cost Solutions

Juan Valladares and I have finished our book, and it is now published. You can get it from major retailers or from the publisher’s website (Apress): http://www.apress.com/gp/book/9781484225677 It is also available from Amazon: https://www.amazon.com/Business-Intelligence-Tools-Small-Companies/dp/1484225678 The book: teaches how to implement and manage the business intelligence/data warehousing (BI/DWH) infrastructure for a small company; provides practice extracting data from any …