Data Quality Checks with Soda-Core in Databricks

It’s easy to run data quality checks on Spark with the soda-core library, which supports Spark DataFrames. I’ve tested it in a Databricks environment and it worked without trouble. For the examples in this article I am loading the customers table from the TPC-H Delta tables in the …

Databricks query federation with Snowflake. Easy and Fast!

Introduction. Just as it is possible to read and write Snowflake data from inside Databricks, it is also possible to use Databricks query federation against a range of SQL engines, including Snowflake. The currently supported engines are: We are going to demonstrate how it works with Snowflake. We will first create a table in Databricks, …

Smallest Analytical Platform Ever!

I’ve started working in my free time on a project to build the smallest useful analytics platform in the cloud (starting with Azure). The purpose is to use it as a PoC to show to colleagues, managers, and prospective customers, or just to have fun and play. It’s publicly available on my GitHub repo …

Ansible playbook to configure Azure Red Hat VMs

In today’s post I am going to share an Ansible playbook to configure a newly launched VM. This playbook contains the following: change the admin password; create a Linux group; add a user to several groups; create a new user with a specific salted password (see point 3 for generating the hashed salt); find all the mounted disks …

Databricks and Spark Crash Course. Delta and More!

I’ve been working on a Databricks and Delta tutorial for all of you. I published it as a notebook, and you can grab it here. We will load some sample data from the NYC taxi dataset available in Databricks and store it as a table. We will then use Python to do some manipulation (Extract …