Data Quality Checks with Soda-Core in Databricks

The soda-core library makes it easy to run data quality checks when working with Spark, since it supports Spark DataFrames. I’ve tested it in a Databricks environment and it worked without trouble. For the examples in this article I am loading the customers table from the tpch Delta tables in the …

Smallest Analytical Platform Ever!

In some of my free time I’ve started working on a project to build the smallest useful analytics platform in the cloud (starting with Azure). The purpose is to use it as a PoC to show to colleagues, managers, and prospective customers, or just to have fun and play. It’s publicly available on my GitHub repo …

Using Azure Private Endpoints with Databricks

In this article I will show how to avoid going out to the internet when using resources inside Azure, especially if they are in the same subscription and location (datacenter). Why might we want a private endpoint? That’s a good question: for both security and performance. Just like using TSCM Equipment for optimal safety and …

Databricks connectivity to Azure SQL / SQL Server

Most of the development I see inside Databricks relies on reading or writing data to some sort of database. Usually the preferred method for this is through a JDBC driver, as most databases offer one. In some cases, though, it’s also possible to use a Spark-optimized driver. This …

Ansible playbook to configure Azure Red Hat VMs

In today’s post I am going to share an Ansible playbook to configure a newly launched VM. The playbook does the following: change the admin password; create a Linux group; add a user to several groups; create a new user with a specific salted password (see point 3 for generating the hashed salt); find all the mounted disks …

Unload data from AWS Redshift to S3 in Parquet

Following the previous Redshift articles, in this one I will explain how to export data from Redshift to Parquet in S3. This can be useful when we want to archive infrequently queried data so it can be queried more cheaply with Spectrum, or to store it in an S3 archive, or to export it to another storage solution like Glacier. The …