Data Quality Checks with Soda-Core in Databricks

It’s easy to do data quality checks when working with Spark using the soda-core library. The library has support for Spark DataFrames. I’ve tested it within a Databricks environment and it worked quite easily for me. For the examples in this article I am loading the customers table from the TPC-H Delta tables in the …
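A minimal sketch of what such a scan can look like with the soda-core-spark-df package. The check definitions and column names (TPC-H customer columns like `c_custkey`) are illustrative assumptions, not taken from the article:

```python
# SodaCL checks expressed as YAML; table and column names are assumed
# (TPC-H customers), adjust to your own DataFrame.
CHECKS_YAML = """
checks for customers:
  - row_count > 0
  - missing_count(c_custkey) = 0
  - duplicate_count(c_custkey) = 0
"""

def run_scan(spark):
    """Run the SodaCL checks against a DataFrame registered as the
    temp view 'customers' in the given Spark session."""
    # Imported lazily so this sketch loads even without soda-core installed.
    from soda.scan import Scan

    scan = Scan()
    scan.set_scan_definition_name("customers_quality")
    scan.set_data_source_name("spark_df")
    scan.add_spark_session(spark, data_source_name="spark_df")
    scan.add_sodacl_yaml_str(CHECKS_YAML)
    scan.execute()
    scan.assert_no_checks_fail()  # raises AssertionError if any check fails
    return scan.get_scan_results()
```

Before calling `run_scan`, the DataFrame has to be registered as a temp view matching the name in the checks, e.g. `df.createOrReplaceTempView("customers")`.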

Useful Databricks/Spark resources

Memory Profiling in PySpark: https://www.databricks.com/blog/2022/11/30/memory-profiling-pyspark.html
Run Databricks queries directly from VSCode: https://ganeshchandrasekaran.com/run-your-databricks-sql-queries-from-vscode-9c70c5d4903c
Spark Testing with chispa: https://github.com/alexott/spark-playground/tree/master/testing
Best Practices for Cost Management on Databricks: https://www.databricks.com/blog/2022/10/18/best-practices-cost-management-databricks.html
PySpark UDFs: https://docs.databricks.com/udf/python.html
Pandas UDFs: https://docs.databricks.com/udf/pandas.html
Introducing Pandas UDFs for PySpark: https://www.databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html

Databricks connectivity to Azure SQL / SQL Server

Most of the developments I see inside Databricks rely on fetching data from, or writing data to, some sort of database. Usually the preferred method for this is through the use of a JDBC driver, as most databases offer one. In some cases, though, it’s also possible to use a Spark-optimized driver. This …
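As a rough sketch of the JDBC route against Azure SQL / SQL Server: the server, database, table, and credential values below are placeholders, not from the article, and the connection is built with Spark’s generic JDBC data source:

```python
def build_jdbc_url(server: str, database: str) -> str:
    """Build a SQL Server JDBC URL; 1433 is the default Azure SQL port."""
    return f"jdbc:sqlserver://{server}:1433;databaseName={database};encrypt=true"

def read_table(spark, url: str, table: str, user: str, password: str):
    """Read a table into a Spark DataFrame via the built-in JDBC source.
    Expects the SQL Server JDBC driver to be on the cluster classpath."""
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )
```

In practice you would pull the credentials from a Databricks secret scope rather than passing them as literals.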