Data Quality Checks with Soda-Core in Databricks

It’s easy to do data quality checks when working with Spark using the soda-core library. The library has support for Spark DataFrames. I’ve tested it within a Databricks environment and it worked quite easily for me. For the examples in this article I am loading the customers table from the TPC-H Delta tables in the …
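A minimal sketch of what such a scan can look like with the soda-core-spark-df package. The check definitions and column names (TPC-H customer columns like `c_custkey`) are illustrative assumptions, not taken from the article:

```python
# SodaCL checks expressed as YAML; table and column names are assumed
# (TPC-H customers), adjust to your own DataFrame.
CHECKS_YAML = """
checks for customers:
  - row_count > 0
  - missing_count(c_custkey) = 0
  - duplicate_count(c_custkey) = 0
"""

def run_scan(spark):
    """Run the SodaCL checks against a DataFrame registered as the
    temp view 'customers' in the given Spark session."""
    # Imported lazily so this sketch loads even without soda-core installed.
    from soda.scan import Scan

    scan = Scan()
    scan.set_scan_definition_name("customers_quality")
    scan.set_data_source_name("spark_df")
    scan.add_spark_session(spark, data_source_name="spark_df")
    scan.add_sodacl_yaml_str(CHECKS_YAML)
    scan.execute()
    scan.assert_no_checks_fail()  # raises AssertionError if any check fails
    return scan.get_scan_results()
```

Before calling `run_scan`, the DataFrame has to be registered as a temp view matching the name in the checks, e.g. `df.createOrReplaceTempView("customers")`.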

Useful Databricks/Spark resources

Memory Profiling in PySpark: https://www.databricks.com/blog/2022/11/30/memory-profiling-pyspark.html
Run Databricks queries directly from VSCode: https://ganeshchandrasekaran.com/run-your-databricks-sql-queries-from-vscode-9c70c5d4903c
Spark Testing with chispa: https://github.com/alexott/spark-playground/tree/master/testing
Best Practices for Cost Management on Databricks: https://www.databricks.com/blog/2022/10/18/best-practices-cost-management-databricks.html
PySpark UDFs: https://docs.databricks.com/udf/python.html
Pandas UDFs: https://docs.databricks.com/udf/pandas.html
Introducing Pandas UDFs for PySpark: https://www.databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html

Databricks connectivity to Azure SQL / SQL Server

Most of the developments I see inside Databricks rely on fetching data from, or writing data to, some sort of database. Usually the preferred method for this is through the use of a JDBC driver, as most databases offer one. In some cases, though, it’s also possible to use a Spark-optimized driver. This …
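As a rough sketch of the JDBC route against Azure SQL / SQL Server: the server, database, table, and credential values below are placeholders, not from the article, and the connection is built with Spark’s generic JDBC data source:

```python
def build_jdbc_url(server: str, database: str) -> str:
    """Build a SQL Server JDBC URL; 1433 is the default Azure SQL port."""
    return f"jdbc:sqlserver://{server}:1433;databaseName={database};encrypt=true"

def read_table(spark, url: str, table: str, user: str, password: str):
    """Read a table into a Spark DataFrame via the built-in JDBC source.
    Expects the SQL Server JDBC driver to be on the cluster classpath."""
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )
```

In practice you would pull the credentials from a Databricks secret scope rather than passing them as literals.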