Useful Databricks/Spark resources

Memory Profiling in PySpark: https://www.databricks.com/blog/2022/11/30/memory-profiling-pyspark.html

Run Databricks SQL queries directly from VS Code: https://ganeshchandrasekaran.com/run-your-databricks-sql-queries-from-vscode-9c70c5d4903c

Spark Testing with chispa: https://github.com/alexott/spark-playground/tree/master/testing

Best Practices for Cost Management on Databricks: https://www.databricks.com/blog/2022/10/18/best-practices-cost-management-databricks.html

PySpark UDFs: https://docs.databricks.com/udf/python.html

Pandas UDFs: https://docs.databricks.com/udf/pandas.html

Introducing Pandas UDF for PySpark: https://www.databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
