Advent of 2023, Day 10 – Creating Job Spark definition

In this Microsoft Fabric series:

  1. Dec 01: What is Microsoft Fabric?
  2. Dec 02: Getting started with Microsoft Fabric
  3. Dec 03: What is lakehouse in Fabric?
  4. Dec 04: Delta lake and delta tables in Microsoft Fabric
  5. Dec 05: Getting data into lakehouse
  6. Dec 06: SQL Analytics endpoint
  7. Dec 07: SQL commands in SQL Analytics endpoint
  8. Dec 08: Using Lakehouse REST API
  9. Dec 09: Building custom environments

An Apache Spark job definition is a single computational action that is normally scheduled and triggered. In Microsoft Fabric (as in Synapse), you can submit batch or streaming jobs to Spark clusters.

By uploading a binary file, or libraries in any of the supported languages (Java/Scala, R, Python), you can run any kind of logic (transformation, cleaning, ingestion, and so on) against the data that is hosted and served in your lakehouse.

When creating a new Spark job definition, you will get to the definition screen, where you upload the binary file(s).

My R script is just a toy example that reads a delta table and appends all of its records back to the same delta table. Important (!): the Spark context (or session) must be initialized in the code for the job definition to succeed (otherwise, the job fails). I am still not sure why the context must be set explicitly.

library(SparkR)

# Initialize the Spark session; without this, the job definition fails
sparkR.session(master = "", appName = "SparkR", sparkConfig = list())

# Read the delta table from the lakehouse (OneLake ABFSS path)
df_iris <- read.df("abfss://1860beee-xxxxxxxxxx@onelake.dfs.fabric.microsoft.com/a574d1a3-xxxxxxxxx-7128f/Tables/iris_data",
                   source = "delta")
head(df_iris)

# Every run appends the whole delta table to itself
write.df(df_iris,
         source = "delta",
         path = "abfss://1860beee-xxxxxxxxxx@onelake.dfs.fabric.microsoft.com/a574d1a3-xxxxxxxxx-7128f/Tables/iris_data",
         mode = "append")

Do not forget to assign the Lakehouse workspace to the job definition: go to Lakehouse Reference and add the preferred lakehouse.

Once you upload the file, you can schedule the job:

You can always test the job by running it and checking the results:

You can also deep dive into each Job run to get some additional information.

To check whether the R code above was successful, I quickly opened a notebook and checked the number of rows: on top of the original 150, multiple rows had been appended to the delta table.
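The sanity check can be sketched as a notebook cell like the following. This is a minimal sketch under two assumptions: the notebook has the lakehouse attached (so the relative `Tables/iris_data` path resolves), and a Spark session is available (Fabric notebooks provide one, but `sparkR.session()` is harmless to call again):

```r
library(SparkR)

# A Fabric notebook already carries a Spark session; this is a no-op there,
# but makes the snippet self-contained elsewhere
sparkR.session()

# Re-read the delta table and count the rows. The iris table starts at
# 150 rows, and each job run appends the whole table to itself, so the
# count doubles per run (150, 300, 600, ...)
df_iris <- read.df("Tables/iris_data", source = "delta")
count(df_iris)
```

A count that is 150 times a power of two tells you how many times the job has run.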

Tomorrow we will look at exploring the data science part!

The complete set of code, documents, notebooks, and all of the materials will be available at the GitHub repository: https://github.com/tomaztk/Microsoft-Fabric

Happy Advent of 2023! 🙂
