Advent of 2023, Day 10 – Creating Job Spark definition

In this Microsoft Fabric series:

  1. Dec 01: What is Microsoft Fabric?
  2. Dec 02: Getting started with Microsoft Fabric
  3. Dec 03: What is lakehouse in Fabric?
  4. Dec 04: Delta lake and delta tables in Microsoft Fabric
  5. Dec 05: Getting data into lakehouse
  6. Dec 06: SQL Analytics endpoint
  7. Dec 07: SQL commands in SQL Analytics endpoint
  8. Dec 08: Using Lakehouse REST API
  9. Dec 09: Building custom environments

An Apache Spark job definition is a single computational action that is normally scheduled and triggered. In Microsoft Fabric (as in Synapse), you can submit batch or streaming jobs to Spark clusters.

By uploading a binary file, or libraries in any of the supported languages (Java/Scala, R, Python), you can run any kind of logic (transformation, cleaning, ingestion, and so on) against the data that is hosted and served in your lakehouse.

When creating a new Spark job definition, you will get to the definition screen, where you upload the binary file(s).

My R script is just a toy example that reads a delta table and appends all of its records back to the same delta table. Important (!): the Spark context (or session) must be initialized in the code for the job definition to succeed (otherwise, the job fails). I am still not sure why the context must be set explicitly.

library(SparkR)

# Initialize the Spark session; without this, the job definition fails
sparkR.session(master = "", appName = "SparkR", sparkConfig = list())

# Read the delta table from the lakehouse (OneLake ABFSS path)
df_iris <- read.df("abfss://1860beee-xxxxxxxxxx@onelake.dfs.fabric.microsoft.com/a574d1a3-xxxxxxxxx-7128f/Tables/iris_data",
                   source = "delta")
head(df_iris)

# Every run appends the whole delta table to itself
write.df(df_iris,
         source = "delta",
         path = "abfss://1860beee-xxxxxxxxxx@onelake.dfs.fabric.microsoft.com/a574d1a3-xxxxxxxxx-7128f/Tables/iris_data",
         mode = "append")

Do not forget to assign the Lakehouse workspace to the job definition: go to Lakehouse Reference and add the preferred lakehouse.

Once you upload the file, you can schedule the job:

You can always test the job by running it and checking the results:

You can also deep dive into each Job run to get some additional information.

To check whether the R code above was successful, I quickly opened a notebook and checked the number of rows: on top of the original 150, multiple rows had been appended to the delta table.
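The sanity check can be sketched as a notebook cell like the following. This is a minimal sketch under two assumptions: the notebook has the lakehouse attached (so the relative `Tables/iris_data` path resolves), and a Spark session is available (Fabric notebooks provide one, but `sparkR.session()` is harmless to call again):

```r
library(SparkR)

# A Fabric notebook already carries a Spark session; this is a no-op there,
# but makes the snippet self-contained elsewhere
sparkR.session()

# Re-read the delta table and count the rows. The iris table starts at
# 150 rows, and each job run appends the whole table to itself, so the
# count doubles per run (150, 300, 600, ...)
df_iris <- read.df("Tables/iris_data", source = "delta")
count(df_iris)
```

A count that is 150 times a power of two tells you how many times the job has run.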

Tomorrow we will look at exploring the data science part!

The complete set of code, documents, notebooks, and all of the materials will be available at the GitHub repository: https://github.com/tomaztk/Microsoft-Fabric

Happy Advent of 2023! 🙂
