Aggregate dot SQL
The following code uses SQL to set the value of a dataframe called df.
df = spark.sql("""
SELECT *,
LEAD(time,1) OVER(PARTITION BY train_id ORDER BY time) AS time_next
FROM schedule
""")
- The LEAD clause has an equivalent function in pyspark.sql.functions.
- The PARTITION BY and ORDER BY clauses each have an equivalent dot notation function that is called on the Window object.
- The following imports are available:
  - from pyspark.sql import Window
  - from pyspark.sql.functions import lead
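To make concrete what the windowed LEAD in the query above computes, here is a minimal plain-Python sketch that does not use Spark. The sample rows and the helper name add_time_next are hypothetical and only serve as illustration; the real schedule table may have other columns and values. For each train_id, rows are ordered by time, and each row receives the time of the following row in its partition, or None for the last row.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows standing in for the schedule table.
schedule = [
    {"train_id": 324, "time": "08:03"},
    {"train_id": 217, "time": "06:06"},
    {"train_id": 324, "time": "07:59"},
    {"train_id": 217, "time": "06:15"},
    {"train_id": 324, "time": "08:16"},
]

def add_time_next(rows):
    """Mimic LEAD(time, 1) OVER (PARTITION BY train_id ORDER BY time)."""
    # Sort by partition key first, then by the ordering column.
    ordered = sorted(rows, key=itemgetter("train_id", "time"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("train_id")):
        partition = list(group)
        for i, row in enumerate(partition):
            # The last row of each partition has no successor, so it gets None.
            nxt = partition[i + 1]["time"] if i + 1 < len(partition) else None
            result.append({**row, "time_next": nxt})
    return result

for row in add_time_next(schedule):
    print(row)
```

The sketch returns rows sorted by train_id and time for readability; SQL itself does not guarantee an output order unless the query ends with its own ORDER BY.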
This exercise is part of the course
Introduction to Spark SQL in Python
Exercise instructions
- Create a dataframe called dot_df that contains the identical result as df, using dot notation instead of SQL.
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Obtain the identical result using dot notation
dot_df = df.withColumn('time_next', ____('time', 1)
        .over(____.____('train_id')
        .____('time')))