Split and explode a text column
A dataframe clauses_df with 100 rows is provided. It has a column clause and a row id. Each clause is a string containing one or more words separated by spaces.
This exercise is part of the course
Introduction to Spark SQL in Python
Exercise instructions
- Split the clause column into a column called words, containing an array of individual words.
- Explode the words column into a column called word.
- Count the resulting number of rows.
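The effect of the steps above on the row count can be sketched in plain Python, without Spark. This is only an illustration of the semantics, not the exercise solution: the three sample rows are hypothetical and are not taken from clauses_df, and ordinary lists and dicts stand in for dataframe rows.

```python
# Plain-Python sketch of split/explode semantics (no Spark involved).
# The rows below are hypothetical sample data, not taken from clauses_df.
rows = [
    {"id": 0, "clause": "the quick brown fox"},
    {"id": 1, "clause": "jumps over"},
    {"id": 2, "clause": "dogs"},
]

# "Split": each clause string becomes a list of words (still one row per clause).
split_rows = [{"words": r["clause"].split(" ")} for r in rows]

# "Explode": each element of each list becomes its own row.
exploded_rows = [{"word": w} for r in split_rows for w in r["words"]]

# Splitting keeps the row count; exploding grows it to the total word count.
print(len(rows), len(split_rows), len(exploded_rows))  # 3 3 7
```

Splitting leaves the number of rows unchanged, while exploding produces one row per word, so the final count equals the total number of words across all clauses.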
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Split the clause column into a column called words
split_df = clauses_df.select(____('clause', ' ').____('words'))
split_df.show(5, truncate=False)
# Explode the words column into a column called word
exploded_df = split_df.____(____('____').____('word'))
exploded_df.show(10)
# Count the resulting number of rows in exploded_df
print("\nNumber of rows: ", ____)