Creating the TF-IDF DataFrame
Now that you have generated your TF-IDF features, you will need to get them into a format that you can use to make recommendations.
You will once again leverage pandas for this and wrap the array in a DataFrame.
As you will be using the movie titles to do your filtering of the data, you can assign the titles to the DataFrame's index.
The df_plots DataFrame has once again been loaded for you. It contains movies' names in the Title column and their plots in the Plot column.
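To illustrate why a title index is convenient, here is a minimal sketch that uses invented scores and titles (none of these values come from df_plots; the names scores, titles, and toy_df are hypothetical). Once the titles are the index, a movie's feature row can be selected by name with .loc rather than by row position.

```python
import numpy as np
import pandas as pd

# Hypothetical TF-IDF-like scores; the values and titles are invented for illustration
scores = np.array([[0.0, 0.8, 0.6],
                   [0.7, 0.0, 0.7]])
titles = pd.Series(["Movie A", "Movie B"], name="Title")

# Wrap the array in a DataFrame and use the titles as the index
toy_df = pd.DataFrame(scores, columns=["city", "dragon", "wizard"])
toy_df.index = titles

# Select one movie's feature vector by title instead of by row position
movie_a = toy_df.loc["Movie A"]
print(movie_a)
```

The same lookup pattern should carry over to the real tfidf_df once its index holds the movie titles.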
This exercise is part of the course
Building Recommendation Engines in Python
Exercise instructions
- Create a TfidfVectorizer and fit and transform it as you did in the previous exercise.
- Wrap the generated vectorized_data in a DataFrame. Use the names of the features generated during the fit and transform phase as its column names and assign your new DataFrame to tfidf_df.
- Assign the original movie titles to the index of the newly created tfidf_df DataFrame.
Hands-on interactive exercise
Try this exercise by completing the sample code.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
# Instantiate the vectorizer object and transform the plot column
vectorizer = ____(max_df=0.7, min_df=2)
vectorized_data = vectorizer.____(df_plots['Plot'])
# Create a DataFrame from the TF-IDF array
tfidf_df = pd.____(____.toarray(), columns=vectorizer.____())
# Assign the movie titles to the index and inspect
tfidf_df.____ = ____['Title']
print(tfidf_df.head())