Exploring text vectors, part 2

Using the return_weights() function you wrote in the previous exercise, you're now going to extract the top words from each document in the text vector, return a list of the word indices, and use that list to filter the text vector down to those top words.

Diese Übung ist Teil des Kurses

Preprocessing for Machine Learning in Python

Kurs anzeigen

Anleitung zur Übung

Call return_weights() to return the top weighted words for that document.
Call set() on the returned filter_list to remove duplicated numbers.
Call words_to_filter, passing in the following parameters: vocab for the vocab parameter, tfidf_vec.vocabulary_ for the original_vocab parameter, text_tfidf for the vector parameter, and 3 to grab the top_n 3 weighted words from each document.
Finally, pass that filtered_words set into a list to use as a filter for the text vector.

Interaktive Übung

Vervollständige den Beispielcode, um diese Übung erfolgreich abzuschließen.

def words_to_filter(vocab, original_vocab, vector, top_n):
    filter_list = []
    for i in range(0, vector.shape[0]):
    
        # Call the return_weights function and extend filter_list
        filtered = ____(vocab, original_vocab, vector, i, top_n)
        filter_list.extend(filtered)
        
    # Return the list in a set, so we don't get duplicate word indices
    return ____(filter_list)

# Call the function to get the list of word indices
filtered_words = ____(____, ____, ____, ____)

# Filter the columns in text_tfidf to only those in filtered_words
filtered_text = text_tfidf[:, list(____)]

Code bearbeiten und ausführen