Exploring text vectors, part 2
Using the return_weights() function you wrote in the previous exercise, you're now going to extract the top words from each document in the text vector, return a list of the word indices, and use that list to filter the text vector down to those top words.
This exercise is part of the course
Preprocessing for Machine Learning in Python
Exercise instructions
- Inside the loop, call return_weights() to return the top weighted words for the current document.
- Call set() on the returned filter_list to remove duplicate word indices.
- Call words_to_filter, passing in the following parameters: vocab for the vocab parameter, tfidf_vec.vocabulary_ for the original_vocab parameter, text_tfidf for the vector parameter, and 3 for the top_n parameter, to grab the top 3 weighted words from each document.
- Finally, convert the filtered_words set into a list to use as a column filter for the text vector.
Interactive exercise
Complete the sample code to finish this exercise successfully.
def words_to_filter(vocab, original_vocab, vector, top_n):
    filter_list = []
    for i in range(0, vector.shape[0]):
        # Call the return_weights function and extend filter_list
        filtered = ____(vocab, original_vocab, vector, i, top_n)
        filter_list.extend(filtered)
    # Return the list in a set, so we don't get duplicate word indices
    return ____(filter_list)

# Call the function to get the list of word indices
filtered_words = ____(____, ____, ____, ____)

# Filter the columns in text_tfidf to only those in filtered_words
filtered_text = text_tfidf[:, list(____)]
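
To see the idea end to end without the course environment, here is a minimal sketch in plain Python. It rests on several assumptions: the TF-IDF matrix is replaced by a toy list of dicts (column index to weight, one dict per document), the vocabulary and weights are made-up values, and return_weights() is a hypothetical stand-in for the helper from the previous exercise, not the course's actual implementation.

```python
# Toy stand-in for a sparse TF-IDF matrix (assumption): one dict per
# document, mapping column index -> weight for the non-zero entries only.
text_tfidf = [
    {0: 0.1, 1: 0.7, 2: 0.5, 3: 0.2},
    {1: 0.6, 3: 0.9, 4: 0.3},
]

# Same shape as a vectorizer's vocabulary_ attribute: word -> column index.
# The words themselves are invented for illustration.
original_vocab = {"data": 0, "python": 1, "model": 2, "clean": 3, "text": 4}

# Reversed mapping: column index -> word.
vocab = {idx: word for word, idx in original_vocab.items()}


def return_weights(vocab, original_vocab, vector, vector_index, top_n):
    """Hypothetical stand-in: column indices of the top_n words in one document."""
    row = vector[vector_index]
    # Map each word in the document to its weight.
    weights = {vocab[col]: weight for col, weight in row.items()}
    # Sort words by weight, highest first, and keep the top_n.
    top_words = sorted(weights, key=weights.get, reverse=True)[:top_n]
    # Translate the words back to column indices.
    return [original_vocab[word] for word in top_words]


def words_to_filter(vocab, original_vocab, vector, top_n):
    filter_list = []
    for i in range(len(vector)):
        # Collect the top word indices for document i.
        filtered = return_weights(vocab, original_vocab, vector, i, top_n)
        filter_list.extend(filtered)
    # A set drops indices that appear in more than one document.
    return set(filter_list)


# Top 2 words per document; "python" (column 1) appears in both documents.
filtered_words = words_to_filter(vocab, original_vocab, text_tfidf, 2)

# Keep only the selected columns, in a fixed order.
columns = sorted(filtered_words)
filtered_text = [[row.get(col, 0.0) for col in columns] for row in text_tfidf]

print(filtered_words)
print(filtered_text)
```

With a real sparse matrix, the last step corresponds to the column selection text_tfidf[:, list(filtered_words)] from the exercise; the list comprehension here only imitates that indexing on the toy data.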