
Showing posts from October, 2020

NLP in action: train a Naive Bayes model on movie reviews

In the book Natural Language Processing in Action, section 2.3.2 Naive Bayes, we train a multinomial Naive Bayes classifier on movie reviews using scikit-learn's MultinomialNB.

NOTE: I am using numpy 1.19.1, pandas 1.1.3, and scikit-learn 0.23.2.

However, we get a ValueError when rescaling the predicted probabilities to the [-4, 4] range as done in the book:

    nb = MultinomialNB()
    nb = nb.fit(df_bows, movies.sentiment > 0)
    movies['predicted_sentiment'] = nb.predict_proba(df_bows) * 8 - 4

    ValueError: Wrong number of items passed 2, placement implies 1

The reason is that nb.predict_proba() returns a numpy array with two columns (one probability per class), and we are trying to assign it to a single column of the pandas DataFrame movies (which I believe you could do in previous pandas versions; I am using pandas 1.1.3):

    array([[1.86060657e-01, 8.13939343e-01],
           [1.19745717e-05, 9.99988025e-01],
           [9.569
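
One possible way around the error is to keep only the column that holds the probability of the positive class before rescaling. The sketch below is a minimal illustration under stated assumptions, not necessarily the book's or this post's fix: the small df_bows and movies tables are made-up stand-ins for the real data, and pos_col is a helper name introduced here.

```python
import pandas as pd
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for the book's data (hypothetical values, not the real corpus).
df_bows = pd.DataFrame(
    [[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]],
    columns=["good", "bad", "fun"],
)
movies = pd.DataFrame({"sentiment": [2.5, -3.0, 1.0, -1.5]})

nb = MultinomialNB()
nb.fit(df_bows, movies.sentiment > 0)

# predict_proba returns one column per class, ordered as in nb.classes_.
proba = nb.predict_proba(df_bows)  # shape: (n_samples, 2)

# Find the column for the positive class (True) instead of hard-coding 1.
pos_col = list(nb.classes_).index(True)

# A single 1-D column can be assigned to a single DataFrame column.
movies["predicted_sentiment"] = proba[:, pos_col] * 8 - 4
print(movies)
```

Looking the column up through nb.classes_ avoids relying on the class order, although with boolean labels the positive class is expected to be the second column.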