
Shuffle two lists at once with the same order

Tags: python, list, sorting, shuffle

I'm using the movie_reviews corpus from the nltk library, which contains a large number of documents. My task is to measure predictive performance on these reviews with and without pre-processing of the data. The problem is that the lists documents and documents2 contain the same documents, and I need to shuffle them so that both lists end up in the same order. I cannot shuffle them separately, because every shuffle produces a different order. That is why I need to shuffle both at once: I compare them at the end, and the comparison depends on the order. I'm using Python 2.7.

Example (in reality the strings are tokenized, but that is not relevant here):

    documents = [(['plot : two teen couples go to a church party , '], 'neg'),
                 (['drink and then drive . '], 'pos'),
                 (['they get into an accident . '], 'neg'),
                 (['one of the guys dies'], 'neg')]

    documents2 = [(['plot two teen couples church party'], 'neg'),
                  (['drink then drive . '], 'pos'),
                  (['they get accident . '], 'neg'),
                  (['one guys dies'], 'neg')]

And I need to get this result after shuffling both lists:

    documents = [(['one of the guys dies'], 'neg'),
                 (['they get into an accident . '], 'neg'),
                 (['drink and then drive . '], 'pos'),
                 (['plot : two teen couples go to a church party , '], 'neg')]

    documents2 = [(['one guys dies'], 'neg'),
                  (['they get accident . '], 'neg'),
                  (['drink then drive . '], 'pos'),
                  (['plot two teen couples church party'], 'neg')]

I have this code:

    import random

    import nltk
    from nltk.corpus import movie_reviews, stopwords

    def cleanDoc(doc):
        # Lowercase, drop stopwords and tokens of 2 characters or fewer, then stem.
        stopset = set(stopwords.words('english'))
        stemmer = nltk.PorterStemmer()
        clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
        final = [stemmer.stem(word) for word in clean]
        return final

    # Raw reviews as (token list, category) pairs.
    documents = [(list(movie_reviews.words(fileid)), category)
                 for category in movie_reviews.categories()
                 for fileid in movie_reviews.fileids(category)]

    # The same reviews, pre-processed, built in the same order.
    documents2 = [(list(cleanDoc(movie_reviews.words(fileid))), category)
                  for category in movie_reviews.categories()
                  for fileid in movie_reviews.fileids(category)]

    # Here I need to shuffle documents and documents2 in the same order,
    # with random.shuffle(...) or somehow else.


asked Apr 25 '14 09:04

Jaroslav Klimčík

1 Answer

You can do it as:

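    import random

    a = ['a', 'b', 'c']
    b = [1, 2, 3]

    # Pair up the elements so that each pair moves as one unit.
    c = list(zip(a, b))

    random.shuffle(c)

    # Unzip again; note that a and b are now tuples, not lists.
    a, b = zip(*c)

    print(a)
    print(b)

    [OUTPUT] (one possible run; the order is random)
    ('a', 'c', 'b')
    (1, 3, 2)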

This example uses simpler lists, but the same three steps (zip, shuffle, unzip) carry over unchanged to documents and documents2.
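As a rough sketch of that adaptation, here is the same idea applied to the toy data from the question. It assumes both lists have the same length and are non-empty (zip silently truncates to the shorter list, and unpacking an empty result would fail). The name paired is only illustrative. The single-argument print(...) calls work on both Python 2.7 and Python 3.

```python
import random

documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

# Pair each raw document with its pre-processed counterpart.
paired = list(zip(documents, documents2))

# Shuffle the pairs in place; each pair moves as one unit.
random.shuffle(paired)

# Unzip; zip(*...) yields tuples, so convert them back to lists.
documents, documents2 = (list(t) for t in zip(*paired))

print(documents)
print(documents2)
```

After this, documents[i] and documents2[i] still refer to the same review for every i, whatever order the shuffle produced.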

Hope it helps. Good Luck.
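If you would rather keep plain lists and avoid the zip/unzip round trip, another option (a different technique from the one above, sketched here only as an alternative) is to shuffle one list of index positions and reuse it for every list. The names order, a_shuffled and b_shuffled are just illustrative.

```python
import random

a = ['a', 'b', 'c']
b = [1, 2, 3]

# Build one shuffled list of positions and reuse it for every list.
order = list(range(len(a)))
random.shuffle(order)

# Reorder both lists with the same positions; the originals stay untouched.
a_shuffled = [a[i] for i in order]
b_shuffled = [b[i] for i in order]

print(a_shuffled)
print(b_shuffled)
```

This also extends naturally to three or more lists, since the same order can be applied to each of them.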

answered Sep 19 '22 18:09

sshashank124
