Shuffle two lists at once with the same order

I'm using the nltk library's movie_reviews corpus, which contains a large number of documents. My task is to measure predictive performance on these reviews with and without pre-processing of the data. The problem is that the lists documents and documents2 hold the same documents, and I need to shuffle them so that both lists end up in the same order. I cannot shuffle them separately, because each call to shuffle produces a different order. I need to shuffle both at once with the same order because I compare them at the end, and the comparison depends on the order. I'm using Python 2.7.

Example (in reality the strings are tokenized, but that is not relevant here):

    documents = [(['plot : two teen couples go to a church party , '], 'neg'),
                 (['drink and then drive . '], 'pos'),
                 (['they get into an accident . '], 'neg'),
                 (['one of the guys dies'], 'neg')]

    documents2 = [(['plot two teen couples church party'], 'neg'),
                  (['drink then drive . '], 'pos'),
                  (['they get accident . '], 'neg'),
                  (['one guys dies'], 'neg')]

And I need to get a result like this after shuffling both lists:

    documents = [(['one of the guys dies'], 'neg'),
                 (['they get into an accident . '], 'neg'),
                 (['drink and then drive . '], 'pos'),
                 (['plot : two teen couples go to a church party , '], 'neg')]

    documents2 = [(['one guys dies'], 'neg'),
                  (['they get accident . '], 'neg'),
                  (['drink then drive . '], 'pos'),
                  (['plot two teen couples church party'], 'neg')]

I have this code:

    import random

    import nltk
    from nltk.corpus import movie_reviews, stopwords

    def cleanDoc(doc):
        # Drop stopwords and very short tokens, then stem what is left
        stopset = set(stopwords.words('english'))
        stemmer = nltk.PorterStemmer()
        clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
        final = [stemmer.stem(word) for word in clean]
        return final

    # Raw documents: (list of tokens, category)
    documents = [(list(movie_reviews.words(fileid)), category)
                 for category in movie_reviews.categories()
                 for fileid in movie_reviews.fileids(category)]

    # Pre-processed documents, built in the same order as documents
    documents2 = [(list(cleanDoc(movie_reviews.words(fileid))), category)
                  for category in movie_reviews.categories()
                  for fileid in movie_reviews.fileids(category)]

    # TODO: shuffle documents and documents2 here in the same order
    # (with random.shuffle or in some other way)
Jaroslav Klimčík asked Apr 25 '14 09:04
1 Answer

You can zip the two lists into pairs, shuffle the pairs, and then unzip them:

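    import random

    a = ['a', 'b', 'c']
    b = [1, 2, 3]

    # Pair up the elements so that each pair moves as one unit
    c = list(zip(a, b))

    random.shuffle(c)

    # Unzip the shuffled pairs; zip yields tuples, so convert back to lists
    a, b = map(list, zip(*c))

    print(a)
    print(b)

    [OUTPUT] (one possible result; the order changes from run to run)
    ['a', 'c', 'b']
    [1, 3, 2]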

Of course, this example uses simpler lists, but the same approach applies to documents and documents2 in your case.
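As a rough sketch of how that adaptation might look with the sample data from the question: the helper name shuffle_together and its seed parameter are made up for this illustration (they are not part of random or nltk), and the fixed seed is only an assumption to make a run repeatable.

```python
import random


def shuffle_together(first, second, seed=None):
    """Return shuffled copies of two equal-length lists in one shared order.

    shuffle_together and seed are hypothetical names for this sketch.
    """
    if len(first) != len(second):
        raise ValueError("both lists must have the same length")
    rng = random.Random(seed)  # private generator; a fixed seed repeats the order
    paired = list(zip(first, second))  # each pair moves as one unit
    rng.shuffle(paired)
    if not paired:
        return [], []
    shuffled_first, shuffled_second = zip(*paired)
    return list(shuffled_first), list(shuffled_second)


# Sample data copied from the question
documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

shuffled, shuffled2 = shuffle_together(documents, documents2, seed=42)

# The category labels still line up position by position
labels_match = all(d[1] == d2[1] for d, d2 in zip(shuffled, shuffled2))
print(labels_match)
```

The helper returns new lists instead of shuffling in place, so the original documents and documents2 stay untouched. Passing seed=None gives a different order on each run.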

Hope it helps. Good Luck.

sshashank124 answered Sep 19 '22 18:09