
Efficient way of Creating dummy variables in python

Tags: python

I want to create a vector of dummy variables (each can take only the value 0 or 1). I am starting from the following:

data = ['one','two','three','four','six']
variables = ['two','five','ten']

I have come up with the following two ways:

dummy=[]
for variable in variables:
    if variable in data:
        dummy.append(1)
    else:
        dummy.append(0)

or with list comprehension:

dummy = [1 if variable in data else 0 for variable in variables]

Results are ok:

>>> dummy
[1, 0, 0]

Is there a built-in function that does this task more quickly? It's rather slow when there are thousands of variables.

Edit: Results measured with time.time(), using the following data:

data = ['one','two','three','four','six']*100
variables = ['two','five','ten']*100000

  • Loop (from my example): 2.11 sec
  • List comprehension: 1.55 sec
  • List comprehension (variables converted to a set): 0.0004992 sec
  • Example from Peter: 0.0004999 sec
  • Example from falsetru: 0.000502 sec
Mpizos Dimitris asked Jan 06 '23 10:01



2 Answers

If you convert data to a set the lookup will be quicker.

You can also convert the boolean to an integer to get 1 or 0 for True or False.

>>> int(True)
1

You can build the set of data once and pass its __contains__ method as the lookup function for each variable, which avoids creating the set each time through the loop.

You can map all these together:

dummy = list(map(int, map(set(data).__contains__, variables)))
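To see what the one-liner is doing, it can be unpacked into separate steps. This is only an illustrative sketch: the names lookup, flags and dummy_steps are made up here and do not appear in the original answer.

```python
data = ['one', 'two', 'three', 'four', 'six']
variables = ['two', 'five', 'ten']

# Build the set once. Its bound __contains__ method acts like a
# function f(x) that returns (x in that set).
lookup = set(data).__contains__

# First map: one boolean per variable.
flags = list(map(lookup, variables))

# Second map: convert each boolean to 1 or 0.
dummy_steps = list(map(int, flags))

# The nested one-liner produces the same list.
dummy = list(map(int, map(set(data).__contains__, variables)))

print(flags)        # [True, False, False]
print(dummy_steps)  # [1, 0, 0]
```

Because set(data) is evaluated once, before map starts iterating, the set is not rebuilt for each variable.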

edit:

Much as I like one-liners, I think it's more readable to use a list comprehension.

If you create the set inside the list comprehension, it will be recreated for each variable, so we need two lines:

search = set(data)
dummy = [int(variable in search) for variable in variables]
Peter Wood answered Jan 08 '23 22:01



  • Use a set: item in set takes O(1) on average, while item in list takes O(n).
  • You can use int(bool) to get 1 or 0 (instead of a conditional expression).

>>> data = ['one','two','three','four','six']
>>> variables = ['two','five','ten']
>>> xs = set(data)
>>> [int(x in xs) for x in variables]
[1, 0, 0]
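The difference between the two lookups can be checked with a rough timing sketch along these lines. The function names with_list and with_set are hypothetical, the data sizes are smaller than those in the question to keep the run short, and timeit is used here instead of time.time(); actual numbers will vary by machine.

```python
import timeit

data = ['one', 'two', 'three', 'four', 'six'] * 100
variables = ['two', 'five', 'ten'] * 1000  # assumed size, chosen for a short run

def with_list():
    # Each membership test scans the list: O(n) per lookup.
    return [int(x in data) for x in variables]

def with_set():
    # Build the set once, then each lookup is O(1) on average.
    xs = set(data)
    return [int(x in xs) for x in variables]

# Both approaches must produce the same dummy vector.
assert with_list() == with_set()

list_time = timeit.timeit(with_list, number=5)
set_time = timeit.timeit(with_set, number=5)
print(f"list lookup: {list_time:.4f} s")
print(f"set lookup:  {set_time:.4f} s")
```

Note that with_set includes the cost of building the set, so the comparison covers the whole task rather than the lookups alone.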
falsetru answered Jan 09 '23 00:01
