I want to create a vector of dummy variables (each can take only 0 or 1). I am doing the following:
data = ['one','two','three','four','six']
variables = ['two','five','ten']
I came up with the following two ways:
dummy = []
for variable in variables:
    if variable in data:
        dummy.append(1)
    else:
        dummy.append(0)
or with list comprehension:
dummy = [1 if variable in data else 0 for variable in variables]
Results are ok:
>>> [1,0,0]
Is there a built-in function that does this task quicker? It's kind of slow when there are thousands of variables.
Edit: Results using time.time():
I am using the following data:
data = ['one','two','three','four','six']*100
variables = ['two','five','ten']*100000
If you convert data to a set, the lookup will be quicker.
You can also convert the boolean to an integer to get 1 or 0 for True or False.
>>> int(True)
1
You can call __contains__ on the set of data for each variable, which saves creating the set each time through the loop.
You can map all these together:
dummy = list(map(int, map(set(data).__contains__, variables)))
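To make it easier to see what each stage of that one-liner does, here is a small sketch that unpacks it using the sample data from the question. The intermediate names (lookup, membership, flags) are made up for illustration only.

```python
data = ['one', 'two', 'three', 'four', 'six']
variables = ['two', 'five', 'ten']

# Build the set once; the bound method tests membership against that set.
lookup = set(data).__contains__

# Stage 1: map each variable to True/False.
membership = list(map(lookup, variables))

# Stage 2: convert each boolean to 1/0.
flags = list(map(int, membership))

# The one-liner chains both stages without the intermediate list.
dummy = list(map(int, map(set(data).__contains__, variables)))

print(membership)
print(flags)
print(dummy == flags)
```

Because set(data) is evaluated once as an argument to map, the set is not rebuilt for every variable.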
edit:
Much as I like one-liners, I think it's more readable to use a list comprehension.
If you create the set inside the list comprehension, it will be recreated for each variable. So we need two lines:
search = set(data)
dummy = [int(variable in search) for variable in variables]
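If you want to check the difference on your own machine, a rough timing sketch along these lines may help. It assumes the standard timeit module instead of time.time(), uses a smaller variables list than the one in the question so it finishes quickly, and the function names with_list and with_set are hypothetical.

```python
import timeit

data = ['one', 'two', 'three', 'four', 'six'] * 100
# Assumption: 1000 repetitions instead of the question's 100000, to keep the run short.
variables = ['two', 'five', 'ten'] * 1000

def with_list():
    # Each "in" test scans the list, so cost grows with len(data).
    return [1 if variable in data else 0 for variable in variables]

def with_set():
    # Build the set once, then each "in" test is a hash lookup.
    search = set(data)
    return [int(variable in search) for variable in variables]

# Both approaches should produce the same dummy vector.
assert with_list() == with_set()

list_time = timeit.timeit(with_list, number=5)
set_time = timeit.timeit(with_set, number=5)
print(f"list lookup: {list_time:.4f} s")
print(f"set lookup:  {set_time:.4f} s")
```

The exact numbers depend on the machine and on the data, so treat them as a relative comparison only.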
Use a set: item in set takes O(1) on average, whereas item in list takes O(n).
>>> data = ['one','two','three','four','six']
>>> variables = ['two','five','ten']
>>> xs = set(data)
>>> [int(x in xs) for x in variables]
[1, 0, 0]