I am using Pandas to read a file in this format:
fp = pandas.read_table("Measurements.txt")
fp.head()
"Aaron", 3, 5, 7  
"Aaron", 3, 6, 9  
"Aaron", 3, 6, 10 
"Brave", 4, 6, 0 
"Brave", 3, 6, 1
I want to replace each name with a unique ID so output looks like:
"1", 3, 5, 7 
"1", 3, 6, 9 
"1", 3, 6, 10 
"2", 4, 6, 0 
"2", 3, 6, 1
How can I do that?
Thanks!
You can get unique values in column (multiple columns) from pandas DataFrame using unique() or Series. unique() functions. unique() from Series is used to get unique values from a single column and the other one is used to get from multiple columns.
You can replace a string in the pandas DataFrame column by using replace(), str. replace() with lambda functions.
The only difference with the method you've highlighted is that df. replace({'\n': '<br>'}, regex=True) returns a new DataFrame object instead of updating the columns on the original DataFrame. So you'll need to reassign the output, e.g. df = df. replace({'\n': '<br>'}, regex=True) .
I would make use of categorical dtype:
In [97]: x['ID'] = x.name.astype('category').cat.rename_categories(range(1, x.name.nunique()+1))
In [98]: x
Out[98]:
    name  v1  v2  v3 ID
0  Aaron   3   5   7  1
1  Aaron   3   6   9  1
2  Aaron   3   6  10  1
3  Brave   4   6   0  2
4  Brave   3   6   1  2
if you need string IDs instead of numerical ones, you can use:
x.name.astype('category').cat.rename_categories([str(x) for x in range(1,x.name.nunique()+1)])
or, as @MedAli has mentioned in his answer, using factorize() method - demo:
In [141]: x['cat'] = pd.Categorical((pd.factorize(x.name)[0] + 1).astype(str))
In [142]: x
Out[142]:
    name  v1  v2  v3 ID cat
0  Aaron   3   5   7  1   1
1  Aaron   3   6   9  1   1
2  Aaron   3   6  10  1   1
3  Brave   4   6   0  2   2
4  Brave   3   6   1  2   2
In [143]: x.dtypes
Out[143]:
name      object
v1         int64
v2         int64
v3         int64
ID      category
cat     category
dtype: object
In [144]: x['cat'].cat.categories
Out[144]: Index(['1', '2'], dtype='object')
or having categories as integer numbers:
In [154]: x['cat'] = pd.Categorical((pd.factorize(x.name)[0] + 1))
In [155]: x
Out[155]:
    name  v1  v2  v3 ID cat
0  Aaron   3   5   7  1   1
1  Aaron   3   6   9  1   1
2  Aaron   3   6  10  1   1
3  Brave   4   6   0  2   2
4  Brave   3   6   1  2   2
In [156]: x['cat'].cat.categories
Out[156]: Int64Index([1, 2], dtype='int64')
explanation:
In [99]: x.name.astype('category')
Out[99]:
0    Aaron
1    Aaron
2    Aaron
3    Brave
4    Brave
Name: name, dtype: category
Categories (2, object): [Aaron, Brave]
In [100]: x.name.astype('category').cat.categories
Out[100]: Index(['Aaron', 'Brave'], dtype='object')
In [101]: x.name.astype('category').cat.rename_categories([1,2])
Out[101]:
0    1
1    1
2    1
3    2
4    2
dtype: category
Categories (2, int64): [1, 2]
explanation for the factorize() method:
In [157]: (pd.factorize(x.name)[0] + 1)
Out[157]: array([1, 1, 1, 2, 2])
In [158]: pd.Categorical((pd.factorize(x.name)[0] + 1))
Out[158]:
[1, 1, 1, 2, 2]
Categories (2, int64): [1, 2]
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With