
Add uuid to a new column in a pandas DataFrame

I'm looking to add a uuid for every row in a single new column in a pandas DataFrame. This obviously fills the column with the same uuid:

import uuid
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3), columns=list('abc'),
                  index=['apple', 'banana', 'cherry', 'date'])
df['uuid'] = uuid.uuid4()
print(df)

               a         b         c                                  uuid
apple   0.687601 -1.332904 -0.166018  34115445-c4b8-4e64-bc96-e120abda1653
banana -2.252191 -0.844470  0.384140  34115445-c4b8-4e64-bc96-e120abda1653
cherry -0.470388  0.642342  0.692454  34115445-c4b8-4e64-bc96-e120abda1653
date   -0.943255  1.450051 -0.296499  34115445-c4b8-4e64-bc96-e120abda1653

What I am looking for is a new uuid in each row of the 'uuid' column. I have also tried using .apply() and .map() without success.

TankofVines asked Feb 17 '18 01:02



4 Answers

This is one way:

df['uuid'] = [uuid.uuid4() for _ in range(len(df.index))]
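
A small self-contained sketch of the same idea, in case it helps to verify the result. The comprehension calls uuid.uuid4() once per row, so each row gets its own value. The toy DataFrame and the 'uuid_str' column are assumptions added for illustration; converting to strings is optional and not something the question asks for.

```python
import uuid

import pandas as pd

# Toy frame standing in for the one in the question (values are arbitrary).
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]},
                  index=["apple", "banana", "cherry", "date"])

# uuid.uuid4() is evaluated once per iteration, so every row differs.
df["uuid"] = [uuid.uuid4() for _ in range(len(df.index))]

# Sanity check: no two rows share a uuid.
assert df["uuid"].is_unique

# Optional (illustrative): keep a plain-string copy, e.g. for CSV or JSON export.
df["uuid_str"] = df["uuid"].map(str)
print(df)
```

The column holds uuid.UUID objects by default; the string copy is only useful if downstream code expects text.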
jpp answered Oct 19 '22 09:10


I can't speak to the computational efficiency, but I prefer this syntax because it's consistent with the other apply-lambda modifications I usually use to generate new columns:

df['uuid'] = df.apply(lambda _: uuid.uuid4(), axis=1)

You can also call apply on any single existing column, which removes the axis requirement ('col' below stands for any column name; why axis=0 is the default for DataFrame.apply, I'll never understand):

df['uuid'] = df['col'].apply(lambda _: uuid.uuid4())

The downside to both is that you're technically passing in an argument (_) that you never use. It would be mildly nice to be able to write something like lambda: uuid.uuid4(), but apply doesn't support lambdas with no arguments, which is reasonable given that the use case would be rather limited.
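
On the efficiency question, one way to compare the approaches on your own machine is a rough timeit sketch like the one below. The helper names (with_comprehension, with_apply), the 1000-row frame, and the repeat count are assumptions for illustration; timings vary by machine and pandas version, so no result is claimed here.

```python
import timeit
import uuid

import numpy as np
import pandas as pd

# Hypothetical test frame; the size is arbitrary.
df = pd.DataFrame(np.random.randn(1000, 3), columns=list("abc"))


def with_comprehension(frame):
    # One uuid per row via a plain list comprehension.
    return [uuid.uuid4() for _ in range(len(frame.index))]


def with_apply(frame):
    # One uuid per row via a row-wise apply; the row itself is ignored.
    return frame.apply(lambda _: uuid.uuid4(), axis=1)


# Time a few repetitions of each approach.
t_list = timeit.timeit(lambda: with_comprehension(df), number=5)
t_apply = timeit.timeit(lambda: with_apply(df), number=5)
print(f"list comprehension: {t_list:.4f}s, apply(axis=1): {t_apply:.4f}s")
```

Both helpers return one value per row, so either result can be assigned to df['uuid'] directly.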

Brendan answered Oct 19 '22 08:10


from uuid import uuid4
df['uuid'] = df.index.to_series().map(lambda x: uuid4())
S. A. Calder answered Oct 19 '22 09:10


To create a new column, you must supply enough values to fill it, one per row. Since the number of rows is known (len of the DataFrame), we can build a list of that many values and assign it to the column.

import uuid
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3), columns=list('abc'),
                  index=['apple', 'banana', 'cherry', 'date'])


# build a list with one fresh uuid per row, using a list comprehension
# over the len (number of rows) of the DataFrame
df['uuid'] = [uuid.uuid4() for _ in range(len(df))]
print(df)

               a         b         c                                  uuid
apple  -0.775699 -1.104219  1.144653  f98a9c76-99b7-4ba7-9c0a-9121cdf8ad7f
banana -1.540495 -0.945760  0.649370  179819a0-3d0f-43f8-8645-da9229ef3fc3
cherry -0.340872  2.445467 -1.071793  b48a9830-3a10-4ce0-bca0-0cc136f09732
date   -1.286273  0.244233  0.626831  e7b7c65c-0adc-4ba6-88ab-2160e9858fc4
E. Ducateme answered Oct 19 '22 07:10