
Pandas: create new column in df with random integers from range

I have a pandas data frame with 50k rows. I'm trying to add a new column that is a randomly generated integer from 1 to 5.

If I want 50k random numbers I'd use:

df1['randNumCol'] = random.sample(xrange(50000), len(df1)) 

but for this I'm not sure how to do it.

Side note: in R, I'd do:

sample(1:5, 50000, replace = TRUE) 

Any suggestions?

screechOwl asked May 19 '15 13:05





2 Answers

One solution is to use numpy.random.randint:

import numpy as np
df1['randNumCol'] = np.random.randint(1, 6, df1.shape[0])

Or, if the values to draw from are not consecutive, you can use numpy.random.choice (although it is slower):

df1['randNumCol'] = np.random.choice([1, 9, 20], df1.shape[0]) 

To make the results reproducible, set the seed first with numpy.random.seed (e.g. np.random.seed(42)).
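Putting the pieces of this answer together, here is a minimal sketch. The 50,000-row frame and its x column are hypothetical stand-ins for the asker's data, and randChoiceCol is an illustrative column name, not one from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's 50k-row data frame.
df1 = pd.DataFrame({"x": range(50_000)})

# Seed the global NumPy generator so reruns produce the same columns.
np.random.seed(42)

# The upper bound is exclusive: randint(1, 6) draws from 1, 2, 3, 4, 5.
df1["randNumCol"] = np.random.randint(1, 6, size=len(df1))

# Draw (with replacement) from an arbitrary, non-consecutive set of values.
df1["randChoiceCol"] = np.random.choice([1, 9, 20], size=len(df1))
```

Both calls sample with replacement, which matches R's sample(1:5, 50000, replace = TRUE).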

Matt answered Sep 20 '22 04:09



To add a column of random integers, use randint(low, high, size). Note that high is exclusive, so integers from 1 to 5 need low=1 and high=6. There's no need to waste memory allocating range(low, high); that could be a lot of memory if high is large.

df1['randNumCol'] = np.random.randint(1, 6, size=len(df1))

Notes:

  • when we're just adding a single column, size is just an integer. In general, if we want to generate an array/dataframe of randint()s, size can be a tuple (as in "Pandas: How to create a data frame of random integers?")
  • in Python 3.x, range(low, high) no longer allocates a list (potentially using lots of memory); it produces a lazy range() object
  • use np.random.seed(...) for determinism and reproducibility (Python's random.seed(...) does not affect NumPy's generator)
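The notes above on size and seeding can be sketched as follows. This example swaps in NumPy's newer Generator API (np.random.default_rng, available since NumPy 1.17) instead of the legacy np.random.randint used in the answer; that substitution, the 4x3 shape, and the column names a, b, c are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Seeded Generator; 42 is an arbitrary seed chosen for the example.
rng = np.random.default_rng(42)

# size as a tuple: 4 rows x 3 columns of integers in 1..5 (high is exclusive).
df = pd.DataFrame(rng.integers(1, 6, size=(4, 3)), columns=["a", "b", "c"])

# size as an integer: a single new column with one value per row.
df["randNumCol"] = rng.integers(1, 6, size=len(df))
```

A local Generator keeps the seed scoped to this code rather than changing global state, which can matter when other code also draws random numbers.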
smci answered Sep 22 '22 04:09
