How to create a column with repeating values in pandas (mismatching indexes)

Tags:

python

pandas

I am trying to add a new column to my dataframe using pandas and have it repeat the same few values until it reaches the end of the index.

I have tried:

df['Fruit Type']=['Bananas','Oranges','Strawberries']

it says:

ValueError: length of values does not match length of index

My index is about 8000 rows long, so the number of new column values does not match the length of the index.

I want the column to look like:

Fruit Type: Bananas Oranges Strawberries Bananas Oranges Strawberries Bananas Oranges Strawberries

I found a solution after a while:

df.insert(0, 'Fruit Type', ['Bananas', 'Oranges','Strawberries']*int(((len(df))/3)))

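The 0 is the position at which the column is inserted, followed by the column name, then the column values. The *int(...) part divides the number of rows by 3 and repeats the list that many times, so it fills the column exactly only when the row count is a multiple of 3. Thanks to @acai for the multiplier at the end.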
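
For a row count that is not a multiple of 3, one option is to repeat the list slightly more often than needed and trim the result. This is a minimal sketch, assuming a made-up 10-row frame; the `col` column and the `fruits` and `repeats` names are illustrative and not part of the original data:

```python
import math

import pandas as pd

df = pd.DataFrame({'col': range(10)})  # hypothetical 10-row frame
fruits = ['Bananas', 'Oranges', 'Strawberries']

# Repeat enough whole copies to cover every row, then trim the excess.
repeats = math.ceil(len(df) / len(fruits))
values = (fruits * repeats)[:len(df)]

# Insert as the first column, as in the call above.
df.insert(0, 'Fruit Type', values)
```

Because the list is sliced to len(df), the assignment should not raise the length-mismatch ValueError regardless of how many rows the frame has.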

asked Jun 11 '18 at 19:06 by DasVisual


2 Answers

Method 1:

Let's say your dataframe is 10 rows long (and you want to repeat your list of 3 fruits).

>>> df
  column_a
0        a
1        b
2        c
3        d
4        f
5        e
6        x
7        s
8        n
9        i

Using itertools.cycle, you can turn your list into an iterator and cycle through it until the end of the dataframe:

from itertools import cycle

# cycle() yields the list's items endlessly; draw exactly len(df) of them
fruits = cycle(['Bananas','Oranges','Strawberries'])
df['Fruit_Type'] = [next(fruits) for _ in range(len(df))]

>>> df
  column_a    Fruit_Type
0        a       Bananas
1        b       Oranges
2        c  Strawberries
3        d       Bananas
4        f       Oranges
5        e  Strawberries
6        x       Bananas
7        s       Oranges
8        n  Strawberries
9        i       Bananas
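
The same idea can be written without the explicit next() loop by letting itertools.islice take the first len(df) items from the cycle. This is a minimal sketch; the frame is a stand-in rebuilt to match the 10-row example above:

```python
from itertools import cycle, islice

import pandas as pd

# Stand-in for the 10-row frame shown above.
df = pd.DataFrame({'column_a': list('abcdfexsni')})

fruits = ['Bananas', 'Oranges', 'Strawberries']

# islice stops the otherwise endless cycle after len(df) items.
df['Fruit_Type'] = list(islice(cycle(fruits), len(df)))
```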

Method 2

Here is an ugly hack that you can use as an alternative:

You can use numpy.tile to repeat your list as many whole times as fit in the dataframe (computed with the // operator), and then append the first len(df) % len(fruits) items of the list to fill the remaining rows. Older code reaches this function as pd.np.tile, but the pandas.np alias was deprecated in pandas 1.0 and removed in 2.0, so import numpy directly:

import numpy as np

fruits = ['Bananas','Oranges','Strawberries']

df['Fruit Type'] = np.tile(fruits, len(df) // len(fruits)).tolist() + fruits[:len(df) % len(fruits)]

>>> df
  column_a    Fruit Type
0        a       Bananas
1        b       Oranges
2        c  Strawberries
3        d       Bananas
4        f       Oranges
5        e  Strawberries
6        x       Bananas
7        s       Oranges
8        n  Strawberries
9        i       Bananas
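
If the tile-plus-remainder expression feels awkward, numpy.resize may be a tidier fit: when the requested size is larger than the input, it fills the result with repeated copies of the input. This is a minimal sketch, assuming the same 10-row frame as above (rebuilt here so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

# Stand-in for the 10-row frame shown above.
df = pd.DataFrame({'column_a': list('abcdfexsni')})
fruits = ['Bananas', 'Oranges', 'Strawberries']

# np.resize repeats the input as often as needed to reach the requested length.
df['Fruit Type'] = np.resize(fruits, len(df))
```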
answered Nov 03 '22 at 09:11 by sacuL


Repeat the list as many whole times as fit in the dataframe, that is, the integer part of len(df) / len(list_). The difference between the dataframe's length and the length of that repeated list is the number of elements still missing; take them from the start of the list you are repeating.

Consider the example below, where the df has 10 data points.

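import pandas as pd

df = pd.DataFrame({
    'col': range(0, 10)
})
list_ = ['Bananas','Oranges','Strawberries']
# whole repetitions of the list that fit in the dataframe
ser = list_ * (len(df) // len(list_))
# pad with the first few items to cover the remaining rows
df['Fruit Type'] = ser + list_[:len(df) - len(ser)]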

Output:

   col    Fruit Type
0    0       Bananas
1    1       Oranges
2    2  Strawberries
3    3       Bananas
4    4       Oranges
5    5  Strawberries
6    6       Bananas
7    7       Oranges
8    8  Strawberries
9    9       Bananas
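
The repeat-then-pad arithmetic can also be expressed by indexing the list with the row position modulo the list length. This is a minimal sketch; cycle_to_length is a hypothetical helper name introduced for illustration, not a pandas function:

```python
import pandas as pd


def cycle_to_length(values, n):
    """Return a list of length n that cycles through values (hypothetical helper)."""
    # i % len(values) wraps the position back to 0 after the last item.
    return [values[i % len(values)] for i in range(n)]


df = pd.DataFrame({'col': range(0, 10)})
list_ = ['Bananas', 'Oranges', 'Strawberries']
df['Fruit Type'] = cycle_to_length(list_, len(df))
```

Wrapping the logic in a function keeps the column assignment to one readable line and works for any row count, including counts smaller than the list.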
answered Nov 03 '22 at 07:11 by harvpan