How to create a column with repeating values in pandas (values shorter than the index)

Tags:

python

pandas

I am trying to add a new column with some values in my dataframe using pandas and have it repeat the same values until it reaches the end of the index:

I have tried:

df['Fruit Type']=['Bananas','Oranges','Strawberries']

It raises:

ValueError: length of values does not match length of index

My index is about 8000 rows long, so the number of new column values does not match the length of the index.

I want the column to look like:

Fruit Type: Bananas Oranges Strawberries Bananas Oranges Strawberries Bananas Oranges Strawberries

I found a solution after a while:

df.insert(0, 'Fruit Type', ['Bananas', 'Oranges','Strawberries']*int(((len(df))/3)))

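The 0 is the position at which the column is inserted, followed by the column name and then the column values. The * int(len(df) / 3) multiplier repeats the list once for every three rows, so it fills the column exactly only when the number of rows is a multiple of 3. Thanks to @acai for the multiplier at the end.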
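
To check how the multiplier behaves on a row count that 3 does not divide, here is a minimal sketch. It assumes a hypothetical 8000-row dataframe with a placeholder column named col; neither comes from the real data.

```python
import pandas as pd

# Hypothetical 8000-row frame; the column name 'col' is a placeholder.
df = pd.DataFrame({'col': range(8000)})
fruits = ['Bananas', 'Oranges', 'Strawberries']

# int(8000 / 3) is 2666, so the repeated list holds 7998 values.
values = fruits * int(len(df) / 3)
shortfall = len(df) - len(values)

try:
    df.insert(0, 'Fruit Type', values)
    insert_failed = False
except ValueError:
    # pandas rejects a list whose length differs from the index length
    insert_failed = True
```

In this sketch the list ends up two values short, so the insert fails with the same ValueError as the original assignment. The answers below deal with the leftover rows.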

asked Jun 11 '18 19:06

DasVisual

2 Answers

Method 1:

Let's say your dataframe is 10 rows long (and you want to repeat your list of 3 fruits).

Using itertools.cycle, you can turn your list into an iterator and cycle through it until the end of the dataframe:

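from itertools import cycle

# cycle() yields the list items over and over without end
fruits = cycle(['Bananas','Oranges','Strawberries'])
# draw exactly one item per row of the dataframe
df['Fruit_Type'] = [next(fruits) for _ in range(len(df))]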

>>> df
  column_a    Fruit_Type
0        a       Bananas
1        b       Oranges
2        c  Strawberries
3        d       Bananas
4        f       Oranges
5        e  Strawberries
6        x       Bananas
7        s       Oranges
8        n  Strawberries
9        i       Bananas
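
The same idea can be written without the explicit next calls. This is only a sketch: it rebuilds a 10-row dataframe from the column_a values shown in the output above, and uses itertools.islice to take the first len(df) items from the endless cycle.

```python
from itertools import cycle, islice

import pandas as pd

# 10-row example frame; values copied from the output shown above.
df = pd.DataFrame({'column_a': list('abcdfexsni')})
fruits = ['Bananas', 'Oranges', 'Strawberries']

# cycle() never ends on its own, so islice() cuts it off after len(df) items.
df['Fruit_Type'] = list(islice(cycle(fruits), len(df)))
```

Because islice stops at len(df), this works for any number of rows, whether or not it is a multiple of the list length.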

Method 2

Here is an ugly hack that you can use as an alternative:

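You can use numpy.tile to repeat your list as many whole times as fit in the dataframe (computed with the // operator), and then append the first few elements of the list needed to fill the remaining rows. Older pandas versions exposed NumPy as pandas.np, but that alias was deprecated in pandas 1.0 and has since been removed, so import numpy directly:

import numpy as np

fruits = ['Bananas','Oranges','Strawberries']
# whole repeats of the list, plus the leftover items for the remaining rows
df['Fruit Type'] = np.tile(fruits, len(df) // len(fruits)).tolist() + fruits[:len(df) % len(fruits)]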

>>> df
  column_a    Fruit Type
0        a       Bananas
1        b       Oranges
2        c  Strawberries
3        d       Bananas
4        f       Oranges
5        e  Strawberries
6        x       Bananas
7        s       Oranges
8        n  Strawberries
9        i       Bananas
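
If NumPy is already in use, numpy.resize may be a shorter route, since it fills a larger output with repeated copies of its input. A sketch, assuming a 10-row dataframe rebuilt from the column_a values shown in the output above:

```python
import numpy as np
import pandas as pd

# 10-row example frame; values copied from the output shown above.
df = pd.DataFrame({'column_a': list('abcdfexsni')})
fruits = ['Bananas', 'Oranges', 'Strawberries']

# np.resize repeats the input as often as needed and stops at len(df) items.
df['Fruit Type'] = np.resize(fruits, len(df))
```

This avoids the separate remainder slice, at the cost of going through a NumPy string array first.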

answered Nov 03 '22 09:11

sacuL

Repeat the list as many whole times as fit into the dataframe, which is the integer quotient of the dataframe's length and the list's length. The difference between the dataframe's length and the length of that repeated list is the number of elements you still need to append, taken from the start of the list.

Consider the example below, where the dataframe has 10 rows.

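import pandas as pd

df = pd.DataFrame({
    'col':range(0,10)
})
list_ = ['Bananas','Oranges','Strawberries']
# whole repeats of the list that fit into the dataframe
ser = list_ * (len(df) // len(list_))
# top up with the first few items so the length matches len(df)
df['Fruit Type'] = ser + list_[:len(df)-len(ser)]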

Output:

   col    Fruit Type
0    0       Bananas
1    1       Oranges
2    2  Strawberries
3    3       Bananas
4    4       Oranges
5    5  Strawberries
6    6       Bananas
7    7       Oranges
8    8  Strawberries
9    9       Bananas
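
The whole-repeats-plus-remainder arithmetic can also be wrapped in a small helper. This is a sketch only: repeat_to_length is a hypothetical name, not a pandas function, and the 8000-row frame with its col column is a stand-in for the dataframe in the question.

```python
import pandas as pd


def repeat_to_length(values, n):
    """Repeat `values` cyclically until the result has exactly n items."""
    # divmod gives the number of whole repeats and the leftover count at once
    whole, remainder = divmod(n, len(values))
    return values * whole + values[:remainder]


# Stand-in for the 8000-row dataframe mentioned in the question.
df = pd.DataFrame({'col': range(8000)})
df['Fruit Type'] = repeat_to_length(['Bananas', 'Oranges', 'Strawberries'], len(df))
```

For 8000 rows, divmod(8000, 3) is (2666, 2), so the column holds 2666 full copies of the list followed by Bananas and Oranges.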

answered Nov 03 '22 07:11

harvpan
