
behavior of len() with arange()

Tags: python, pandas

With this dataframe, dff:

   A  B
0  0  a
1  1  a
2  2  b
3  3  b
4  4  b
5  5  b
6  6  c
7  7  c

I understand why len(dff) == 8.

However, I don't understand the result of:

dff['counts'] = np.arange(len(dff)) 

which is

     A  B  counts  
  0  0  a       0  
  1  1  a       1  
  2  2  b       2  
  3  3  b       3  
  4  4  b       4  
  5  5  b       5  
  6  6  c       6  
  7  7  c       7  

Shouldn't dff['counts'] be 8 for every row? What is going on under the hood?

Derek Krantz asked Jan 22 '26 08:01



1 Answer

You seem to misunderstand what np.arange does:

In [32]:

np.arange(8) 
Out[32]:
array([0, 1, 2, 3, 4, 5, 6, 7])

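In your code, len(dff) is 8 and is passed as the stop parameter, so np.arange(len(dff)) returns the eight integers 0 through 7, and the column assignment places one of them in each row.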
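As a rough check, the sketch below rebuilds a frame shaped like dff (the construction is an assumption, since the question only shows the printed output) and confirms that np.arange(len(dff)) yields one value per row rather than a single number.

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the question (assumed construction).
dff = pd.DataFrame({"A": range(8), "B": list("aabbbbcc")})

n = len(dff)            # number of rows: 8
values = np.arange(n)   # same as np.arange(8): integers 0 through 7

# For integer arguments, arange produces the same values as the built-in range.
same_as_range = values.tolist() == list(range(n))

# Assigning an array with one element per row fills the column row by row.
dff["counts"] = values
print(dff)
```

So the 8 only decides how many values are generated; it never appears in the column itself.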

From the docs:

numpy.arange([start, ]stop, [step, ]dtype=None)
Return evenly spaced values within a given interval.

Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the function is equivalent to the Python built-in range function, but returns an ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases.

Parameters: 
start : number, optional
Start of interval. The interval includes this value. The default start value is 0.
stop : number
End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.
step : number, optional
Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified, start must also be given.
dtype : dtype
The type of the output array. If dtype is not given, infer the data type from the other input arguments.
Returns:    
arange : ndarray
Array of evenly spaced values.
For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point overflow, this rule may result in the last element of out being greater than stop.
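The caveat about non-integer steps can be seen in a small sketch. The numbers used here (1.0, 1.3, 0.1) are illustrative choices, not taken from the docs excerpt, and the exact arange result depends on floating point rounding, so treat the printed values as something to inspect rather than rely on.

```python
import math
import numpy as np

start, stop, step = 1.0, 1.3, 0.1

# With a float step, the length follows ceil((stop - start) / step), which
# is sensitive to round-off, so the last value may reach or pass `stop`.
ar = np.arange(start, stop, step)
expected_len = math.ceil((stop - start) / step)
print(len(ar), expected_len, ar)

# linspace takes the number of points directly, so the length is explicit.
lin = np.linspace(start, stop, num=3, endpoint=False)
print(len(lin), lin)
```

None of this affects the question's case, where the argument is the integer len(dff).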

If you want to set every row to the same value, you can assign the scalar directly:

In [34]:

dff['counts'] = len(dff)
dff
Out[34]:
   A  B  counts
0  0  a       8
1  1  a       8
2  2  b       8
3  3  b       8
4  4  b       8
5  5  b       8
6  6  c       8
7  7  c       8
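The two kinds of assignment can be sketched side by side: a scalar is repeated in every row, while an array is matched to the rows one element at a time and, as far as I know, has to be the same length as the frame. The frame construction and the column names "same" and "bad" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frame from the question.
dff = pd.DataFrame({"A": range(8), "B": list("aabbbbcc")})

# Scalar: the single value is broadcast to every row.
dff["same"] = len(dff)

# Array with one element per row: assigned position by position.
dff["counts"] = np.arange(len(dff))

# Array of the wrong length: pandas raises instead of guessing how to fill.
try:
    dff["bad"] = np.arange(5)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(dff)
print(mismatch_error)
```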
EdChum answered Jan 23 '26 23:01
