Removing leading zeros from pandas.core.series.Series

0    [00115840, 00110005, 001000033, 00116000...
1    [00267285, 00263627, 00267010, 0026513...
2                             [00335595, 00350750]

I want to remove the leading zeros from the values in the series. I tried

x.astype('int64')

but got this error message:

ValueError: setting an array element with a sequence.

Can you suggest how to do this in Python 3.x?
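A minimal sketch of why the cast fails, assuming data shaped like the question's (each cell holds a Python list of strings, so the Series has object dtype). astype has to fit a whole list into a single integer slot, which NumPy rejects. The exact exception text can differ between NumPy/pandas versions, so the sketch catches both ValueError and TypeError.

```python
import pandas as pd

# Each element of the Series is a *list* of strings, not a single string,
# so the Series has dtype object.
s = pd.Series([['00115840', '00110005'], ['00335595', '00350750']])
print(s.dtype)     # object
print(type(s[0]))  # <class 'list'>

try:
    s.astype('int64')  # tries to turn every list into one integer
except (ValueError, TypeError) as exc:
    print('cast failed:', exc)

# Converting element by element works: int() ignores leading zeros.
print(int('00115840'))  # 115840
```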


asked Jan 07 '18 16:01

Elina

2 Answers

Data input:

s = pd.Series([['001','002'],['003','004']])

s = pd.Series(s.apply(pd.Series).astype(int).values.tolist())
s
Out[282]: 
0    [1, 2]
1    [3, 4]
dtype: object

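Update: thanks to jezrael and cᴏʟᴅsᴘᴇᴇᴅ for pointing it out. When the lists have different lengths, apply(pd.Series) pads the shorter rows with NaN and the cast to int fails, so stack the frame first, which drops the NaN padding: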

pd.Series(s.apply(pd.Series).stack().astype(int).groupby(level=0).apply(list))
Out[317]: 
0    [115840, 110005, 1000033, 116000]
1      [267285, 263627, 267010, 26513]
2                     [335595, 350750]
dtype: object
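A sketch of the same idea using Series.explode instead of apply(pd.Series) plus stack, assuming pandas 0.25 or later (where explode was added). explode yields one row per list element and keeps the original row label as the index, so shorter lists never produce NaN padding; it also sidesteps stack, whose NaN handling recent pandas releases have started to change. The variable names are illustrative.

```python
import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]
s = pd.Series(a)

# One row per list element; the index repeats the original row label.
flat = s.explode()

# Cast the strings (int conversion drops the leading zeros), then regroup
# by the original row label to rebuild one list per row.
# Note: an empty list would explode to NaN and make astype(int) fail.
result = flat.astype(int).groupby(level=0).agg(list)
print(result)
```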


answered Oct 18 '22 07:10

BENY

If you want to convert each list of strings to a list of integers, use a list comprehension:

s = pd.Series([[int(y) for y in x] for x in s], index=s.index)

or apply it row by row:

s = s.apply(lambda x: [int(y) for y in x])

Sample:

import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]

s = pd.Series(a)
print (s)
0    [00115840, 00110005, 001000033, 00116000]
1      [00267285, 00263627, 00267010, 0026513]
2                         [00335595, 00350750]
dtype: object

s = s.apply(lambda x: [int(y) for y in x])
print (s)
0    [115840, 110005, 1000033, 116000]
1      [267285, 263627, 267010, 26513]
2                     [335595, 350750]
dtype: object
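If the values should stay strings (for example, IDs whose text form matters), a sketch using str.lstrip on each element is an alternative to int(); this is a swapped-in technique, not part of the answer above. Unlike int(), it does not check that the strings are numeric, and the `or '0'` guard (an illustrative choice) keeps an all-zero string such as '000' from becoming empty.

```python
import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]
s = pd.Series(a)

# Strip leading '0' characters but keep each value as a string.
# The `or '0'` guard stops an all-zero string such as '000' turning into ''.
stripped = s.apply(lambda lst: [x.lstrip('0') or '0' for x in lst])
print(stripped)
```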

EDIT:

If you only want the integers, you can flatten the values and cast them to int:

s = pd.Series([item for sublist in s for item in sublist]).astype(int)

Alternative solution:

import itertools
s = pd.Series(list(itertools.chain(*s))).astype(int)

print (s)
0     115840
1     110005
2    1000033
3     116000
4     267285
5     263627
6     267010
7      26513
8     335595
9     350750
dtype: int32
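The int32 dtype above comes from astype(int) mapping to the platform's default integer, which is 32-bit on Windows with NumPy 1.x. A sketch that pins the width explicitly, assuming 64-bit integers are wanted; it also uses itertools.chain.from_iterable, which flattens one level without unpacking the Series into call arguments the way chain(*s) does.

```python
import itertools

import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]
s = pd.Series(a)

# Flatten the nested lists into one Series of strings.
flat = pd.Series(list(itertools.chain.from_iterable(s)))

# An explicit 'int64' gives the same dtype on every platform, unlike int.
result = flat.astype('int64')
print(result.dtype)  # int64
```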

Timings:

import itertools
import numpy as np
import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]

s = pd.Series(a)
s = pd.concat([s]*1000).reset_index(drop=True)

In [203]: %timeit pd.Series([[int(y) for y in x] for x in s], index=s.index)
100 loops, best of 3: 4.66 ms per loop

In [204]: %timeit s.apply(lambda x: [int(y) for y in x])
100 loops, best of 3: 5.13 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ sol
In [205]: %%timeit
     ...: v = pd.Series(np.concatenate(s.values.tolist()))
     ...: v.astype(int).groupby(s.index.repeat(s.str.len())).agg(pd.Series.tolist)
     ...: 
1 loop, best of 3: 226 ms per loop

#Wen solution
In [211]: %timeit pd.Series(s.apply(pd.Series).stack().astype(int).groupby(level=0).apply(list))
1 loop, best of 3: 1.12 s per loop

Solutions with flattening (idea of @cᴏʟᴅsᴘᴇᴇᴅ):

In [208]: %timeit pd.Series([item for sublist in s for item in sublist]).astype(int)
100 loops, best of 3: 2.55 ms per loop

In [209]: %timeit pd.Series(list(itertools.chain(*s))).astype(int)
100 loops, best of 3: 2.2 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ sol
In [210]: %timeit pd.Series(np.concatenate(s.values.tolist()))
100 loops, best of 3: 7.71 ms per loop
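The %timeit and %%timeit lines above are IPython magics. A sketch for re-running part of the comparison as a plain script with the standard timeit module; absolute numbers depend on the machine and on the pandas/NumPy versions, so no figures are shown. Only the list-comprehension and itertools variants are included, and the function names and NUMBER setting are illustrative.

```python
import itertools
import timeit

import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]
s = pd.concat([pd.Series(a)] * 1000).reset_index(drop=True)


def nested_listcomp(s):
    # keeps one list of ints per row
    return pd.Series([[int(y) for y in x] for x in s], index=s.index)


def nested_apply(s):
    return s.apply(lambda x: [int(y) for y in x])


def flat_listcomp(s):
    # flattens to one int per row
    return pd.Series([item for sublist in s for item in sublist]).astype(int)


def flat_chain(s):
    return pd.Series(list(itertools.chain(*s))).astype(int)


NUMBER = 20  # calls per timing run
for func in (nested_listcomp, nested_apply, flat_listcomp, flat_chain):
    # best of 3 runs, converted to milliseconds per call
    best = min(timeit.repeat(lambda: func(s), number=NUMBER, repeat=3)) / NUMBER
    print(f'{func.__name__:16} {best * 1e3:.2f} ms per call')
```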

answered Oct 18 '22 07:10

jezrael
