how can I by using pandas or numpy separate one column of 6 integer digits into 6 columns with one digit each?
import pandas as pd
import numpy as np
df = pd.Series(range(123456,123465))
df = pd.DataFrame(df)
df.head()
Number
654321
223344
Number | x1 | x2 | x3 | x4 | x5 | x6 |
654321 | 6 | 5 | 4 | 3 | 2 | 1 |
223344 | 2 | 2 | 3 | 3 | 4 | 4 |
1. Select a blank cell (says cell C1) for locating the first split digit of number in cell A1, then enter formula =MID($A1,COLUMN()-(COLUMN($C1)- 1),1) into the formula bar, and then press the Enter key.
Here is a simple suggestion:
import pandas as pd
# MCVE dataframe:
df = pd.DataFrame([123456, 456789, 135797, 123, 123456789], columns=['number'])
def digit(x, n):
"""Return the n-th digit of integer in base 10"""
return (x // 10**n) % 10
def digitize(df, key, n):
"""Extract n less significant digits from an integer in base 10"""
for i in range(n):
df['x%d' % i] = digit(df[key], n-i-1)
# Apply function on dataframe (inplace):
digitize(df, 'number', 6)
For the trial dataframe, it returns:
number x0 x1 x2 x3 x4 x5
0 123456 1 2 3 4 5 6
1 456789 4 5 6 7 8 9
2 135797 1 3 5 7 9 7
3 123 0 0 0 1 2 3
4 123456789 4 5 6 7 8 9
This method avoids the need to cast into string
and then cast again to int
.
It relies on modular integer arithmetic, bellow details of operations:
10**3 # int: 1000 (integer power)
54321 // 10**3 # int: 54 (quotient of integer division)
(54321 // 10**3) % 10 # int: 4 (remainder of integer division, modulo)
Last but not least, it is fail safe and exact for number shorter than n
digits or greater than (notice it returns the n
less significant digits in latter case).
Some fun with views, assuming that each number has 6 digits:
u = df[['Number']].to_numpy().astype('U6').view('U1').astype(int)
df.join(pd.DataFrame(u).rename(columns=lambda c: f'x{c+1}'))
Number x1 x2 x3 x4 x5 x6
0 654321 6 5 4 3 2 1
1 223344 2 2 3 3 4 4
You could use np.unravel_index
df = pd.DataFrame({'Number': [654321,223344]})
def split_digits(df):
# get data as numpy array
numbers = df['Number'].to_numpy()
# extract digits
digits = np.unravel_index(numbers, 6*(10,))
# create column headers
columns = ['Number', *(f'x{i}' for i in "123456")]
# build and return new data frame
return pd.DataFrame(np.stack([numbers, *digits], axis=1), columns=columns, index=df.index)
split_digits(df)
# Number x1 x2 x3 x4 x5 x6
# 0 654321 6 5 4 3 2 1
# 1 223344 2 2 3 3 4 4
timeit(lambda:split_digits(df),number=1000)
# 0.3550272472202778
Thanks @GZ0 for some pandas
tips.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With