Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

split a six digits number column into separated columns with one digit

how can I by using pandas or numpy separate one column of 6 integer digits into 6 columns with one digit each?

import pandas as pd
import numpy as np

df = pd.Series(range(123456,123465))

df = pd.DataFrame(df)


df.head()

what I have is like this one below

Number
654321
223344

The desired outcome should be like this one below.

Number | x1 | x2 | x3 | x4 | x5 | x6 |
654321 |  6 |  5 | 4  |  3 |  2 |  1 |
223344 |  2 |  2 | 3  |  3 |  4 |  4 |
like image 732
msalem85 Avatar asked Sep 04 '19 17:09

msalem85


People also ask

How do you extract a digit from a number in Excel?

1. Select a blank cell (says cell C1) for locating the first split digit of number in cell A1, then enter formula =MID($A1,COLUMN()-(COLUMN($C1)- 1),1) into the formula bar, and then press the Enter key.


3 Answers

MCVE

Here is a simple suggestion:

import pandas as pd

# MCVE dataframe:
df = pd.DataFrame([123456, 456789, 135797, 123, 123456789], columns=['number'])

def digit(x, n):
    """Return the n-th digit of integer in base 10"""
    return (x // 10**n) % 10

def digitize(df, key, n):
    """Extract n less significant digits from an integer in base 10"""
    for i in range(n):
        df['x%d' % i] = digit(df[key], n-i-1)

# Apply function on dataframe (inplace):
digitize(df, 'number', 6)

For the trial dataframe, it returns:

      number  x0  x1  x2  x3  x4  x5
0     123456   1   2   3   4   5   6
1     456789   4   5   6   7   8   9
2     135797   1   3   5   7   9   7
3        123   0   0   0   1   2   3
4  123456789   4   5   6   7   8   9

Observations

This method avoids the need to cast into string and then cast again to int.

It relies on modular integer arithmetic, bellow details of operations:

10**3                  # int: 1000 (integer power)
54321 // 10**3         # int: 54   (quotient of integer division)
(54321 // 10**3) % 10  # int: 4    (remainder of integer division, modulo)

Last but not least, it is fail safe and exact for number shorter than n digits or greater than (notice it returns the n less significant digits in latter case).

like image 193
jlandercy Avatar answered Oct 02 '22 05:10

jlandercy


Some fun with views, assuming that each number has 6 digits:


u = df[['Number']].to_numpy().astype('U6').view('U1').astype(int)

df.join(pd.DataFrame(u).rename(columns=lambda c: f'x{c+1}'))

   Number  x1  x2  x3  x4  x5  x6
0  654321   6   5   4   3   2   1
1  223344   2   2   3   3   4   4
like image 21
user3483203 Avatar answered Oct 02 '22 06:10

user3483203


You could use np.unravel_index

df = pd.DataFrame({'Number': [654321,223344]})

def split_digits(df):
    # get data as numpy array
    numbers = df['Number'].to_numpy()
    # extract digits
    digits = np.unravel_index(numbers, 6*(10,))
    # create column headers
    columns = ['Number', *(f'x{i}' for i in "123456")]
    # build and return new data frame
    return pd.DataFrame(np.stack([numbers, *digits], axis=1), columns=columns, index=df.index)


split_digits(df)
#    Number  x1  x2  x3  x4  x5  x6
# 0  654321   6   5   4   3   2   1
# 1  223344   2   2   3   3   4   4

timeit(lambda:split_digits(df),number=1000)
# 0.3550272472202778

Thanks @GZ0 for some pandas tips.

like image 28
Paul Panzer Avatar answered Oct 02 '22 05:10

Paul Panzer