
Combine numbers from two columns to create one array

Code to create the sample dataframe:

import pandas as pd

Sample = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': [[.332, .326], [.058, .138]]},
     {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': [[.234, .246], [.234, .395]]},
     {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': [[.084, .23], [.745, .923]]}]
df = pd.DataFrame(Sample)

Sample dataframe, visualized:

 df:
  account        Jan      Feb          Mar
Jones LLC  |     150   |   200    | [[.332, .326], [.058, .138]]
Alpha Co   |     200   |   210    | [[.234, .246], [.234, .395]]
Blue Inc   |     50    |   90     | [[.084, .23], [.745, .923]]

I'm looking for a way to combine the Jan and Feb columns into one array per row and store that array in a new column, New.

Expected output:

 df:
  account        Jan      Feb          Mar                            New
Jones LLC  |     150   |   200    | [[.332, .326], [.058, .138]]  |  [150, 200]
Alpha Co   |     200   |   210    | [[.234, .246], [.234, .395]]  |  [200, 210]
Blue Inc   |     50    |   90     | [[.084, .23], [.745, .923]]   |  [50, 90]
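Put in plain Python terms, the expected New column just pairs each row's Jan and Feb values. A minimal sketch of that pairing, using a list comprehension over zip rather than any particular pandas method (Mar is left out because it plays no part in the combination):

```python
import pandas as pd

# Reduced copy of the sample data; Mar is omitted because it is not combined.
df = pd.DataFrame({
    'account': ['Jones LLC', 'Alpha Co', 'Blue Inc'],
    'Jan': [150, 200, 50],
    'Feb': [200, 210, 90],
})

# zip walks both columns in step; .tolist() yields plain Python ints.
df['New'] = [[jan, feb] for jan, feb in zip(df['Jan'].tolist(), df['Feb'].tolist())]
print(df)
```

This is an alternative route, not the approach from the answer; it avoids apply, but still builds each list in Python, one row at a time.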
asked Jul 10 '17 at 19:07 by Ashley O




1 Answer

Use values.tolist():

df.assign(New=df[['Jan', 'Feb']].values.tolist())
# to modify df in place instead, use:
# df['New'] = df[['Jan', 'Feb']].values.tolist()

   Feb  Jan                               Mar    account         New
0  200  150  [[0.332, 0.326], [0.058, 0.138]]  Jones LLC  [150, 200]
1  210  200  [[0.234, 0.246], [0.234, 0.395]]   Alpha Co  [200, 210]
2   90   50   [[0.084, 0.23], [0.745, 0.923]]   Blue Inc    [50, 90]
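For reference, a self-contained sketch of the same approach. It assumes pandas 0.24 or later, where DataFrame.to_numpy() is the recommended replacement for .values, and selects ['Jan', 'Feb'] in that order so each list follows the expected [Jan, Feb] layout:

```python
import pandas as pd

sample = [
    {'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': [[.332, .326], [.058, .138]]},
    {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': [[.234, .246], [.234, .395]]},
    {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': [[.084, .23], [.745, .923]]},
]
df = pd.DataFrame(sample)

# The column order in the selection controls the element order in each list.
pairs = df[['Jan', 'Feb']].to_numpy()   # 2-D array with one row per dataframe row
df['New'] = pairs.tolist()               # list of lists -> one Python list per cell

print(df[['account', 'New']])
```

Because both columns are stacked into a single array first, mixing an integer column with a float column would upcast every value to float (150 would become 150.0).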

Timing with larger data
Avoiding apply is more than 60 times faster on a 3,000-row dataframe:

df = pd.concat([df] * 1000, ignore_index=True)

%timeit df.assign(New=df[['Jan', 'Feb']].values.tolist())
%timeit df.assign(New=df.apply(lambda x: [x['Jan'], x['Feb']], axis=1))

1000 loops, best of 3: 947 µs per loop
10 loops, best of 3: 61.7 ms per loop

And about 160 times faster on a 30,000-row dataframe (built from the original 3-row df and timed with the same two %timeit lines):

df = pd.concat([df] * 10000, ignore_index=True)

100 loops, best of 3: 3.58 ms per loop
1 loop, best of 3: 586 ms per loop
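Outside IPython, where %timeit is unavailable, a similar comparison can be run with the standard-library timeit module. This is a sketch, not the answerer's original benchmark: it rebuilds a 3,000-row frame from the sample (without Mar), uses to_numpy() in place of .values, and checks that both approaches agree before timing them. Absolute timings depend on hardware and pandas version, so no figures are given here.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    'account': ['Jones LLC', 'Alpha Co', 'Blue Inc'],
    'Jan': [150, 200, 50],
    'Feb': [200, 210, 90],
})
# 3 rows * 1000 copies = 3,000 rows, matching the first timing above.
df = pd.concat([base] * 1000, ignore_index=True)

def vectorized():
    # Convert both columns to one 2-D array, then to a list of row lists.
    return df.assign(New=df[['Jan', 'Feb']].to_numpy().tolist())

def row_apply():
    # Build each row's list in Python, one row at a time.
    return df.assign(New=df.apply(lambda x: [x['Jan'], x['Feb']], axis=1))

# Both approaches must produce the same lists before comparing speed.
assert vectorized()['New'].tolist() == row_apply()['New'].tolist()

for fn in (vectorized, row_apply):
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```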
answered Nov 15 '22 at 05:11 by piRSquared