I have a pandas dataframe as follows:
ticker    account      value         date
aa       assets       100,200       20121231, 20131231
bb       liabilities  50, 150       20141231, 20131231
I would like to split df['value'] and df['date'] so that the dataframe looks like this:
ticker    account      value         date
aa       assets       100           20121231
aa       assets       200           20131231 
bb       liabilities  50            20141231
bb       liabilities  150           20131231
Would greatly appreciate any help.
# Split on commas, also consuming any spaces that follow them
df.value = df.value.str.split(r',\s*')
df.date = df.date.str.split(r',\s*')
# Explode both columns together (pandas 1.3+) so each value stays paired with its date;
# chaining .explode('value').explode('date') would give every value/date combination
df = df.explode(['value', 'date']).reset_index(drop=True)
df:
    ticker  account      value  date
0   aa      assets       100    20121231
1   aa      assets       200    20131231
2   bb      liabilities  50     20141231
3   bb      liabilities  150    20131231
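To keep each value paired with the date at the same position, the list columns can also be exploded as Series that share one index. This is a minimal sketch, assuming equal-length lists in each row and pandas 0.25 or later (where `Series.explode` is available). The frame below re-creates the question's data for illustration.

```python
import pandas as pd

# Re-creation of the question's frame (spaces after the commas included)
df = pd.DataFrame([['aa', 'assets', '100,200', '20121231, 20131231'],
                   ['bb', 'liabilities', '50, 150', '20141231, 20131231']],
                  columns=['ticker', 'account', 'value', 'date'])

# Turn the delimited strings into lists, dropping the spaces after commas
for col in ['value', 'date']:
    df[col] = df[col].str.split(r',\s*')

# Move the scalar columns into the index, explode every remaining column
# as a Series, then restore the index as ordinary columns
out = df.set_index(['ticker', 'account']).apply(pd.Series.explode).reset_index()
print(out)
```

Because every column is exploded against the same index, rows are repeated once per list element rather than once per value/date combination.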
I'm noticing this question a lot: how do I split a column that holds lists into multiple rows? I've seen this called exploding.
So I wrote a function that will do it.
import numpy as np
import pandas as pd

def explode(df, columns):
    # Repeat each index label once per element of that row's list
    idx = np.repeat(df.index, df[columns[0]].str.len())
    # Flatten each list column, then place them side by side (rows x columns)
    flat = np.column_stack([np.concatenate(df[c].values) for c in columns])
    p = pd.DataFrame(flat, idx, columns)
    # join repeats each remaining row once per exploded element
    return df.drop(columns, axis=1).join(p).reset_index(drop=True)
But before we can use it, we need lists (or other iterables) in the columns to explode.
df = pd.DataFrame([['aa', 'assets',      '100,200', '20121231,20131231'],
                   ['bb', 'liabilities', '50,50',   '20141231,20131231']],
                  columns=['ticker', 'account', 'value', 'date'])
df

Split the value and date columns:
df.value = df.value.str.split(',')
df.date = df.date.str.split(',')
df

Now we could explode on either column or both, one after the other.
explode(df, ['value','date'])
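The function assumes that, in every row, the lists in all exploded columns have the same length. A quick check before exploding could look like the sketch below. The helper name `same_lengths` and the small frames are made up here for illustration.

```python
import pandas as pd


def same_lengths(df, columns):
    # Length of the list in each cell, one column of lengths per list column
    lengths = pd.concat([df[c].str.len() for c in columns], axis=1)
    # Every row must contain exactly one distinct length
    return bool(lengths.nunique(axis=1).eq(1).all())


good = pd.DataFrame({'value': [['100', '200'], ['50', '50']],
                     'date': [['20121231', '20131231'], ['20141231', '20131231']]})
bad = pd.DataFrame({'value': [['100', '200'], ['50', '50']],
                    'date': [['20121231', '20131231'], ['20141231']]})

print(same_lengths(good, ['value', 'date']))
print(same_lengths(bad, ['value', 'date']))
```

The second frame has two values but only one date in its last row, so the check reports a mismatch there.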

I removed strip from @jezrael's timing because I could not effectively add it to mine. Stripping is a necessary step for this question, since OP has spaces after the commas in the strings. My aim was to provide a generic way to explode columns that already hold iterables, and I think I've accomplished that.
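For data like the question's, where spaces follow the commas, one option is to let the split consume the whitespace so that no separate strip pass over the pieces is needed. This is a minimal sketch, assuming string columns as in the question. The frame below re-creates that data for illustration.

```python
import pandas as pd

df = pd.DataFrame([['aa', 'assets', '100,200', '20121231, 20131231'],
                   ['bb', 'liabilities', '50, 150', '20141231, 20131231']],
                  columns=['ticker', 'account', 'value', 'date'])

for col in ['value', 'date']:
    # A multi-character pattern is interpreted as a regular expression,
    # so the whitespace around each comma is consumed by the split.
    # The leading str.strip() only trims the two ends of the whole string.
    df[col] = df[col].str.strip().str.split(r'\s*,\s*')

print(df)
```

After this step each cell holds a list of clean strings, ready for any of the explode approaches.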
Code used to build the timing samples:
def get_df(n=1):
    return pd.DataFrame([['aa', 'assets',      '100,200,200', '20121231,20131231,20131231'],
                         ['bb', 'liabilities', '50,50',   '20141231,20131231']] * n,
                        columns=['ticker', 'account', 'value', 'date'])
Timing samples: small (2 rows), medium (200 rows), large (2,000,000 rows).
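A timing harness along these lines could be used to compare the two approaches on samples of different sizes. This is only a sketch: `explode_numpy`, `explode_stack`, and `time_it` are names made up here, the stack variant omits strip as described above, and no measured numbers are claimed.

```python
import timeit

import numpy as np
import pandas as pd


def get_df(n=1):
    return pd.DataFrame([['aa', 'assets', '100,200,200', '20121231,20131231,20131231'],
                         ['bb', 'liabilities', '50,50', '20141231,20131231']] * n,
                        columns=['ticker', 'account', 'value', 'date'])


def explode_numpy(df, columns=('value', 'date')):
    # Split, then flatten each list column with NumPy
    columns = list(columns)
    lists = {c: df[c].str.split(',') for c in columns}
    idx = np.repeat(df.index, lists[columns[0]].str.len().to_numpy())
    flat = {c: np.concatenate(lists[c].values) for c in columns}
    p = pd.DataFrame(flat, index=idx)
    return df.drop(columns, axis=1).join(p).reset_index(drop=True)


def explode_stack(df, columns=('value', 'date')):
    # Split with expand=True and stack; dropna() removes the padding that
    # expand=True adds when rows hold different numbers of items
    columns = list(columns)
    parts = [df[c].str.split(',', expand=True).stack().dropna()
                  .reset_index(level=1, drop=True)
             for c in columns]
    df1 = pd.concat(parts, axis=1, keys=columns)
    return df.drop(columns, axis=1).join(df1).reset_index(drop=True)


def time_it(func, n, repeat=3):
    # Best wall-clock time in seconds over `repeat` single runs
    df = get_df(n)
    return min(timeit.repeat(lambda: func(df), number=1, repeat=repeat))


if __name__ == '__main__':
    for n in (1, 100):  # 2 and 200 rows; raise n for larger samples
        for func in (explode_numpy, explode_stack):
            print(n * 2, func.__name__, time_it(func, n))
```

Both functions take the unsplit frame from `get_df`, so the split step is included in each measurement.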

You can first split the columns, build a Series from each with stack, and remove the whitespace with strip:
s1 = df.value.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s2 = df.date.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
Then concat both Series to df1:
df1 = pd.concat([s1,s2], axis=1, keys=['value','date'])
Remove old columns value and date and join:
print(df.drop(['value','date'], axis=1).join(df1).reset_index(drop=True))
  ticker      account value      date
0     aa       assets   100  20121231
1     aa       assets   200  20131231
2     bb  liabilities    50  20141231
3     bb  liabilities   150  20131231