Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I use map with multi-index in pandas?

I have a data table of data for a variety of genomic positions. The positions are represented as 3-tuples ('chromosome', 'srand', position) that I've turned into a multi-index. My goal is to look up various information about each position and add that to the table (for example gene name, etc.) I can do this with pybedtools.

df = pd.DataFrame(data={'A':range(1,8), 'B':range(1,8), 'C': range(1,8)},
 index=pd.MultiIndex.from_tuples([('chrom1', '-', 1234), ('chrom1', '+', 5678),
 ('chrom1', '+', 9876),  ('chrom2', '+', 13579), ('chrom2', '+', 8497), ('chrom2', '-', 98765),
 ('chrom2', '-', 76856)]))

df.index.rename(['chrom','strand','abs_pos'], inplace=True)

                       A  B  C
chrom  strand abs_pos         
chrom1 -      1234     1  1  1
       +      5678     2  2  2
              9876     3  3  3
chrom2 +      13579    4  4  4
              8497     5  5  5
       -      98765    6  6  6
              76856    7  7  7

My issue is with adding columns to a data frame with a multi-index. This seems straight forward without a multi-index: pandas - add new column to dataframe from dictionary

I have a dictionary of the look up information with 3-tuple keys corresponding to the multi-index. How can I add this data as a new column?

gene_d = {('chrom1', '-', 1234) : 'geneA', ('chrom1', '+', 5678): 'geneB', 
    ('chrom1', '+', 9876): 'geneC', ('chrom2', '+', 13579): 'geneD',
    ('chrom2', '+', 8497): 'geneE', ('chrom2', '-', 98765): 'geneF', 
    ('chrom2', '-', 76856): 'geneG'}

I've tried map, but can't seem to figure out how to get it to work with a multi-index to yield the following:

                                A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA     1  1  1
       +      5678    geneB     2  2  2
              9876    geneC     3  3  3
chrom2 +      13579   geneD     4  4  4
              8497    geneE     5  5  5
       -      98765   geneF     6  6  6
              76856   geneG     7  7  7
like image 642
HikerT Avatar asked Mar 28 '17 18:03

HikerT


People also ask

How does pandas handle multiple index columns?

pandas MultiIndex to ColumnsUse pandas DataFrame. reset_index() function to convert/transfer MultiIndex (multi-level index) indexes to columns. The default setting for the parameter is drop=False which will keep the index values as columns and set the new index to DataFrame starting from zero. Yields below output.

How do you slice multiple index in pandas?

You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level.


2 Answers

A vectorized approach:

df['gene'] = df.index #you get the index as tuple
df['gene'] = df['gene'].map(gene_d)
df = df.set_index('gene', append=True)

Resulting df:

                                A   B   C
chrom   strand  abs_pos gene            
chrom1  -       1234    geneA   1   1   1
        +       5678    geneB   2   2   2
                9876    geneC   3   3   3
chrom2  +       13579   geneD   4   4   4
                8497    geneE   5   5   5
        -       98765   geneF   6   6   6
                76856   geneG   7   7   7
like image 82
Vaishali Avatar answered Oct 04 '22 02:10

Vaishali


Make gene_d into a dataframe:

df1 = pd.DataFrame.from_dict(gene_d, orient='index').rename(columns={0:'gene'})

Give it a multindex:

df1.index = pd.MultiIndex.from_tuples(df1.index)

Concatenate with original df:

new_df = pd.concat([df, df1], axis=1).sort_values('A')

Do some clean up:

new_df.index.rename(['chrom','strand','abs_pos'], inplace=True)
new_df.set_index('gene', append=True)
new_df

                             A  B  C
chrom  strand abs_pos gene          
chrom1 -      1234    geneA  1  1  1
       +      5678    geneB  2  2  2
              9876    geneC  3  3  3
chrom2 +      13579   geneD  4  4  4
              8497    geneE  5  5  5
       -      98765   geneF  6  6  6
              76856   geneG  7  7  7
like image 28
cfort Avatar answered Oct 04 '22 00:10

cfort