How can I use map with multi-index in pandas?

I have a table of data for a variety of genomic positions. The positions are represented as 3-tuples (chromosome, strand, position) that I've turned into a multi-index. My goal is to look up various information about each position (for example, the gene name) and add it to the table. I can do the lookup with pybedtools.

import pandas as pd

df = pd.DataFrame(data={'A': range(1, 8), 'B': range(1, 8), 'C': range(1, 8)},
                  index=pd.MultiIndex.from_tuples([
                      ('chrom1', '-', 1234), ('chrom1', '+', 5678), ('chrom1', '+', 9876),
                      ('chrom2', '+', 13579), ('chrom2', '+', 8497), ('chrom2', '-', 98765),
                      ('chrom2', '-', 76856)]))

df.index.rename(['chrom','strand','abs_pos'], inplace=True)

                       A  B  C
chrom  strand abs_pos         
chrom1 -      1234     1  1  1
       +      5678     2  2  2
              9876     3  3  3
chrom2 +      13579    4  4  4
              8497     5  5  5
       -      98765    6  6  6
              76856    7  7  7

My issue is with adding columns to a data frame that has a multi-index. This seems straightforward without a multi-index (see "pandas - add new column to dataframe from dictionary").

I have a dictionary of the lookup information with 3-tuple keys corresponding to the multi-index. How can I add this data as a new column?

gene_d = {('chrom1', '-', 1234) : 'geneA', ('chrom1', '+', 5678): 'geneB', 
    ('chrom1', '+', 9876): 'geneC', ('chrom2', '+', 13579): 'geneD',
    ('chrom2', '+', 8497): 'geneE', ('chrom2', '-', 98765): 'geneF', 
    ('chrom2', '-', 76856): 'geneG'}

I've tried map, but can't seem to figure out how to get it to work with a multi-index to yield the following:

                                A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA     1  1  1
       +      5678    geneB     2  2  2
              9876    geneC     3  3  3
chrom2 +      13579   geneD     4  4  4
              8497    geneE     5  5  5
       -      98765   geneF     6  6  6
              76856   geneG     7  7  7

asked Mar 28 '17 18:03

HikerT

2 Answers

A vectorized approach:

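df['gene'] = df.index  # each element of a MultiIndex is a tuple of its level values
df['gene'] = df['gene'].map(gene_d)  # look each tuple up in the dictionary
df = df.set_index('gene', append=True)  # move the new column into the index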

Resulting df:

                                A   B   C
chrom   strand  abs_pos gene            
chrom1  -       1234    geneA   1   1   1
        +       5678    geneB   2   2   2
                9876    geneC   3   3   3
chrom2  +       13579   geneD   4   4   4
                8497    geneE   5   5   5
        -       98765   geneF   6   6   6
                76856   geneG   7   7   7
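
As a variation on the steps above, the temporary column of tuples can be skipped by mapping the index itself. This is only a minimal sketch on a cut-down, three-row version of the question's data, not necessarily how the answer above would be written. It assumes every dictionary key is a full 3-tuple and leaves one key out on purpose to show that an unmatched row ends up with a missing value.

```python
import pandas as pd

# Cut-down version of the question's frame (three rows, one data column).
index = pd.MultiIndex.from_tuples(
    [('chrom1', '-', 1234), ('chrom1', '+', 5678), ('chrom2', '+', 13579)],
    names=['chrom', 'strand', 'abs_pos'],
)
df = pd.DataFrame({'A': [1, 2, 3]}, index=index)

# Lookup table keyed by full index tuples.
# ('chrom2', '+', 13579) is deliberately absent.
gene_d = {
    ('chrom1', '-', 1234): 'geneA',
    ('chrom1', '+', 5678): 'geneB',
}

# Index.map calls the function once per index entry; for a MultiIndex
# each entry is a tuple, so dict.get can be used directly as the mapper.
df['gene'] = df.index.map(gene_d.get)

# Optionally move the new column into the index, as in the question's
# desired output. set_index returns a new frame.
result = df.set_index('gene', append=True)
print(result)
```

If missing genes should be flagged rather than left empty, `gene_d.get` could be swapped for a small function that returns a placeholder string.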

answered Oct 04 '22 02:10

Vaishali

Make gene_d into a dataframe:

df1 = pd.DataFrame.from_dict(gene_d, orient='index').rename(columns={0:'gene'})

Give it a MultiIndex:

df1.index = pd.MultiIndex.from_tuples(df1.index)

Concatenate with original df:

new_df = pd.concat([df, df1], axis=1).sort_values('A')

Do some cleanup:

new_df.index.rename(['chrom','strand','abs_pos'], inplace=True)
new_df = new_df.set_index('gene', append=True)  # set_index returns a new frame, so assign it back
new_df

                             A  B  C
chrom  strand abs_pos gene          
chrom1 -      1234    geneA  1  1  1
       +      5678    geneB  2  2  2
              9876    geneC  3  3  3
chrom2 +      13579   geneD  4  4  4
              8497    geneE  5  5  5
       -      98765   geneF  6  6  6
              76856   geneG  7  7  7
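
The same align-on-index idea can also be sketched with a left join instead of `concat`, which keeps the original row order and index names, so no sorting or renaming is needed afterwards. This is an alternative sketch on a three-row subset of the question's data, not the method used above. It assumes the lookup Series is given the same index level names as `df`, because joining two multi-indexes whose names do not overlap raises an error.

```python
import pandas as pd

# Three-row subset of the question's frame.
index = pd.MultiIndex.from_tuples(
    [('chrom1', '-', 1234), ('chrom1', '+', 5678), ('chrom1', '+', 9876)],
    names=['chrom', 'strand', 'abs_pos'],
)
df = pd.DataFrame({'A': [1, 2, 3]}, index=index)

# Dictionary order intentionally differs from the row order of df.
gene_d = {
    ('chrom1', '+', 9876): 'geneC',
    ('chrom1', '-', 1234): 'geneA',
    ('chrom1', '+', 5678): 'geneB',
}

# Build a Series whose MultiIndex uses the same level names as df.
genes = pd.Series(
    list(gene_d.values()),
    index=pd.MultiIndex.from_tuples(list(gene_d.keys()), names=df.index.names),
    name='gene',
)

# Left join on the shared index, then append 'gene' as a fourth index level.
result = df.join(genes).set_index('gene', append=True)
print(result)
```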

answered Oct 04 '22 00:10

cfort

Tags: python, pandas, multi-index
