
Concatenate Columns as Index in Pandas

Tags: python, pandas

I am importing a text file into pandas, and would like to concatenate 3 of the columns from the file to make the index.

I am open to doing this in 1 or more steps. I can either do the conversion at the same time I create the DataFrame, or I can create the DataFrame and restructure it with the newly created column. Knowing how to do this both ways would be the most helpful for me.

Ultimately, I would like the index to be the result of concatenating the values in the first 3 columns.

DJElbow asked Jul 23 '13 20:07


People also ask

How do I concatenate columns in pandas?

You can concatenate two or more text/string columns in a pandas DataFrame simply by using the + operator. Note that when you apply the + operator to numeric columns, it performs addition instead of concatenation.
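A minimal sketch of that difference, assuming made-up integer columns named year and month (hypothetical sample data, not from the question). Casting to str first makes + concatenate rather than add:

```python
import pandas as pd

# Hypothetical sample data: both columns are stored as integers here.
df = pd.DataFrame({"year": [2012, 2012], "month": [1, 2]})

# + on numeric columns adds the values element-wise.
summed = df["year"] + df["month"]

# Casting to str first makes + concatenate instead.
# str.zfill(2) pads the month to two digits so the keys line up.
joined = df["year"].astype(str) + df["month"].astype(str).str.zfill(2)

print(summed.tolist())
print(joined.tolist())
```

The zero padding is a design choice for this example only; without it, month 1 and month 11 would produce keys of different lengths.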

How do I create a column index in pandas?

To create an index from a column in a pandas DataFrame, use the set_index() method. For example, to make the column "Year" the index, write df.set_index("Year"). By default, set_index() returns the modified DataFrame as a new object rather than changing df in place.
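As a small illustration of the return value, here is a sketch with a made-up two-column frame (the value column is an assumption for the example). The original frame keeps its default index unless the result is assigned:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Year": [2012, 2013], "value": [10, 20]})

# set_index returns a new DataFrame; df itself is left unchanged.
indexed = df.set_index("Year")

print(list(df.columns))       # df still has "Year" as an ordinary column
print(indexed.index.name)     # the new frame is indexed by "Year"
```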

How do I merge two DataFrames with the same index?

You can use pandas.merge() to merge DataFrames by matching their indexes. When merging two DataFrames on the index, both the left_index and right_index parameters of merge() should be True.
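A short sketch of an index-on-index merge, assuming two made-up frames whose indexes are concatenated year-month strings (the column names a and b are hypothetical). With the default inner join, only index values present in both frames survive:

```python
import pandas as pd

# Hypothetical frames indexed by a concatenated year+month key.
left = pd.DataFrame({"a": [1, 2]}, index=["201201", "201202"])
right = pd.DataFrame({"b": [3, 4]}, index=["201202", "201203"])

# Match rows on the index of both frames (inner join by default).
merged = pd.merge(left, right, left_index=True, right_index=True)

print(merged)
```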


2 Answers

If your columns consist of strings, you can just use the + operator (in Python, adding strings concatenates them, and pandas follows this convention):

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'year':['2012', '2012'], 'month':['01', '02']})

In [3]: df
Out[3]:
  month  year
0    01  2012
1    02  2012

In [4]: df['concatenated'] = df['year'] + df['month']

In [5]: df
Out[5]:
  month  year concatenated
0    01  2012       201201
1    02  2012       201202

Once this column is created, you can use set_index to make it the index:

In [6]: df = df.set_index('concatenated')

In [7]: df
Out[7]:
             month  year
concatenated
201201          01  2012
201202          02  2012

Note that pd.concat is not for concatenating strings but for concatenating Series/DataFrames, i.e. combining the rows or columns of different DataFrames or Series into one DataFrame (not combining several rows/columns into a single row/column). See http://pandas.pydata.org/pandas-docs/dev/merging.html for an extensive explanation of this.
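To make the distinction concrete, here is a rough sketch using two made-up one-row frames (a and b are hypothetical names). pd.concat stacks the frames, while + joins the strings within each row:

```python
import pandas as pd

# Two hypothetical frames with the same string columns.
a = pd.DataFrame({"year": ["2012"], "month": ["01"]})
b = pd.DataFrame({"year": ["2012"], "month": ["02"]})

# pd.concat combines whole frames: here it stacks the rows of a and b.
stacked = pd.concat([a, b], ignore_index=True)

# String concatenation works within each row, across columns.
key = stacked["year"] + stacked["month"]

print(stacked.shape)
print(key.tolist())
```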

joris answered Oct 14 '22 08:10


If you're using read_csv to import your text file, there is an index_col argument that you can pass a list of column names or numbers to. This will end up creating a MultiIndex - I'm not sure if that suits your application.
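A sketch of the read_csv route, assuming a small made-up file with columns named region, site and code (all hypothetical, read from an in-memory buffer instead of a real path). Passing a list to index_col gives a MultiIndex, and if a single concatenated index is wanted, the levels can be joined afterwards:

```python
import io

import pandas as pd

# Hypothetical file contents; the column names are assumptions.
raw = io.StringIO(
    "region,site,code,value\n"
    "north,a,x1,10\n"
    "south,b,x2,20\n"
)

# A list passed to index_col produces a MultiIndex with three levels.
df = pd.read_csv(raw, index_col=["region", "site", "code"])

# Optionally flatten the MultiIndex into one concatenated string per row.
# Each element of a MultiIndex is a tuple of the level values.
flat = df.copy()
flat.index = ["".join(levels) for levels in df.index]

print(df.index.nlevels)
print(flat.index.tolist())
```

This example uses text values in the key columns on purpose. If the key columns hold digits, pandas would likely parse them as numbers, so they would need converting to str before joining.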

If you want to explicitly concatenate your index together (assuming that they are strings), it seems you can do so with the + operator. (Warning, untested code ahead)

df['concatenated'] = df['year'] + df['month']
# set_index returns a new DataFrame, so assign the result back
df = df.set_index('concatenated')
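Since the question asks about three columns, here is a hedged sketch extending the same idea, assuming a third string column named day (hypothetical, as is the sample data). It uses Series.str.cat, which also accepts a separator; chaining + three times would work as well:

```python
import pandas as pd

# Hypothetical sample data with three string columns.
df = pd.DataFrame({
    "year": ["2012", "2012"],
    "month": ["01", "02"],
    "day": ["15", "20"],
})

# Join the three columns row by row; sep is optional.
key = df["year"].str.cat([df["month"], df["day"]], sep="-")

# Use the joined Series directly as the index; its name becomes the index name.
df = df.set_index(key.rename("concatenated"))

print(df.index.tolist())
```

Passing the Series to set_index avoids adding a helper column, so the three original columns stay in the frame unchanged.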
voithos answered Oct 14 '22 08:10