I am importing a text file into pandas, and would like to concatenate 3 of the columns from the file to make the index.
I am open to doing this in 1 or more steps. I can either do the conversion at the same time I create the DataFrame, or I can create the DataFrame and restructure it with the newly created column. Knowing how to do this both ways would be the most helpful for me.
Ultimately, I would like the index to be the concatenation of the values in the first 3 columns.
You can concatenate two or more text/string columns in a pandas DataFrame simply by using the + operator. Note that when you apply the + operator to numeric columns, it performs addition instead of concatenation.
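To illustrate the numeric caveat, here is a minimal sketch. The integer-typed year and month columns are made up for the example; casting with astype(str) (and padding with str.zfill, which is optional) is one way to get concatenation rather than addition.

```python
import pandas as pd

# Hypothetical data: 'year' and 'month' are stored as integers here.
df = pd.DataFrame({'year': [2012, 2012], 'month': [1, 2]})

# On numeric columns, + adds the values.
added = df['year'] + df['month']

# Casting to str first makes + concatenate; zfill pads the month to two digits.
joined = df['year'].astype(str) + df['month'].astype(str).str.zfill(2)

print(added.tolist())   # [2013, 2014]
print(joined.tolist())  # ['201201', '201202']
```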
To create an index from a column in a pandas DataFrame, use the set_index() method. For example, if you want the column "Year" to be the index, type <code>df.set_index("Year")</code>. Note that set_index() returns the modified DataFrame as a result; by default it does not change df in place, so assign the result back if you want to keep it.
You can use pandas.merge() to merge DataFrames by matching their index. When merging two DataFrames on the index, the left_index and right_index parameters of merge() should both be True, and by default, the pd.
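As a rough sketch of an index-based merge, assuming two small made-up frames (left and right, with hypothetical columns a and b) that share the same kind of concatenated string key as their index:

```python
import pandas as pd

# Hypothetical frames indexed by a concatenated year+month key.
left = pd.DataFrame({'a': [1, 2]}, index=['201201', '201202'])
right = pd.DataFrame({'b': [10, 20]}, index=['201202', '201201'])

# Rows are matched on the index labels, not on their position.
merged = pd.merge(left, right, left_index=True, right_index=True)

print(merged.loc['201201', 'b'])  # 20
```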
If your columns consist of strings, you can just use the + operator (in Python, adding strings concatenates them, and pandas follows this):
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'year':['2012', '2012'], 'month':['01', '02']})
In [3]: df
Out[3]:
  month  year
0    01  2012
1    02  2012
In [4]: df['concatenated'] = df['year'] + df['month']
In [5]: df
Out[5]:
  month  year concatenated
0    01  2012       201201
1    02  2012       201202
And then, once this column is created, you can just use set_index to change the index:
In [6]: df = df.set_index('concatenated')
In [7]: df
Out[7]:
             month  year
concatenated
201201          01  2012
201202          02  2012
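The same idea extends to the three columns asked about in the question. The sketch below is only an illustration: the day and value columns are made up, and it assigns the concatenated key straight to the index instead of going through a helper column, so the original columns stay in the frame.

```python
import pandas as pd

# Hypothetical data: 'day' and 'value' are added for illustration only.
df = pd.DataFrame({
    'year': ['2012', '2012'],
    'month': ['01', '02'],
    'day': ['15', '28'],
    'value': [1.5, 2.5],
})

# Build the key from three string columns and assign it as the index directly.
df.index = df['year'] + df['month'] + df['day']
df.index.name = 'concatenated'

print(df.index.tolist())  # ['20120115', '20120228']
```

If the source columns are no longer needed afterwards, they could be dropped with df.drop(columns=[...]).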
Note that pd.concat is not for concatenating strings but for concatenating Series/DataFrames, that is, for combining the rows or columns of different DataFrames or Series into one DataFrame (not for combining several rows/columns into one row/column). See http://pandas.pydata.org/pandas-docs/dev/merging.html for an extensive explanation of this.
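To make the distinction concrete, here is a small sketch with two made-up one-row frames (a and b). pd.concat stacks them into one frame; it leaves the string values themselves untouched.

```python
import pandas as pd

# Two hypothetical one-row frames with the same columns.
a = pd.DataFrame({'year': ['2012'], 'month': ['01']})
b = pd.DataFrame({'year': ['2012'], 'month': ['02']})

# pd.concat stacks the frames row-wise; no strings are joined.
stacked = pd.concat([a, b], ignore_index=True)

print(stacked.shape)              # (2, 2)
print(stacked['month'].tolist())  # ['01', '02']
```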
If you're using read_csv to import your text file, there is an index_col argument that you can pass a list of column names or numbers to. This will end up creating a MultiIndex; I'm not sure if that suits your application.
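A sketch of the index_col route, with assumptions spelled out: the file contents, column names (region, site, code, value) and values are made up, and io.StringIO stands in for the real text file. The values are deliberately non-numeric; columns that look numeric would be parsed as numbers by default, so a leading zero such as '01' would not survive unless those columns are read as strings. Flattening the MultiIndex with map is one possible follow-up if a single concatenated index is wanted.

```python
import io
import pandas as pd

# Hypothetical file contents standing in for the real text file.
text = "region,site,code,value\nEU,north,a1,1.5\nEU,south,b2,2.5\n"

# Passing a list to index_col builds a MultiIndex from the first three columns.
df = pd.read_csv(io.StringIO(text), index_col=[0, 1, 2])
levels = df.index.nlevels
print(levels)  # 3

# Optionally flatten the MultiIndex into one concatenated string per row.
# str() is applied to each level value in case a level was parsed as a number.
df.index = df.index.map(lambda key: ''.join(map(str, key)))
print(df.index.tolist())  # ['EUnortha1', 'EUsouthb2']
```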
If you want to explicitly concatenate the columns into a single index (assuming that they are strings), it seems you can do so with the + operator. (Warning, untested code ahead)
df['concatenated'] = df['year'] + df['month']
df = df.set_index('concatenated')