Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Preserving Column Order - Python Pandas and Column Concat

So my google-fu doesn't seem to be doing me justice with what seems like should be a trivial procedure.

In Pandas for Python I have 2 datasets, I want to merge them. This works fine using .concat. The issue is, .concat reorders my columns. From a data retrieval point of view, this is trivial. From a "I just want to open the file and quickly see the most important column" point of view, this is annoying.

File1.csv
Name    Username    Alias1 
Tom     Tomfoolery   TJZ
Meryl   MsMeryl      Mer
Timmy   Midsize      Yoda

File2.csv
Name    Username   Alias 1   Alias 2
Bob     Firedbob   Fire      Gingy
Tom     Tomfoolery  TJZ      Awww

Result.csv
    Alias1 Alias2   Name    Username
0   TJZ    NaN      Tom     Tomfoolery
1   Mer    NaN      Meryl   MsMeryl
2   Yoda   NaN      Timmy   Midsize
0   Fire   Gingy    Bob     Firedbob
1   TJZ    Awww     Tom     Tomfoolery

The result is fine, but in the data-file I'm working with I have 1,000 columns. The 2-3 most important are now in the middle. Is there a way, in this toy example, I could've forced "Username" to be the first column and "Name" to be the second column, preserving the values below each all the way down obviously.

Also as a side note, when I save to file it also saves that numbering on the side (0 1 2 0 1). If theres a way to prevent that too, that'd be cool. If not, its not a big deal since it's a quick fix to remove.

Thanks!

like image 494
Jibril Avatar asked Sep 12 '15 00:09

Jibril


1 Answers

Assuming the concatenated DataFrame is df, you can perform the reordering of columns as follows:

important = ['Username', 'Name']
reordered = important + [c for c in df.columns if c not in important]
df = df[reordered]
print df

Output:

     Username   Name Alias1 Alias2
0  Tomfoolery    Tom    TJZ    NaN
1     MsMeryl  Meryl    Mer    NaN
2     Midsize  Timmy   Yoda    NaN
0    Firedbob    Bob   Fire  Gingy
1  Tomfoolery    Tom    TJZ   Awww

The list of numbers [0, 1, 2, 0, 1] is the index of the DataFrame. To prevent them from being written to the output file, you can use the index=False option in to_csv():

df.to_csv('Result.csv', index=False, sep=' ')
like image 164
YS-L Avatar answered Oct 26 '22 11:10

YS-L