I have a file with more than 1000 columns and rows, and the column names do not follow any pattern. An example is shown below:
file1.txt
IDs AABC ABC6 YHG.8 D78Ha
Ellie 12 48.70 33
Kate 98 34 21 76.36
Joe 22 53 49
Van 77 40 12.1
Xavier 88.85
First, I have to fill the blanks with NA, so that it looks like this:
file1.txt
IDs AABC ABC6 YHG.8 D78Ha
Ellie 12 NA 48.70 33
Kate 98 34 21 76.36
Joe 22 53 49 NA
Van 77 NA 40 12.1
Xavier NA NA NA 88.85
Then I want to get every combination of IDs with the other columns (AABC, ABC6, YHG.8 and D78Ha), such as:
Ellie, AABC --> 12
Ellie, ABC6 --> NA
Ellie, YHG.8 --> 48.70 (without rounding)
Ellie, D78Ha --> 33
Kate, AABC --> 98
Kate, ABC6 --> 34
...
So the desired output should be 20 lines (4 columns x 5 IDs), as follows:
output.txt
Ellie AABC 12
Ellie ABC6 NA
Ellie YHG.8 48.70
Ellie D78Ha 33
Kate AABC 98
Kate ABC6 34
..
For this, I filled the blanks manually with NA, read the file with pandas, and set IDs as the index,
so that I can look up a value by ID name and column name.
But I could not iterate over it. My attempt was:
import pandas as pd
tablefile = pd.read_csv('file1.txt',sep='\t')
print(tablefile)
df2=tablefile.set_index("IDs")
print("Ellie AABC " , df2.loc["Ellie", "AABC" ])
print("Kate AABC " , df2.loc["Kate", "AABC" ])
print("Xavier AABC " , df2.loc["Xavier", "AABC" ])
It prints:
('Ellie AABC ', 12.0)
('Kate AABC ', 98.0)
('Xavier AABC ', nan)
How can I fill the blanks with NA and iterate over this table without writing out each name one by one? Maybe by incrementing i in [i, i]?
IIUC, use stack with dropna=False (here df is the DataFrame read from the file):
df.set_index('IDs').stack(dropna=False).astype(object).reset_index()
Out[915]:
IDs level_1 0
0 Ellie AABC 12
1 Ellie ABC6 NaN
2 Ellie YHG.8 48.7
3 Ellie D78Ha 33
4 Kate AABC 98
5 Kate ABC6 34
6 Kate YHG.8 21
7 Kate D78Ha 76.36
8 Joe AABC 22
9 Joe ABC6 53
10 Joe YHG.8 49
11 Joe D78Ha NaN
12 Van AABC 77
13 Van ABC6 NaN
14 Van YHG.8 40
15 Van D78Ha 12.1
16 Xavier AABC NaN
17 Xavier ABC6 NaN
18 Xavier YHG.8 NaN
19 Xavier D78Ha 88.85
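For comparison, the explicit iteration the question asks about can be sketched as a nested loop over the index and the columns, so no name has to be typed by hand. This is only a minimal sketch: the tab-separated layout with empty fields for the blanks is an assumption based on sep='\t' in the question, and reading with dtype=str is a choice made here so that 48.70 stays as written. On a table with more than 1000 rows and columns, the reshaping approach will usually be much faster than a Python loop.

```python
from io import StringIO
import pandas as pd

# Tab-separated sample; empty fields between tabs stand for the blanks.
# (The tab layout is an assumption based on sep='\t' in the question.)
txt = (
    "IDs\tAABC\tABC6\tYHG.8\tD78Ha\n"
    "Ellie\t12\t\t48.70\t33\n"
    "Kate\t98\t34\t21\t76.36\n"
    "Joe\t22\t53\t49\t\n"
    "Van\t77\t\t40\t12.1\n"
    "Xavier\t\t\t\t88.85\n"
)

# Empty fields are parsed as missing values automatically.
# dtype=str keeps the values as written, so 48.70 is not shortened to 48.7.
df2 = pd.read_csv(StringIO(txt), sep="\t", dtype=str).set_index("IDs")

lines = []
for row_id in df2.index:          # every ID, in file order
    for col in df2.columns:       # every other column, in file order
        value = df2.loc[row_id, col]
        lines.append(f"{row_id} {col} {'NA' if pd.isna(value) else value}")

print("\n".join(lines))
```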
Simply melt to reshape the DataFrame:
Data
from io import StringIO
import pandas as pd
txt = """IDs AABC ABC6 YHG.8 D78Ha
Ellie 12 NA 48.70 33
Kate 98 34 21 76.36
Joe 22 53 49 NA
Van 77 NA 40 12.1
Xavier NA NA NA 88.8"""
tabledf = pd.read_table(StringIO(txt), sep=r"\s+")  # raw string avoids an invalid-escape warning
Melt
melted_df = pd.melt(tabledf, id_vars = "IDs").sort_values('IDs').reset_index(drop=True)
print(melted_df)
# IDs variable value
# 0 Ellie AABC 12.00
# 1 Ellie ABC6 NaN
# 2 Ellie YHG.8 48.70
# 3 Ellie D78Ha 33.00
# 4 Joe AABC 22.00
# 5 Joe D78Ha NaN
# 6 Joe ABC6 53.00
# 7 Joe YHG.8 49.00
# 8 Kate AABC 98.00
# 9 Kate ABC6 34.00
# 10 Kate YHG.8 21.00
# 11 Kate D78Ha 76.36
# 12 Van AABC 77.00
# 13 Van ABC6 NaN
# 14 Van D78Ha 12.10
# 15 Van YHG.8 40.00
# 16 Xavier ABC6 NaN
# 17 Xavier AABC NaN
# 18 Xavier YHG.8 NaN
# 19 Xavier D78Ha 88.80
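Sorting by IDs puts the names in alphabetical order and does not guarantee the column order within each ID, as the output above shows. If the file order from the question matters, one possible variation is sketched below. It assumes pandas 1.1 or later for the ignore_index argument of melt, reads the values as strings so they are not reformatted, and uses the sample values from the question. The variable names long_df and output_text are made up for this sketch.

```python
from io import StringIO
import pandas as pd

txt = """IDs AABC ABC6 YHG.8 D78Ha
Ellie 12 NA 48.70 33
Kate 98 34 21 76.36
Joe 22 53 49 NA
Van 77 NA 40 12.1
Xavier NA NA NA 88.85"""

# dtype=str keeps 48.70 as written; "NA" is still parsed as a missing value.
tabledf = pd.read_csv(StringIO(txt), sep=r"\s+", dtype=str)

long_df = (
    # ignore_index=False keeps the original row number on every melted row.
    pd.melt(tabledf, id_vars="IDs", ignore_index=False)
    # A stable sort on that row number groups rows by ID in file order
    # and leaves the columns within each ID in their original order.
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)

# With no path given, to_csv returns the text; pass "output.txt" to write a file.
output_text = long_df.to_csv(sep=" ", na_rep="NA", header=False, index=False)
print(output_text)
```

The printed text has one "ID column value" triple per line, with NA written for the missing cells.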