So I'm going to have a table that will be 20 columns by x
rows, and I need to find, for each row, which column the highest value belongs to.
For example, the table would be something like this (but larger):
A B C D E F G
1 2 3 4 5 6 7
9 8 7 6 5 4 3
7 6 5 8 4 3 2
0.9 0.01 0.02 0.2 0.04 0.3 ...
And I'd like it to spit out: G,A,D,A.
And I'll need to put this into another file. It doesn't even have to be with the letters. I'll be doing something with it later.
I've been trying to figure out the best way of doing this, and I've been looking into doing it with R. This is the script I have so far:
#!/usr/bin/env Rscript
a=read.table(get(TEST.csv),header=T,sep="",dec=".")
apply(a, 1, which.max)
It doesn't want to read my test file. And for Python I have the following:
import numpy as np
import csv
a=np.genfromtxt('./TEST.csv',delimiter='\t',skip_header=1)
print(a)
amax=np.amax(a,axis=1)
print(amax)
This one properly extracts the highest value of each row, but it doesn't extract the column number like I'd like it to. Any and all suggestions would be greatly appreciated.
You can try max.col in R:
names(a)[max.col(a, 'first')]
#[1] "G" "A" "D" "A"
You can use pandas.read_csv to read the file into a dataframe and then use idxmax:
import pandas as pd
df = pd.read_csv("in.csv", delimiter=r"\s+")  # raw string so \s is passed through as a regex
print(df.idxmax(axis=1))
0 G
1 A
2 D
3 A
dtype: object
Replace the delimiter with whichever one your file actually uses (for example "\t" for tab-separated data).
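The NumPy attempt in the question can also be made to return column positions by using np.argmax instead of np.amax. The following is only a sketch under some assumptions: the data is whitespace-separated, an in-memory sample stands in for TEST.csv (using the three complete rows from the question, since the fourth is truncated there), and the output file name max_columns.txt is a made-up placeholder.

```python
import io
import numpy as np

# Stand-in for TEST.csv so the sketch runs on its own (assumed whitespace-separated).
sample = io.StringIO(
    "A B C D E F G\n"
    "1 2 3 4 5 6 7\n"
    "9 8 7 6 5 4 3\n"
    "7 6 5 8 4 3 2\n"
)

# Read the header line separately so the column names can be reused later.
header = sample.readline().split()

# np.loadtxt splits on any whitespace by default.
a = np.loadtxt(sample)

# argmax along axis=1 gives, for each row, the 0-based index of its largest value.
idx = np.argmax(a, axis=1)

# Map the indices back to column letters.
labels = [header[i] for i in idx]

print(idx)     # [6 0 3]
print(labels)  # ['G', 'A', 'D']

# Write the result to another file (hypothetical name) as a comma-separated line.
with open("max_columns.txt", "w") as fh:
    fh.write(",".join(labels) + "\n")
```

If a row contains ties, np.argmax reports the first column that holds the maximum. For a real file, replace the StringIO object with open("TEST.csv") and pass delimiter="\t" to np.loadtxt if the data is tab-separated.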