
Identify the column that the highest row value belongs to, python or R

Tags:

python

r

I'm going to have a table that will be 20 columns by x rows, and for each row I need to find which column the highest value belongs to. For example:

The table would be something like this (but larger):
 A       B       C       D       E       F       G
 1       2       3       4       5       6       7
 9       8       7       6       5       4       3
 7       6       5       8       4       3       2
 0.9     0.01    0.02    0.2     0.04    0.3   ...

I'd like it to output G, A, D, A, and I'll need to write that to another file. It doesn't have to use the letters; column numbers would do, since I'll be processing the result later. I've been trying to figure out the best way of doing this and have been looking into R. This is the script I have so far:

#!/usr/bin/env Rscript
a=read.table(get(TEST.csv),header=T,sep="",dec=".")
apply(a, 1, which.max)

It doesn't want to read my test file. For Python I have the following:

import numpy as np
import csv
a=np.genfromtxt('./TEST.csv',delimiter='\t',skip_header=1)
print(a)
amax=np.amax(a,axis=1)
print(amax)

This one correctly extracts the highest value of each row, but it doesn't give me the column number, which is what I actually need. Any and all suggestions would be greatly appreciated.

AbbiNormal asked Jan 08 '23 04:01



2 Answers

You can try max.col in R; the 'first' argument breaks ties in favor of the leftmost column:

names(a)[max.col(a, 'first')]
#[1] "G" "A" "D" "A"
akrun answered Jan 16 '23 21:01



You can use pandas.read_csv to read the file into a DataFrame and then use idxmax:

import pandas as pd

df = pd.read_csv("in.csv", delimiter=r"\s+")  # raw string so \s is not treated as a string escape

print(df.idxmax(axis=1))
0    G
1    A
2    D
3    A
dtype: object

Replace the delimiter with whichever one your file actually uses.
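If you would rather stay with the numpy attempt from the question, np.argmax is probably the missing piece: it returns the position of each row's maximum rather than the value itself. Below is a rough sketch, not a definitive solution. It assumes a tab-separated file with a header row, uses an in-memory string with the three complete rows from the question in place of TEST.csv, and writes to a placeholder file name (max_columns.txt) that I made up for illustration.

```python
import io

import numpy as np

# Stand-in for TEST.csv: the three complete rows from the question,
# tab-separated with a header line (an assumption about the real file).
sample = (
    "A\tB\tC\tD\tE\tF\tG\n"
    "1\t2\t3\t4\t5\t6\t7\n"
    "9\t8\t7\t6\t5\t4\t3\n"
    "7\t6\t5\t8\t4\t3\t2\n"
)

# Keep the header separately so indices can be mapped back to letters.
header = sample.splitlines()[0].split("\t")
a = np.genfromtxt(io.StringIO(sample), delimiter="\t", skip_header=1)

# argmax gives the 0-based column position of each row's maximum;
# when a row has ties, the first occurrence is reported.
col_idx = np.argmax(a, axis=1)
col_names = [header[i] for i in col_idx]

print(col_idx)    # [6 0 3]
print(col_names)  # ['G', 'A', 'D']

# Write one column label per line; the output name is a placeholder.
with open("max_columns.txt", "w") as fh:
    fh.write("\n".join(col_names) + "\n")
```

With a real file, pass its path to np.genfromtxt instead of the StringIO object and read the header from the file's first line. Add 1 to col_idx if you want 1-based column numbers like R's which.max.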

Padraic Cunningham answered Jan 16 '23 20:01
