Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python script for manipulating excel sheet

Tags:

python

excel

I am trying to write a python script to manipulate the excel spreadsheet.

Suppose if,Ihave the sample data:

Gene        chrom    strand  TSS        TES         Name

NM_145215   chr5     +       135485168  135488045   Abhd11

NM_1190437  chr5     +       135485021  135488045   Abhd11

NM_1205181  chr14    +       54873803   54888844    Abhd4

NM_134076   chr14    +       54878906   54888844    Abhd4

NM_9594     chr2     +       31615464   31659747    Abl1

NM_1112703  chr2     +       31544075   31659747    Abl1

NM_207624   chr11    +       105829258  105851278   Abl1

NM_9598     chr11    +       105836521  105851278   Ace2

NM_1130513  chrX     +       160577273  160626350   Ace2

NM_27286    chrX     +       160578411  160626350   Ace2

For those similar names(column 6), I want to retrieve the whole row with least TSS. Example, for first 2 rows-Abhd11 name, I want to save the 2nd row in my result since the TSS 135485021 < 135485168. So on for all the sets with same NAMES.

Any ideas and comments are appreciated.

like image 793
user1574761 Avatar asked Jun 06 '26 03:06

user1574761


2 Answers

Input

If possible I would save the excel file as a csv file and then load into python using the csv module.

Alternatively you could use the xlrd module for reading excel files - though I haven't used this and don't know much about it.

openpyxl is an additional option for parsing the excel file (cheers Just another dunce).

Manipulation

ernie's idea seems workable, and I would implement it as follows. Assuming that linesreadfromfile is a list of lists as read using csv.reader i.e. each list element is a list of values that corresponds to the delimited entries of that row in the file,

finaldict = {}
for row in linesreadfromfile:
    if finaldict.has_key(row[5]):
        if finaldict[row[5]][3] > row[3]:
            finaldict[row[5]] = row
    else:
        finaldict[row[5]] = row
like image 118
jmetz Avatar answered Jun 08 '26 17:06

jmetz


I agree with mutzmatron and would recommend the xlrd module. Here's a simple example:

import xlrd

# Create your file handle
file_handle = xlrd.open_workbook(file_name)

# Use the first page in the spreadsheet (0-based indexes)
sheet = file_handle.sheet_by_index(0)

# Create dictionary for storing values
abc = {}

# Loop through every row
for i in range(sheet.nrows):
  line = sheet.row_values(i)

  # Get your 'Name' and 'TSS' columns
  name = line[5]
  tss = line[3]

  # Add this 'Name' to your dictionary if it's new, or keep the max value
  if name not in abc.keys():
    abc[name] = tss
  else:
    abc[name] = max(abc[name],tss)

Obviously changing what you'd need to save (full row, certain values, etc.) based on your spec.

--- EDIT ---

  # If this 'Name' is new, save this line
  if name not in abc.keys():
    abc[name] = {'tss': tss, 'line': line}

  # Else, if this 'Name' is not new and the TSS is less, keep this new line
  elif tss < abc[name]['tss']:
    abc[name]['line'] = line
like image 24
Valdogg21 Avatar answered Jun 08 '26 15:06

Valdogg21