
pandas: write tab-separated dataframe with literal tabs with no quotes

I have to reformat my data for genetics software that requires each column to be split into two, e.g. 0 -> G G; 1 -> A G; 2 -> A A. The output file is supposed to be tab-delimited. I am trying to do this in pandas:

import csv
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,3, size = (10,5)), 
                  columns=[ chr(c) for c in range(97, 97+5) ])

def fake_alleles(x):
    if x==0:
        return "A\tA"
    if x==1:
        return "A\tG"
    if x==2:
        return "G\tG"

plinkpast6 = df.applymap(fake_alleles)
plinkpast6.to_csv("test.ped", sep="\t", quoting=csv.QUOTE_NONE)

This gives me the error "Error: need to escape, but no escapechar set". Are there other ways to do this with pandas?

Dima Lituiev asked May 21 '16 00:05

1 Answer

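With sep="\t", to_csv writes each element of a dataframe row as one field and inserts a "\t" between fields. The problem is that your elements themselves contain "\t". With quoting=csv.QUOTE_NONE the writer cannot quote those fields, so it has to escape the embedded "\t"s, and no escapechar is set. That is what the error is telling you. I suspect you want each allele in its own column in the final output (6 columns for the 3 input columns used below).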
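The same behaviour can be reproduced without pandas. The sketch below uses only the standard-library csv module and assumes it raises the same error in this situation; try_write is a hypothetical helper name introduced here for illustration.

```python
import csv
import io

def try_write(fields, **writer_kwargs):
    """Write one row to an in-memory buffer; return the text or the error message."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n", **writer_kwargs)
    try:
        writer.writerow(fields)
    except csv.Error as exc:
        return f"csv.Error: {exc}"
    return buf.getvalue()

# Fields that contain the delimiter cannot be written unquoted and unescaped.
failed = try_write(["A\tG", "G\tG"], quoting=csv.QUOTE_NONE)

# Once every allele is its own field, no escaping is needed.
ok = try_write(["A", "G", "G", "G"], quoting=csv.QUOTE_NONE)

print(failed)
print(repr(ok))
```

The first call fails because "A\tG" holds a tab inside a single field; the second succeeds because the tabs only ever appear between fields.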

Try this:

import csv
import pandas as pd
import numpy as np

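# 10 rows x 20 columns of random genotype codes (0, 1 or 2)
df = pd.DataFrame(np.random.randint(0,3, size = (10,20)))

def fake_alleles(x):
    # Map a genotype code to a tab-joined pair of alleles
    if x==0:
        return "A\tA"
    if x==1:
        return "A\tG"
    if x==2:
        return "G\tG"

# Use only the first 3 columns for the example.
# Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map
plinkpast6 = df.iloc[:,:3].applymap(fake_alleles)
# Stack to a Series, split each "X\tY" string into two columns,
# then unstack back to wide form (3 columns become 6)
plinkpast6 = plinkpast6.stack().str.split('\t', expand=True).unstack()
# No element contains a tab any more, so QUOTE_NONE needs no escapechar
plinkpast6.to_csv("test.ped", sep="\t", quoting=csv.QUOTE_NONE)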
piRSquared answered Sep 20 '22 18:09
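One thing to watch with the stack/split/unstack approach: the unstacked columns come out grouped by split position, so all first alleles precede all second alleles, and to_csv also writes a header and the index by default. If the target software expects the two alleles of each column side by side with no header or index (an assumption, not something stated in the question), a lookup-table variant might look like the sketch below. ALLELES and split_genotypes are hypothetical names, and the mapping mirrors fake_alleles.

```python
import io

import numpy as np
import pandas as pd

# Lookup table indexed by genotype code: 0 -> A A, 1 -> A G, 2 -> G G
ALLELES = np.array([["A", "A"], ["A", "G"], ["G", "G"]])

def split_genotypes(df):
    """Return a frame with two allele columns per input column, kept side by side."""
    codes = df.to_numpy()
    # ALLELES[codes] has shape (rows, cols, 2); the reshape flattens each row to
    # col0_allele1, col0_allele2, col1_allele1, col1_allele2, ...
    wide = ALLELES[codes].reshape(len(df), -1)
    names = [f"{col}_{i}" for col in df.columns for i in (1, 2)]
    return pd.DataFrame(wide, index=df.index, columns=names)

df = pd.DataFrame([[0, 1, 2], [2, 2, 0]], columns=["a", "b", "c"])
alleles = split_genotypes(df)

# Written to an in-memory buffer here; pass a path such as "test.ped" instead
buf = io.StringIO()
alleles.to_csv(buf, sep="\t", header=False, index=False)
text = buf.getvalue()
print(text)
```

Because every allele is already its own cell, no quoting or escaping options are needed when writing.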
