Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert commas to dots in txt with python that also contains scientific number formatting

Tags:

python

text

I have a text file ( huge ) with all numbers separated with a combination of spaces and tabs between them and with comma for decimal and after decimal separation, while the first column is scientific formatted and the next ones are numbers but with commas. I just put the first row here as numbers :

0,0000000E00 -2,7599284 -1,3676726 -1,7231264 -1,0558825 -1,8871096 -3,0763804 -3,2206187 -3,2308111 -2,3147060 -3,9572818 -4,0232415 -4,2180738

the file is so huge that a notepad++ can't process it to convert the "," to "."

So what I do is :

with open(file) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        digits=re.findall(r'([\d.:]+)', line)
        s=line
        s = s.replace('.','').replace(',','.')
        number = float(s)
        cnt += 1 

I tried even to use digits, but that causes to divide the first column in two numbers :

output-digits

and eventually the error I get when using .replace command. what I would have prefered was to convert the commas to dots regardless of disturbing formats like scientific. I appreciate your help

ValueError: could not convert string to float: ' 00000000E00
\t-29513521 \t-17002219 \t-22375536 \t-14994097
\t-24163610 \t-34076621 \t-31233623 \t-32341597
\t-24724552 \t-42434935 \t-43454237 \t-44885144
\n'

I also put how the input looks like in txt and how I need it in output ( in csv format )

input seems like this :

first line :

between 1st and 2nd column : 3 spaces + 1 Tab

between rest of columns : 6 spaces + 1 Tab

second line and on :

between 1st and 2nd column : 2 spaces + 1 Tab

between rest of columns : 6 spaces + 1 Tab

this is a screen shot of the txt input file : Attention : there is one space in the beginning of each line

input-txt

and what I want as output is csv file with separated columns with " ; "

enter image description here

like image 946
FabioSpaghetti Avatar asked Dec 18 '19 08:12

FabioSpaghetti


2 Answers

You may try reading the entire file into a Python string, and then doing a global replacement of comma to dot:

data = ""
with open('nums.csv', 'r') as file:
    data = file.read().replace(',', '.').replace(' ', ';')

with open("nums_out.csv", "w") as out_file:
    out_file.write(data)

For a possibly more robust solution, should there exist the possibility that two columns could be separated by multiple whitespace characters, use re.sub:

data = ""
with open('nums.csv', 'r') as file:
    data = file.read().replace(',', '.')
    data = re.sub(r'(?<=\n|^)[^\S\r\n]+', '', data)
    data = re.sub('(?<=\S)[^\S\r\n]+', ';', data)
like image 64
Tim Biegeleisen Avatar answered Oct 21 '22 07:10

Tim Biegeleisen


If you're working with tabular data in python, you'll want to use the pandas package. It's a large package, so if this is a one-off, the overhead of installing it might not be worth it.

Pandas has a read_csv function that deals with this easily, and the result can be exported to csv:

import pandas as pd
dataframe = pd.read_csv("input.txt", sep="\s+", decimal=",")
dataframe.to_csv("output.csv", sep=";", header=False, index=False)

Note: if your original file has no header, also pass header=None to the read_csv function.

like image 43
Swier Avatar answered Oct 21 '22 06:10

Swier