Convert commas to dots in txt with python that also contains scientific number formatting

Question

I have a text file ( huge ) with all numbers separated with a combination of spaces and tabs between them and with comma for decimal and after decimal separation, while the first column is scientific formatted and the next ones are numbers but with commas. I just put the first row here as numbers :

0,0000000E00 -2,7599284 -1,3676726 -1,7231264 -1,0558825 -1,8871096 -3,0763804 -3,2206187 -3,2308111 -2,3147060 -3,9572818 -4,0232415 -4,2180738

the file is so huge that a notepad++ can't process it to convert the "," to "."

So what I do is :

with open(file) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        digits=re.findall(r'([\d.:]+)', line)
        s=line
        s = s.replace('.','').replace(',','.')
        number = float(s)
        cnt += 1

I tried even to use digits, but that causes to divide the first column in two numbers :

output-digits

and eventually the error I get when using .replace command. what I would have prefered was to convert the commas to dots regardless of disturbing formats like scientific. I appreciate your help

ValueError: could not convert string to float: ' 00000000E00
-29513521 -17002219 -22375536 -14994097
-24163610 -34076621 -31233623 -32341597
-24724552 -42434935 -43454237 -44885144
'

I also put how the input looks like in txt and how I need it in output ( in csv format )

input seems like this :

first line :

between 1st and 2nd column : 3 spaces + 1 Tab

between rest of columns : 6 spaces + 1 Tab

second line and on :

between 1st and 2nd column : 2 spaces + 1 Tab

between rest of columns : 6 spaces + 1 Tab

this is a screen shot of the txt input file : Attention : there is one space in the beginning of each line

input-txt

and what I want as output is csv file with separated columns with " ; "

enter image description here

Tim Biegeleisen · Accepted Answer

You may try reading the entire file into a Python string, and then doing a global replacement of comma to dot:

data = ""
with open('nums.csv', 'r') as file:
    data = file.read().replace(',', '.').replace(' ', ';')

with open("nums_out.csv", "w") as out_file:
    out_file.write(data)

For a possibly more robust solution, should there exist the possibility that two columns could be separated by multiple whitespace characters, use re.sub:

data = ""
with open('nums.csv', 'r') as file:
    data = file.read().replace(',', '.')
    data = re.sub(r'(?<=
|^)[^\S
]+', '', data)
    data = re.sub('(?<=\S)[^\S
]+', ';', data)

Swier · Answer

If you're working with tabular data in python, you'll want to use the pandas package. It's a large package, so if this is a one-off, the overhead of installing it might not be worth it.

Pandas has a read_csv function that deals with this easily, and the result can be exported to csv:

import pandas as pd
dataframe = pd.read_csv("input.txt", sep="\s+", decimal=",")
dataframe.to_csv("output.csv", sep=";", header=False, index=False)

Note: if your original file has no header, also pass header=None to the read_csv function.

Convert commas to dots in txt with python that also contains scientific number formatting

Tags:

python

text

FabioSpaghetti

2 Answers

Tim Biegeleisen

Swier

Recent Activity

Donate For Us

Convert commas to dots in txt with python that also contains scientific number formatting

Tags:

python

text

FabioSpaghetti

2 Answers

Tim Biegeleisen

Swier

Related questions

Recent Activity

Donate For Us