
Read text file and parse in python

I have a text file (.txt) that looks like this:


Date, Day, Sect, 1, 2, 3

1, Sun, 1-1, 123, 345, 678

2, Mon, 2-2, 234, 585, 282

3, Tue, 2-2, 231, 232, 686


With this data, I want to do the following:

1) Read the text file line by line, with each line as a separate element in a list

  • Split the elements by comma

  • Delete unnecessary characters ('\n') from the list

For these two steps, I wrote the following:

file = open('abc.txt', mode = 'r', encoding = 'utf-8-sig')
lines = file.readlines()
file.close()
my_dict = {}
my_list = []
for line in lines:
    line = line.split(',')
    line = [i.strip() for i in line]

2) Use the first row (Date, Day, Sect, 1, 2, 3) as the keys and the other rows as the values of a dictionary.

    my_dict['Date'] = line[0]
    my_dict['Day'] = line[1]
    my_dict['Sect'] = line[2]
    my_dict['1'] = line[3]
    my_dict['2'] = line[4]
    my_dict['3'] = line[5]

The above code has two issues: 1) It also turns the first (header) row into a dictionary. 2) If I append the dictionary to the list as shown below, every element in the list ends up being the last row.

3) Create a list with these dictionaries as its elements.

    my_list.append(my_dict)    

4) Select the subset of elements that I want.

I couldn't write any code from this point on. What I want is to select the elements that meet a condition, for example the dictionaries where Sect is 2-2. The desired result would then be:

>> [{'Date': '2', 'Day': 'Mon', 'Sect': '2-2', '1': '234', '2': '585', '3': '282'}, {'Date': '3', 'Day': 'Tue', 'Sect': '2-2', '1': '231', '2':'232', '3':'686'}]

Thanks,

supremed14 asked Jan 28 '23 18:01


2 Answers

@supremed14, you can also try the code below to build the list of dictionaries after reading the file.

data.txt

The text file contains extra whitespace. The strip() method defined on strings takes care of it.

Date, Day, Sect, 1, 2, 3

1, Sun, 1-1, 123, 345, 678

2, Mon, 2-2, 234, 585, 282

3, Tue, 2-2, 231, 232, 686

Source code:

Here you do not need to worry about closing the file: the with statement closes it automatically when the block ends.

import json
my_list = []

with open('data.txt') as f:
    lines = f.readlines() # list containing lines of file
    columns = [] # To store column names

    i = 1
    for line in lines:
        line = line.strip() # remove leading/trailing white spaces
        if line:
            if i == 1:
                columns = [item.strip() for item in line.split(',')]
                i = i + 1
            else:
                d = {} # dictionary to store file data (each line)
                data = [item.strip() for item in line.split(',')]
                for index, elem in enumerate(data):
                    d[columns[index]] = data[index]

                my_list.append(d) # append dictionary to list

# pretty printing list of dictionaries
print(json.dumps(my_list, indent=4))

Output:

[
    {
        "Date": "1",
        "Day": "Sun",
        "Sect": "1-1",
        "1": "123",
        "2": "345",
        "3": "678"
    },
    {
        "Date": "2",
        "Day": "Mon",
        "Sect": "2-2",
        "1": "234",
        "2": "585",
        "3": "282"
    },
    {
        "Date": "3",
        "Day": "Tue",
        "Sect": "2-2",
        "1": "231",
        "2": "232",
        "3": "686"
    }
]
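
The question also asks for only the rows where Sect is 2-2, which the code above does not do. A minimal sketch of that last step, assuming my_list has the shape printed above (it is written out literally here so the snippet runs on its own; the name wanted is just illustrative):

```python
# The list of dictionaries, as printed in the output above.
my_list = [
    {"Date": "1", "Day": "Sun", "Sect": "1-1", "1": "123", "2": "345", "3": "678"},
    {"Date": "2", "Day": "Mon", "Sect": "2-2", "1": "234", "2": "585", "3": "282"},
    {"Date": "3", "Day": "Tue", "Sect": "2-2", "1": "231", "2": "232", "3": "686"},
]

# Keep only the dictionaries whose 'Sect' value matches the condition.
wanted = [row for row in my_list if row["Sect"] == "2-2"]

print(wanted)
```

The comparison is against the string '2-2' because every value read from the file is a string. This version also avoids the "only the last row" problem from the question, since a new dictionary d is created for each line instead of reusing a single my_dict.
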
hygull answered Jan 31 '23 22:01


Using pandas, this is pretty easy:

Input:

$ cat test.txt
Date, Day, Sect, 1, 2, 3
1, Sun, 1-1, 123, 345, 678
2, Mon, 2-2, 234, 585, 282
3, Tue, 2-2, 231, 232, 686

Operations:

import pandas as pd
df = pd.read_csv('test.txt', skipinitialspace=True)
df.loc[df['Sect'] == '2-2'].to_dict(orient='records')

Output:

[{'1': 234, '2': 585, '3': 282, 'Date': 2, 'Day': 'Mon', 'Sect': '2-2'},
 {'1': 231, '2': 232, '3': 686, 'Date': 3, 'Day': 'Tue', 'Sect': '2-2'}]
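
Note that read_csv infers integer types for the numeric columns, so the values here are numbers rather than the strings shown in the question's desired result. If strings are what you need, one option is to pass dtype=str. This is a sketch under that assumption; io.StringIO stands in for test.txt only so the snippet is self-contained:

```python
import io

import pandas as pd

# Stand-in for test.txt so the example runs on its own.
sample = io.StringIO(
    "Date, Day, Sect, 1, 2, 3\n"
    "1, Sun, 1-1, 123, 345, 678\n"
    "2, Mon, 2-2, 234, 585, 282\n"
    "3, Tue, 2-2, 231, 232, 686\n"
)

# dtype=str keeps every column as text instead of inferring integers.
df = pd.read_csv(sample, skipinitialspace=True, dtype=str)

# Same filter as above: rows whose 'Sect' column equals '2-2'.
records = df.loc[df["Sect"] == "2-2"].to_dict(orient="records")
print(records)
```
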
cosmic_inquiry answered Jan 31 '23 22:01