I have a datafile like this:
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
14.2000 0.300000 0.01 0.999920
14.2000 0.301000 0.02 0.999960
14.2000 0.302000 0.03 0.999980
14.2000 0.303000 0.04 0.999980
14.2000 0.304000 0.06 0.999980
14.2000 0.305000 0.08 0.999970
14.2000 0.306000 0.2 0.999950
14.2000 0.307000 0.4 0.999910
14.2000 0.308000 0.8 0.999860
14.2000 0.309000 0.9 0.999960
14.2000 0.310000 0.8 0.999990
14.2000 0.311000 0.4 0.999980
14.2000 0.312000 0.2 0.999960
14.2000 0.313000 0.06 0.999940
14.2000 0.314000 0.03 0.999930
14.2000 0.315000 0.02 1.00000
14.2000 0.316000 0.01 1.00000
The required output file, output.csv, is this:
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
14.2000 0.304000 0.06 0.999980
14.2000 0.305000 0.08 0.999970
14.2000 0.306000 0.2 0.999950
14.2000 0.307000 0.4 0.999910
14.2000 0.308000 0.8 0.999860
14.2000 0.309000 0.9 0.999960
14.2000 0.310000 0.8 0.999990
14.2000 0.311000 0.4 0.999980
14.2000 0.312000 0.2 0.999960
14.2000 0.313000 0.06 0.999940
14.2000 0.314000 0.03 0.999930
# conditions are:
# start output at the first row where column 3 >= 0.05 (here 0.06)
# end output at the first row after that where column 3 < 0.05 (here 0.03), inclusive
# for the second condition, we may need the index of the second 0.06 and
# then take the row at the next index.
How can I do this in Python with pandas or NumPy?
My initial attempt is this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author : Bhishan Poudel
# Date : June 16, 2016
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#==============================================================================
# read in a file
infile = 'filter_2.txt'
colnames = ['angle', 'wave','trans', 'refl']
print('\nreading file : {}'.format(infile))
df = pd.read_csv(infile, sep=r'\s+', header=None, skiprows=0,
                 comment='#', names=colnames, usecols=(0, 1, 2, 3))
print(df)
# keep only the rows with transmission >= 0.05
print("\n")
df = df[df['trans'] >= 0.05]
print(df)
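The index idea from the conditions above (find the last row at or above 0.05, then take one more row) can be sketched in NumPy roughly as follows. This is only a sketch: it uses a few rows copied inline from the data above instead of reading filter_2.txt, and it assumes at least one row reaches the threshold and that the rows of interest form one contiguous block.

```python
import io
import numpy as np

# A few rows copied inline from the question so the sketch runs on its own.
sample = io.StringIO("""\
# column 3 is the transmission probability
14.2000 0.303000 0.04 0.999980
14.2000 0.304000 0.06 0.999980
14.2000 0.309000 0.9 0.999960
14.2000 0.313000 0.06 0.999940
14.2000 0.314000 0.03 0.999930
14.2000 0.315000 0.02 1.00000
""")

data = np.loadtxt(sample, comments='#')
trans = data[:, 2]

# Positions of all rows with transmission >= 0.05
# (assumption: at least one such row exists).
above = np.flatnonzero(trans >= 0.05)
first, last = above[0], above[-1]

# Slice from the first such row through one row past the last one.
# Slicing past the end of the array is safe in NumPy.
selected = data[first:last + 2]
wavelengths = selected[:, 1].tolist()
print(selected)
```

Note that this keeps the numbers but not the original text formatting or the comment header.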
Some similar questions:
How to read between 2 specific lines in python
I'd skip pandas or numpy altogether:
with open('filter_2.txt', 'r') as fi, open('filter_3.txt', 'w') as fo:
    for line in fi:
        split = line.split()
        if not split:
            continue  # skip blank lines
        # keep comment lines and rows whose third column is >= 0.027
        if (split[0] == '#') or (float(split[2]) >= 0.027):
            print(line, end='')
            fo.write(line)
This prints:
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
14.2000 0.302000 0.028 0.999980
14.2000 0.303000 0.030 0.999980
14.2000 0.304000 0.032 0.999980
14.2000 0.305000 0.030 0.999970
14.2000 0.306000 0.028 0.999950
To also keep the first row after the values drop back below the threshold (0.05 here), remember whether the previous row passed:
prev_passed = False  # True if the previous data row had column 3 >= 0.05
with open('filter_2.txt', 'r') as fi, open('filter_3.txt', 'w') as fo:
    for line in fi:
        split = line.split()
        if not split:
            continue  # skip blank lines
        is_comment = split[0] == '#'
        passed = (not is_comment) and float(split[2]) >= 0.05
        # keep comments, rows at or above the threshold,
        # and the row right after such a row
        if is_comment or passed or prev_passed:
            print(line, end='')
            fo.write(line)
        prev_passed = passed
This prints:
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
14.2000 0.304000 0.06 0.999980
14.2000 0.305000 0.08 0.999970
14.2000 0.306000 0.2 0.999950
14.2000 0.307000 0.4 0.999910
14.2000 0.308000 0.8 0.999860
14.2000 0.309000 0.9 0.999960
14.2000 0.310000 0.8 0.999990
14.2000 0.311000 0.4 0.999980
14.2000 0.312000 0.2 0.999960
14.2000 0.313000 0.06 0.999940
14.2000 0.314000 0.03 0.999930
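For anyone who does want pandas, as the question asked, the same rule as the loop above (keep rows at or above the threshold, plus the row right after one) can be sketched with a boolean mask and `Series.shift`. The inline sample and the names `above`, `keep` and `result` are just for illustration.

```python
import io
import pandas as pd

# A few rows copied inline from the question so the sketch runs on its own.
sample = io.StringIO("""\
# column 3 is the transmission probability
14.2000 0.303000 0.04 0.999980
14.2000 0.304000 0.06 0.999980
14.2000 0.309000 0.9 0.999960
14.2000 0.313000 0.06 0.999940
14.2000 0.314000 0.03 0.999930
14.2000 0.315000 0.02 1.00000
""")

colnames = ['angle', 'wave', 'trans', 'refl']
df = pd.read_csv(sample, sep=r'\s+', header=None, comment='#', names=colnames)

# True where the row itself is at or above the threshold.
above = df['trans'] >= 0.05

# Keep a row if it passes, or if the row just before it passed.
keep = above | above.shift(1, fill_value=False)
result = df[keep]
print(result)
```

Like the loop, this keeps the row after every passing row, so data that crosses the threshold more than once would give more than one "extra" row. Writing `result` back out with `to_csv` would not reproduce the comment header or the original number formatting.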
If your goal is to maintain the appearance of the written file (i.e., the line spacing is the same), then you'll likely need to keep the original file's contents.
from io import StringIO
import numpy as np
import pandas as pd

# infile and colnames are the ones defined in the question
with open(infile) as f:
    contents = f.read()
df = pd.read_csv(StringIO(contents), sep=r'\s+', header=None, skiprows=0,
                 comment='#', names=colnames, usecols=(0, 1, 2, 3))
allowed_indices = df.query('trans >= 0.027').index.values
content_lines = np.array(contents.split('\n'))
# this offset assumes all comment lines are at the top of the file
num_comments = len([l for l in content_lines if l.startswith('#')])
comment_and_allowed_indices = np.append(np.arange(num_comments),
                                        allowed_indices + num_comments)
Then you'll just need to write the original contents to a file. They can be indexed via:
content_lines[comment_and_allowed_indices]
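Putting it together, writing the selected original lines back out might look like the sketch below. It runs on a few rows copied inline from the question, and the output path (a file named filter_3.txt in the system temp directory) is only a placeholder. As above, it assumes all comment lines are at the top of the file.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

# A few rows copied inline from the question so the sketch runs on its own.
contents = """\
# column 3 is the transmission probability
14.2000 0.300000 0.01 0.999920
14.2000 0.302000 0.03 0.999980
14.2000 0.309000 0.9 0.999960
14.2000 0.316000 0.01 1.00000
"""

colnames = ['angle', 'wave', 'trans', 'refl']
df = pd.read_csv(io.StringIO(contents), sep=r'\s+', header=None,
                 comment='#', names=colnames)

# Row positions (within the data rows) that pass the threshold.
allowed_indices = df.query('trans >= 0.027').index.values

content_lines = np.array(contents.split('\n'))
# Assumption: every comment line sits at the top of the file.
num_comments = sum(line.startswith('#') for line in content_lines)
keep = np.append(np.arange(num_comments), allowed_indices + num_comments)

# Placeholder output path for this sketch.
outfile = os.path.join(tempfile.gettempdir(), 'filter_3.txt')
with open(outfile, 'w') as fo:
    fo.write('\n'.join(content_lines[keep]) + '\n')
```

Because the lines are copied verbatim from the original text, spacing and trailing zeros are kept exactly as they were.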