I have an ever-growing CSV file that looks like:
143100, 2012-05-21 09:52:54.165852
125820, 2012-05-21 09:53:54.666780
109260, 2012-05-21 09:54:55.144712
116340, 2012-05-21 09:55:55.642197
125640, 2012-05-21 09:56:56.094999
122820, 2012-05-21 09:57:56.546567
124770, 2012-05-21 09:58:57.046050
103830, 2012-05-21 09:59:57.497299
114120, 2012-05-21 10:00:58.000978
-31549410, 2012-05-21 10:01:58.063470
90390, 2012-05-21 10:02:58.108794
81690, 2012-05-21 10:03:58.161329
80940, 2012-05-21 10:04:58.227664
102180, 2012-05-21 10:05:58.289882
99750, 2012-05-21 10:06:58.322063
87000, 2012-05-21 10:07:58.391256
92160, 2012-05-21 10:08:58.442438
80130, 2012-05-21 10:09:58.506494
The negative numbers occur when the service that generates the file has an API connection failure. I'm already using matplotlib to graph the data, but the artificial negative numbers badly distort the graph. I would like to locate all negative entries and remove the corresponding lines. A negative number never represents real data.
In Bash I would do something like:
awk '{print $1}' original.csv | sed '/-/d' > new.csv
but that's messy and tends to be slow, and I don't really want to embed Bash commands in my Python graphing script if I can help it.
Can anyone point me in the right direction?
Edit:
Here's the code I'm using to read/plot the data:
import matplotlib
matplotlib.use('Agg')
from matplotlib.mlab import csv2rec
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from pylab import *
output_image_name='tpm.png'
data = csv2rec('counter.log', names=['packets', 'time'])
rcParams['figure.figsize'] = 10, 5
rcParams['font.size'] = 8
fig = plt.figure()
ax = fig.add_subplot(111)
# Plot time on the x-axis; the column is named 'packets' in csv2rec above
ax.plot(data['time'], data['packets'])
hours = mdates.HourLocator()
fmt = mdates.DateFormatter('%D - %H:%M')
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(fmt)
ax.grid()
plt.ylabel("packets")
plt.title("Packet Log: Packets Per Minute")
fig.autofmt_xdate(bottom=0.2, rotation=90, ha='left')
plt.savefig(output_image_name)
The Python idiom would be to use a generator expression to filter the lines:
import sys
sys.stdout.writelines(line for line in sys.stdin if not line.startswith('-'))
Or in a processing context:
filtered = (line for line in sys.stdin if not line.startswith('-'))
for line in filtered:
    # ...
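To feed the filtered rows straight into the plotting script, the same idea can be applied while parsing each line. The following is only a sketch: the helper name read_valid_rows is hypothetical, and it assumes every non-blank line has the "count, timestamp" layout shown in the question. It compares the parsed integer rather than checking the first character, so a line with leading whitespace is still handled.

```python
from datetime import datetime


def read_valid_rows(lines):
    """Return (times, packets) lists, skipping rows with a negative count.

    read_valid_rows is a hypothetical helper name, not part of any library.
    """
    times, packets = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        count_text, time_text = line.split(",", 1)
        count = int(count_text)
        if count < 0:
            continue  # artificial value written on an API connection failure
        packets.append(count)
        times.append(datetime.strptime(time_text.strip(), "%Y-%m-%d %H:%M:%S.%f"))
    return times, packets


# Three rows copied from the sample data in the question.
sample = [
    "114120, 2012-05-21 10:00:58.000978\n",
    "-31549410, 2012-05-21 10:01:58.063470\n",
    "90390, 2012-05-21 10:02:58.108794\n",
]
times, packets = read_valid_rows(sample)
print(packets)  # [114120, 90390]
```

An open file object can be passed in place of the sample list, and the two returned lists can then go to ax.plot(times, packets) instead of the csv2rec columns.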