I am using pandas to analyse existing ssh sessions to different nodes. For that I have parsed the ssh daemon log into a DataFrame with the columns Node, Session, Start and Finish.
Here's a part of the data:
In [375]: sessions[1:10]
Out[375]:
    Node  Session                Start               Finish
1  svg01    27321  2015-02-23 07:24:45  2015-02-23 07:50:57
2  svg02    14171  2015-02-23 10:25:08  2015-02-23 14:33:24
3  svg02    14273  2015-02-23 10:26:21  2015-02-23 14:36:19
4  svg01    14401  2015-02-23 10:28:16  2015-02-23 14:38:04
5  svg01    26408  2015-02-23 14:01:49  2015-02-23 18:38:25
6  svg03    13722  2015-02-23 18:24:39  2015-02-23 20:51:59
7  svg05    17637  2015-02-23 19:10:00  2015-02-23 19:10:20
I want to add a column holding the number of sessions already established on a given node at the moment a new connection is established.
Without taking into account the Node I can compute this using:
count_sessions = lambda t: sessions[(sessions.Start<t) & (sessions.Finish>t)].shape[0]
sessions['OpenSessions'] = sessions['Start'].map(count_sessions)
The problem is that I also need to take the 'Node' column value into account, but I do not know how to get it inside the mapped function.
I could use the index of the element in the Series to look up the node in the sessions DataFrame, but I did not find any way to retrieve the index of the element passed to map.
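def count(df):
    # Within one node's rows, count the sessions already open at time t.
    count_sessions = lambda t: df[(df.Start < t) & (df.Finish > t)].shape[0]
    df['OpenSessions'] = df['Start'].map(count_sessions)
    return df

# Apply the count separately to each node's group of rows.
print(sessions.groupby('Node').apply(count))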
The output is:
    Node  Session                Start               Finish  OpenSessions
0  svg01    27321  2015-02-23 07:24:45  2015-02-23 07:50:57             0
1  svg02    14171  2015-02-23 10:25:08  2015-02-23 14:33:24             0
2  svg02    14273  2015-02-23 10:26:21  2015-02-23 14:36:19             1
3  svg01    14401  2015-02-23 10:28:16  2015-02-23 14:38:04             0
4  svg01    26408  2015-02-23 14:01:49  2015-02-23 18:38:25             1
5  svg03    13722  2015-02-23 18:24:39  2015-02-23 20:51:59             0
6  svg05    17637  2015-02-23 19:10:00  2015-02-23 19:10:20             0
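For reference, the same per-node count can be reproduced without modifying the groups inside apply. This is only a sketch, assuming the sample rows shown above and pandas imported as pd; count_open is a hypothetical helper name, and the explicit loop over groupby('Node') is a variation on the code above, not a replacement for it.

```python
import pandas as pd

# Sample rows copied from the output above.
sessions = pd.DataFrame({
    "Node": ["svg01", "svg02", "svg02", "svg01", "svg01", "svg03", "svg05"],
    "Session": [27321, 14171, 14273, 14401, 26408, 13722, 17637],
    "Start": pd.to_datetime([
        "2015-02-23 07:24:45", "2015-02-23 10:25:08", "2015-02-23 10:26:21",
        "2015-02-23 10:28:16", "2015-02-23 14:01:49", "2015-02-23 18:24:39",
        "2015-02-23 19:10:00",
    ]),
    "Finish": pd.to_datetime([
        "2015-02-23 07:50:57", "2015-02-23 14:33:24", "2015-02-23 14:36:19",
        "2015-02-23 14:38:04", "2015-02-23 18:38:25", "2015-02-23 20:51:59",
        "2015-02-23 19:10:20",
    ]),
})


def count_open(group):
    """Hypothetical helper: for each Start in one node's rows, count the
    sessions of that node that began earlier and had not finished yet."""
    return group["Start"].map(
        lambda t: int(((group["Start"] < t) & (group["Finish"] > t)).sum())
    )


# One Series per node; concat yields a single Series keyed by the original
# index, so the assignment lines up with the right rows.
per_node = [count_open(group) for _, group in sessions.groupby("Node")]
sessions["OpenSessions"] = pd.concat(per_node)

print(sessions)
```

Because each per-node result keeps the original row labels, the new column aligns with the DataFrame regardless of the order in which the groups are processed.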
Read this for inspiration.
Just a suggestion about another way to proceed: I am not sure about the criteria but you should be able to adapt this easily:
# A session counts as open if it started before, and finished after, this row's Start.
sessions['OpenSessions'] = sessions.apply(
    lambda row: len(sessions[(sessions['Start'] < row['Start']) &
                             (sessions['Finish'] > row['Start']) &
                             (sessions['Node'] == row['Node'])]),
    axis=1)
For each row (argument axis=1), it simply counts the rows of your DataFrame that match whatever criteria you build from that row's values.
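To illustrate how the criteria can be adapted, here is a sketch that runs two variants of the row-wise count on the sample rows from the question. It assumes pandas imported as pd and keeps only the Node, Start and Finish columns; count_matching and its finish_after parameter are hypothetical names introduced for this example.

```python
import pandas as pd

# Node, Start and Finish of the sample rows from the question.
sessions = pd.DataFrame({
    "Node": ["svg01", "svg02", "svg02", "svg01", "svg01", "svg03", "svg05"],
    "Start": pd.to_datetime([
        "2015-02-23 07:24:45", "2015-02-23 10:25:08", "2015-02-23 10:26:21",
        "2015-02-23 10:28:16", "2015-02-23 14:01:49", "2015-02-23 18:24:39",
        "2015-02-23 19:10:00",
    ]),
    "Finish": pd.to_datetime([
        "2015-02-23 07:50:57", "2015-02-23 14:33:24", "2015-02-23 14:36:19",
        "2015-02-23 14:38:04", "2015-02-23 18:38:25", "2015-02-23 20:51:59",
        "2015-02-23 19:10:20",
    ]),
})


def count_matching(df, finish_after):
    """Hypothetical helper: for every row, count the same-node sessions that
    started before the row's Start and finished after row[finish_after]."""
    return df.apply(
        lambda row: len(df[(df["Start"] < row["Start"])
                           & (df["Finish"] > row[finish_after])
                           & (df["Node"] == row["Node"])]),
        axis=1,
    )


# Variant 1: sessions still open when the new connection starts.
open_at_start = count_matching(sessions, "Start")
# Variant 2: sessions that enclose the new connection entirely.
enclosing = count_matching(sessions, "Finish")

print(open_at_start.tolist())
print(enclosing.tolist())
```

On these rows the first variant gives 0, 0, 1, 0, 1, 0, 0, matching the OpenSessions column shown earlier, while the second gives 0 for every row because no session on a node fully encloses another one there. Which comparison is right depends on what "open" should mean for your analysis.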