Strange Python memory usage with Scapy

I wrote a script that logs MAC addresses from pcapy into MySQL through SQLAlchemy. I initially used plain sqlite3 but soon realized that something better was required, so this past weekend I rewrote all the database code to use SQLAlchemy. All works fine: data goes in and comes out again. I thought sessionmaker() would be very useful to manage all the sessions to the DB for me.

I see a strange occurrence with regard to memory consumption. I start the script, it collects and writes everything to the DB, but every 2-4 seconds memory consumption grows by about a megabyte. At the moment I'm talking about very few records, under 100 rows.

Script Sequence:

  1. Script starts.
  2. SQLAlchemy reads the mac_addr column into maclist[].
  3. Scapy gets a packet: is new_mac in maclist[]?

If true: only write a timestamp to the timestamp column where mac = new_mac, then go back to step 2.

If false: write the new MAC to the DB, clear maclist[], and call step 2 again.
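The sequence above can be sketched as a plain-Python loop. This is only an illustration of the described logic, not the actual script: `process_packet` is a hypothetical name, and the dict `db` stands in for the SQLAlchemy-backed table.

```python
import time

def process_packet(new_mac, maclist, db):
    """Steps 2-3 of the sequence: check new_mac against maclist.

    db is a stand-in for the MySQL table: a dict mapping
    mac address -> list of timestamps.
    """
    stamp = time.time()
    if new_mac in maclist:
        # Known client: only append a timestamp for this MAC.
        db[new_mac].append(stamp)
    else:
        # New client: write it to the "DB", then clear and
        # repopulate maclist from the DB (back to step 2).
        db[new_mac] = [stamp]
        maclist.clear()
        maclist.extend(db.keys())
    return maclist
```

Note that the repopulation branch rebuilds the whole list on every previously unseen MAC, which is the behaviour the question asks about.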

After 1h30m I have a memory footprint of 1027MB (RES) and 1198MB (VIRT), with 124 rows in the single-table MySQL database.

Q: Could this be attributed to maclist[] being cleared and repopulated from the DB every time?

Q: What's going to happen when it reaches the system's max memory?

Any ideas or advice would be great thanks.

memory_profiler output for the segment in question, where the list gets populated from the database's mac_addr column:

Line #    Mem usage    Increment   Line Contents
================================================
   123 1025.434 MiB    0.000 MiB   @profile
   124                             def sniffmgmt(p):
   125                              global __mac_reel
   126                              global _blacklist
   127 1025.434 MiB    0.000 MiB    stamgmtstypes = (0, 2, 4)
   128 1025.434 MiB    0.000 MiB    tmplist = []
   129 1025.434 MiB    0.000 MiB    matching = []
   130 1025.434 MiB    0.000 MiB    observedclients = []
   131 1025.434 MiB    0.000 MiB    tmplist = populate_observed_list()
   132 1025.477 MiB    0.043 MiB    for i in tmplist:
   133 1025.477 MiB    0.000 MiB          observedclients.append(i[0])
   134 1025.477 MiB    0.000 MiB    _mac_address = str(p.addr2)
   135 1025.477 MiB    0.000 MiB    if p.haslayer(Dot11):
   136 1025.477 MiB    0.000 MiB        if p.type == 0 and p.subtype in stamgmtstypes:
   137 1024.309 MiB   -1.168 MiB            _timestamp = atimer()
   138 1024.309 MiB    0.000 MiB            if p.info == "":
   139 1021.520 MiB   -2.789 MiB                        _SSID = "hidden"
   140                                          else:
   141 1024.309 MiB    2.789 MiB                        _SSID = p.info
   142                                      
   143 1024.309 MiB    0.000 MiB            if p.addr2 not in observedclients:
   144 1018.184 MiB   -6.125 MiB                    db_add(_mac_address, _timestamp, _SSID)
   145 1018.184 MiB    0.000 MiB                    greetings()
   146                                      else:
   147 1024.309 MiB    6.125 MiB                add_time(_mac_address, _timestamp)
   148 1024.309 MiB    0.000 MiB                observedclients = [] #clear the list
   149 1024.309 MiB    0.000 MiB                observedclients = populate_observed_list() #repopulate the list
   150 1024.309 MiB    0.000 MiB                greetings()

You will see observedclients is the list in question.
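An aside on the first question above: instead of clearing and re-querying observedclients on every packet, the known MACs could be kept in an in-memory set that is updated incrementally. A minimal sketch under that assumption (pure Python; `handle_mac` is a hypothetical helper and the `db` list stands in for the SQLAlchemy insert):

```python
observedclients = set()   # loaded once at startup, not rebuilt per packet

def handle_mac(mac, db):
    """Add mac to the set only when it is new; no re-query needed."""
    if mac not in observedclients:
        db.append(mac)            # stand-in for db_add(...)
        observedclients.add(mac)
        return "new"
    return "seen"
```

Membership tests on a set are O(1), and avoiding the per-packet round trip to the database keeps the hot path cheap.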

Asked Jun 29 '15 by Dusty Boshoff

2 Answers

I managed to find the actual cause of the memory consumption: it was Scapy itself. By default Scapy stores every packet it captures, but you can disable that.

Disable:

sniff(iface=interface, prn=sniffmgmt, store=0)

Enable:

sniff(iface=interface, prn=sniffmgmt, store=1)

Thanks to BitBucket Ticket

Answered Oct 04 '22 by Dusty Boshoff

As you can see, the profiler output suggests you use less memory by the end, so it is not representative of your situation.

Some directions to dig deeper:

  1. add_time: why is it increasing memory usage?
  2. db_add: why is it decreasing memory usage? Caching? Closing/opening the DB connection? What happens in case of failure?
  3. populate_observed_list: is the return value safe for garbage collection? Maybe there are some packets for which a certain exception occurs?

Also, what happens if you sniff more packets than your code is able to process due to performance?
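One way to check whether processing keeps up is to decouple capture from processing with a bounded queue: if the queue fills, packets are arriving faster than they are handled. A stdlib-only sketch (names like `enqueue_packet` and `drain` are illustrative; in the question's script the Scapy callback would just put() into the queue):

```python
import queue

packet_queue = queue.Queue(maxsize=1000)

def enqueue_packet(pkt):
    """Would be passed as sniff(..., prn=enqueue_packet, store=0)."""
    try:
        packet_queue.put_nowait(pkt)
    except queue.Full:
        # Processing is falling behind; drop (or count) the packet
        # instead of letting memory grow without bound.
        pass

def drain():
    """Consumer side: process everything currently queued."""
    processed = []
    while not packet_queue.empty():
        processed.append(packet_queue.get_nowait())
    return processed
```

With this split, a steadily growing queue (or frequent drops) is a direct signal that the per-packet work, such as re-querying the database, is too slow for the capture rate.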

I would profile these 3 functions and analyze possible exceptions/failures.

Answered Oct 04 '22 by Eriks Dobelis