
How to reconstruct TCP stream from multiple IP packets?

I am working on a TUN-based VPN server whose goal is to analyze packets it receives before forwarding them to their destination. Currently I am receiving the IP packets from a TUN interface, and simply sending them off to their destination unmodified.

I understand that analyzing the content of UDP packets would be as simple as stripping the IP and UDP headers. However, to analyze the contents of TCP traffic, I would need to reconstruct the message from multiple IP packets. Is there an easy way to do this without re-implementing TCP? Are there any easily accessible C/C++ libraries meant for this task? I would prefer Linux system libraries and/or open-source, non-viral/non-copyleft libraries.

One thing I have already considered is making a copy of each IP packet, and changing the destination IP of the copy to localhost, so that a different part of my server may receive these TCP requests and responses fully reconstructed and without headers. However, I would not be able to associate destination IPs with traffic content, which is something that I desire.

Jomasi asked Sep 09 '13 23:09


2 Answers

It is likely that the functionality you need will always be tightly coupled with packet dissection. Good protocol dissectors are needed to extract the required information, so my suggestion is to use the best open-source tool available: Wireshark (wireshark.org).

It provides a "Follow TCP Stream" feature, which reassembles and displays the payload of a single TCP conversation.

It doesn't look like you can easily extract part of Wireshark's dissection logic, but at least there is a good example in packet-tcp:

typedef struct _tcp_flow_t {
    guint32 base_seq;   /* base seq number (used by relative sequence numbers)
                 * or 0 if not yet known.
                 */
    tcp_unacked_t *segments;
    guint32 fin;        /* frame number of the final FIN */
    guint32 lastack;    /* last seen ack */
    nstime_t lastacktime;   /* Time of the last ack packet */
    guint32 lastnondupack;  /* frame number of last seen non dupack */
    guint32 dupacknum;  /* dupack number */
    guint32 nextseq;    /* highest seen nextseq */
    guint32 maxseqtobeacked;/* highest seen continuous seq number (without hole in the stream) from the fwd party,
                 * this is the maximum seq number that can be acked by the rev party in normal case.
                 * If the rev party sends an ACK beyond this seq number it indicates TCP_A_ACK_LOST_PACKET condition */
    guint32 nextseqframe;   /* frame number for segment with highest
                 * sequence number
                 */
    /* ... remaining fields omitted in this excerpt ... */
} tcp_flow_t;
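
To make the role of fields such as base_seq and nextseq concrete, here is a minimal sketch of in-order reassembly for one direction of a connection. It is not Wireshark code: the names flow_reasm, reasm_add and the size limits are hypothetical, and it assumes the IP and TCP headers have already been stripped so that only the sequence number and payload are passed in. It ignores FIN/RST handling, window checks and memory management, all of which a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 16    /* hypothetical cap on parked out-of-order segments */
#define MAX_SEG     1500  /* hypothetical cap on one segment's payload */
#define STREAM_CAP  4096  /* hypothetical cap on reassembled bytes */

struct pending_seg {
    int      used;
    uint32_t seq;
    size_t   len;
    uint8_t  data[MAX_SEG];
};

/* State for ONE direction of one TCP connection. */
struct flow_reasm {
    int      have_base;           /* has next_seq been initialised? */
    uint32_t next_seq;            /* next in-order sequence number expected */
    size_t   stream_len;
    uint8_t  stream[STREAM_CAP];  /* contiguous bytes reassembled so far */
    struct pending_seg pending[MAX_PENDING];
};

/* Wrap-safe comparison: true if a comes before b modulo 2^32. */
static int seq_lt(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }

static void reasm_init(struct flow_reasm *f)
{
    assert(f != NULL);
    memset(f, 0, sizeof *f);
}

/* Call when the SYN is seen: the first data byte carries isn + 1. */
static void reasm_syn(struct flow_reasm *f, uint32_t isn)
{
    f->next_seq = isn + 1;
    f->have_base = 1;
}

/* Returns 1 if the segment was consumed (appended or already seen),
 * 0 if it starts beyond a hole and cannot be delivered yet. */
static int try_deliver(struct flow_reasm *f, uint32_t seq,
                       const uint8_t *data, size_t len)
{
    if (seq_lt(f->next_seq, seq))
        return 0;
    uint32_t skip = f->next_seq - seq;   /* leading bytes we already have */
    if (skip < len) {
        size_t fresh = len - skip;
        size_t room = STREAM_CAP - f->stream_len;
        size_t n = fresh < room ? fresh : room;  /* sketch: truncate when full */
        memcpy(f->stream + f->stream_len, data + skip, n);
        f->stream_len += n;
        f->next_seq += (uint32_t)fresh;
    }
    return 1;
}

/* Feed one TCP payload. Returns 0 on success, -1 if it had to be dropped. */
static int reasm_add(struct flow_reasm *f, uint32_t seq,
                     const uint8_t *data, size_t len)
{
    if (!f->have_base) {      /* joined mid-stream: start at first segment seen */
        f->next_seq = seq;
        f->have_base = 1;
    }
    if (!try_deliver(f, seq, data, len)) {
        if (len > MAX_SEG)
            return -1;
        for (int i = 0; i < MAX_PENDING; i++) {
            struct pending_seg *p = &f->pending[i];
            if (!p->used) {   /* park the segment until the hole is filled */
                p->used = 1;
                p->seq = seq;
                p->len = len;
                memcpy(p->data, data, len);
                return 0;
            }
        }
        return -1;            /* no free slot */
    }
    /* The stream advanced, so rescan parked segments until none fit. */
    for (int progressed = 1; progressed; ) {
        progressed = 0;
        for (int i = 0; i < MAX_PENDING; i++) {
            struct pending_seg *p = &f->pending[i];
            if (p->used && try_deliver(f, p->seq, p->data, p->len)) {
                p->used = 0;
                progressed = 1;
            }
        }
    }
    return 0;
}
```

One flow_reasm would be kept per direction of each connection, and f.stream read whenever reasm_add makes stream_len grow. Retransmitted or overlapping segments are trimmed against next_seq rather than appended twice.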

Conversation lookup is handled by separate logic; note the use of find_conversation:

/* Attach process info to a flow */
/* XXX - We depend on the TCP dissector finding the conversation first */
void
add_tcp_process_info(guint32 frame_num, address *local_addr, address *remote_addr, guint16 local_port, guint16 remote_port, guint32 uid, guint32 pid, gchar *username, gchar *command) {
    conversation_t *conv;
    struct tcp_analysis *tcpd;
    tcp_flow_t *flow = NULL;

    conv = find_conversation(frame_num, local_addr, remote_addr, PT_TCP, local_port, remote_port, 0);
    if (!conv) {
        return;
    }
    /* ... remainder of the function omitted in this excerpt ... */
}

The actual logic is well documented and available here:

/*
 * Given two address/port pairs for a packet, search for a conversation
 * containing packets between those address/port pairs.  Returns NULL if
 * not found.
 *
 * We try to find the most exact match that we can, and then proceed to
 * try wildcard matches on the "addr_b" and/or "port_b" argument if a more
 * exact match failed.
 * ...
 */
conversation_t *
find_conversation(const guint32 frame_num, const address *addr_a, const address *addr_b, const port_type ptype,
    const guint32 port_a, const guint32 port_b, const guint options)
{
   conversation_t *conversation;

   /*
    * First try an exact match, if we have two addresses and ports.
    */
   if (!(options & (NO_ADDR_B|NO_PORT_B))) {
      /* ... remainder of the function omitted in this excerpt ... */
   }
}
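
For a standalone server that only needs to tie reassembled content back to its endpoints, the exact-match case can be approximated with a much simpler table. The sketch below is an assumption-laden illustration, not Wireshark's implementation: conv_table, conv_lookup and the fixed table size are hypothetical names, addresses are IPv4 in host byte order, there is no wildcard matching, and a linear scan stands in for the hash table a real server would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CONVS 64   /* hypothetical table size */

struct endpoint {
    uint32_t addr;     /* IPv4 address, host byte order */
    uint16_t port;     /* TCP port, host byte order */
};

struct conv {
    int             used;
    uint32_t        id;
    struct endpoint a, b;   /* stored in canonical order: a <= b */
};

struct conv_table {
    uint32_t    next_id;
    struct conv slots[MAX_CONVS];
};

static int ep_cmp(struct endpoint x, struct endpoint y)
{
    if (x.addr != y.addr) return x.addr < y.addr ? -1 : 1;
    if (x.port != y.port) return x.port < y.port ? -1 : 1;
    return 0;
}

/* Find the conversation for a packet going src -> dst, creating it if it
 * does not exist yet. *reverse is set to 1 when the packet travels b -> a,
 * so both directions map to the same entry. Returns NULL if the table is full. */
static struct conv *conv_lookup(struct conv_table *t, struct endpoint src,
                                struct endpoint dst, int *reverse)
{
    assert(t != NULL && reverse != NULL);
    int rev = ep_cmp(src, dst) > 0;
    struct endpoint a = rev ? dst : src;
    struct endpoint b = rev ? src : dst;
    struct conv *free_slot = NULL;

    for (size_t i = 0; i < MAX_CONVS; i++) {
        struct conv *c = &t->slots[i];
        if (!c->used) {
            if (free_slot == NULL)
                free_slot = c;
            continue;
        }
        if (ep_cmp(c->a, a) == 0 && ep_cmp(c->b, b) == 0) {
            *reverse = rev;
            return c;
        }
    }
    if (free_slot == NULL)
        return NULL;
    free_slot->used = 1;
    free_slot->id = t->next_id++;
    free_slot->a = a;
    free_slot->b = b;
    *reverse = rev;
    return free_slot;
}
```

Each entry would typically own two reassembly buffers, one per direction, selected by the reverse flag. Because the entry keeps both endpoints, the destination address stays associated with the reassembled content.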

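So what I'm actually suggesting is to use the EPAN library, Wireshark's packet dissection engine. It is possible to extract this library and use it independently. Please be careful with the license: Wireshark is distributed under the GPL, which is a copyleft license, and the question asks for non-copyleft libraries.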

Renat Gilmanov answered Sep 27 '22 19:09


You might be interested in libipq, the iptables userspace packet queuing library.

#include <linux/netfilter.h>
#include <libipq.h>

Netfilter provides a mechanism for passing packets out of the stack for queueing to userspace, then receiving these packets back into the kernel with a verdict specifying what to do with the packets (such as ACCEPT or DROP). These packets may also be modified in userspace prior to reinjection back into the kernel. For each supported protocol, a kernel module called a queue handler may register with Netfilter to perform the mechanics of passing packets to and from userspace.

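The standard queue handler for IPv4 is ip_queue. It is provided as an experimental module with 2.4 kernels, and uses a Netlink socket for kernel/userspace communication. Note that ip_queue and libipq have since been deprecated in favor of the NFQUEUE target and libnetfilter_queue, so on current kernels the same approach would go through that library instead.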

Once ip_queue is loaded, IP packets may be selected with iptables and queued for userspace processing via the QUEUE target.

Here is a brief example of how to decompose a TCP/IP packet:

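/* buf is assumed to have been filled by ipq_read() on handle h. */
ipq_packet_msg_t *m = ipq_get_packet(buf);

/* The queued payload starts at the IP header. */
struct iphdr *ip = (struct iphdr*) m->payload;

/* ihl counts 32-bit words, so the TCP header begins 4 * ihl bytes in. */
struct tcphdr *tcp = (struct tcphdr*) (m->payload + (4 * ip->ihl));

/* Ports are in network byte order on the wire: convert with ntohs(). */
int port = ntohs(tcp->dest);

/* Return the packet to the kernel unmodified. */
int status = ipq_set_verdict(h, m->packet_id,
                             NF_ACCEPT, 0, NULL);
if (status < 0)
        die(h);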
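
The same header arithmetic applies to raw packets read from a TUN device, but untrusted input needs length checks before any field is read. The following is a minimal sketch under stated assumptions: tcp_view and parse_ipv4_tcp are hypothetical names, only unfragmented IPv4 is handled, and no checksums are verified. It reads bytes directly rather than casting to struct iphdr, which sidesteps alignment concerns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields extracted from one IPv4/TCP packet (all in host byte order). */
struct tcp_view {
    uint32_t       src_addr, dst_addr;
    uint16_t       src_port, dst_port;
    uint32_t       seq;
    uint8_t        flags;        /* TCP flag byte: FIN=0x01, SYN=0x02, ... */
    const uint8_t *payload;      /* points into the caller's buffer */
    size_t         payload_len;
};

/* Big-endian (network order) readers. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 and fills *out on success, or -1 if the buffer is not a
 * well-formed, unfragmented IPv4 packet carrying TCP. */
static int parse_ipv4_tcp(const uint8_t *pkt, size_t len, struct tcp_view *out)
{
    assert(pkt != NULL && out != NULL);

    if (len < 20 || (pkt[0] >> 4) != 4)
        return -1;                          /* too short or not IPv4 */
    size_t ihl   = (size_t)(pkt[0] & 0x0F) * 4;  /* header length in bytes */
    size_t total = rd16(pkt + 2);                /* IP total length */
    if (ihl < 20 || total < ihl || total > len)
        return -1;
    if (pkt[9] != 6)
        return -1;                          /* protocol 6 is TCP */
    if (rd16(pkt + 6) & 0x3FFF)
        return -1;                          /* MF set or fragment offset != 0 */
    if (total - ihl < 20)
        return -1;                          /* no room for a TCP header */

    const uint8_t *tcp = pkt + ihl;
    size_t doff = (size_t)(tcp[12] >> 4) * 4;    /* TCP header length */
    if (doff < 20 || doff > total - ihl)
        return -1;

    out->src_addr    = rd32(pkt + 12);
    out->dst_addr    = rd32(pkt + 16);
    out->src_port    = rd16(tcp);
    out->dst_port    = rd16(tcp + 2);
    out->seq         = rd32(tcp + 4);
    out->flags       = tcp[13];
    out->payload     = tcp + doff;
    out->payload_len = total - ihl - doff;
    return 0;
}
```

The payload length is derived from the IP total length rather than the buffer length, so trailing padding is not mistaken for data. The seq, payload and payload_len fields are what a reassembly step would consume.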

quick intro

If this is not what you are looking for, you might try Wireshark's EPAN library.

4pie0 answered Sep 27 '22 18:09