Honeybrid: Hybrid Honeypot Framework

Description
Download
Documentation
Development

Description

Honeybrid is a network application built to:

Deploy and administrate honeynets,
Provide the hybrid functionality of combining low and high interaction honeypots.

The first functionality works through a Decision Engine that allows users to precisely filter incoming traffic based on multiple criteria, such as source IP or payload content, and a Control Engine that automatically limits outgoing traffic to prevent compromised honeypots from attacking external hosts.
The second functionality works using a Redirection Engine that transparently redirects live network sessions (TCP or UDP) from one primary destination host to a secondary destination host using a replay process.

For more information about honeypots and honeynets, please refer to Niels Provos' honeyd website or Lance Spitzner's paper.
For more information about hybrid honeypots, please refer to Michael Bailey's paper (PDF) or to the background section of the documentation.

Honeybrid has been sponsored by the Google Summer of Code 2009 and the Honeynet Project.

Honeybrid is hosted by SourceForge. If you want to discuss this project, please refer to the project summary page, where you can also get support.

Download

The latest version of Honeybrid is beta-0.1.5. The latest commit was made on 2013-05-16.

Download Honeybrid beta-0.1.5
Download previous versions or browse the GIT repository
Check out the source code using the command: git clone git://git.code.sf.net/p/honeybrid/git honeybrid-git

Documentation

This documentation provides some background to explain the motivation behind Honeybrid, and then details the installation and configuration steps. In the last section, a case study is given to illustrate how Honeybrid can be used to learn about malicious network activity.
A wiki is under construction to provide additional documentation: https://sourceforge.net/apps/trac/honeybrid.

Background

Honeypots are designed and deployed to collect attack datasets that help measure and understand network threats. They are usually divided into two categories according to the level of interaction they provide to attackers. First, low interaction honeypots emulate network services and collect the beginning of attack processes. Second, high interaction honeypots are identical to production machines and collect more detailed information about attacks. These two types of honeypot offer complementary advantages and limitations. Mainly, low interaction honeypots such as honeyd are relatively easy to deploy and to administrate, but they collect only high level information about attacks. High interaction honeypots are a lot more expensive to configure and to maintain, but they give access to low level information about attacks.
The goal of Honeybrid is to facilitate the deployment of honeypots by 1) offering a precise attack filtering functionality, and 2) combining the advantages of low and high interaction honeypots. As such, Honeybrid is a hybrid honeypot solution including a module-based firewall that can control both incoming and outgoing traffic. This control functionality allows network administrators to precisely define the type of traffic they want to collect and which resources to allocate for their data collection.

The figures below show how Honeybrid can be integrated to manage a network of honeypots:

Honeybrid is made of 4 components:

a Decision Engine: to decide which traffic is going to which honeypot,
a Redirection Engine: to decide if some traffic needs to be redirected for more detailed analysis (see the redirection mechanism (figure) to understand the replay process),
a Control Engine: to limit outgoing network traffic sent by potentially compromised honeypots,
a Log Engine: to keep a detailed log of the traffic processed.

These four components are articulated around the concept of a target, which holds the specifications for a honeypot-based experiment. So every time we want to run a new honeypot-based experiment, we have to think about the type of traffic we want to collect and how we want to collect it, i.e. with how much granularity and with how much control. Targets are made of four statements:

a filter rule that defines using the tcpdump syntax the exact type of traffic that should be handled by this target,
a frontend rule that defines which honeypot should first interact with incoming attack traffic and what are the criteria to accept this incoming traffic,
an optional backend rule that defines toward which honeypot should the traffic be redirected for a more detailed analysis, and what are the criteria to decide on the traffic to redirect,
an optional control rule that defines how to limit outgoing traffic initiated by honeypots.

The criteria to accept, redirect and control network traffic are based on modules. A module is a function that can inspect a packet and decide if it meets a given criterion. Here are the modules developed so far:

mod_random() is a module to randomly accept packets. An integer N is given as argument and about 1 packet out of N is accepted; for example, mod_random(10) would accept roughly 10% of processed packets.
mod_yesno() is a module to always accept or always reject packets.
mod_counter() is a module to accept packets after a given number of packets received. For example mod_counter(3) would return 'accept' if the processed connection is made of at least 3 packets.
mod_source() is a module to accept packets only from new IP addresses. A given IP address that tries to connect multiple times to the honeynet would be rejected after the second attempt.
mod_hash() is a module to accept packets that carry a new, original payload, meaning a payload that has never been inspected before. This module works by computing a hash value for every payload inspected and keeping a database of known payloads.
mod_control() is a module to rate limit network packets based on source IP address. Packets are rejected after a source IP sent more than a given number of packets in a given period of time.
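
The rate-limiting idea behind mod_control() can be sketched in a few lines of C. This is not the Honeybrid implementation: the structure rate_entry, the function rate_limit_check() and the fixed counting window are assumptions made for illustration only. The parameter names max_packet and expiration mirror the configuration parameters of the control module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-source record; not a Honeybrid structure. */
struct rate_entry {
    uint32_t src_ip;        /* source address, host byte order */
    time_t   window_start;  /* when the current counting window began */
    unsigned packets;       /* packets counted in the current window */
};

/* Returns 1 (accept) or 0 (reject) for one packet seen at time 'now'.
 * A source may send at most 'max_packet' packets per window of
 * 'expiration' seconds; the counter restarts when the window ends. */
int rate_limit_check(struct rate_entry *e, time_t now,
                     unsigned max_packet, time_t expiration)
{
    assert(e != NULL);

    if (now - e->window_start >= expiration) {
        e->window_start = now;   /* window elapsed: start a new one */
        e->packets = 0;
    }
    e->packets++;
    return e->packets <= max_packet;
}
```

With max_packet set to 2 and expiration set to 600, the third packet from the same source inside the window is rejected, and the source is accepted again once 600 seconds have passed since the window started.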

Please refer to the module development documentation if you are interested in implementing a new module.

To summarize, Honeybrid has been designed to offer a flexible, scalable and secure solution to run honeypot experiments. For a detailed example of how Honeybrid can help deploy honeypots, please refer to the case study section.

Installation

Dependencies

Honeybrid depends on the following freely available packages:

        gcc
        make
        binutils
        flex
        yacc/bison
        pkg-config
        libpcap
        libdumbnet
        glib2.0 (2.32+)
        nfnetlink
        netfilter_queue (1.0.0+)
        netfilter_conntrack

The recommended distribution to run Honeybrid on is Debian Wheezy. Running the following command will take care of most of the dependencies:

sudo apt-get install build-essential pkg-config flex bison \
	libpcap-dev libdumbnet-dev libglib2.0-dev libssl-dev libtool \
	libnfnetlink-dev libnetfilter-conntrack-dev libmnl-dev

A recent version of the libnetfilter_queue library should be downloaded and compiled from source:

git clone git://git.netfilter.org/libnetfilter_queue
cd libnetfilter_queue
./autogen.sh
./configure --prefix=/usr
make
sudo make install

The Dionaea XMPP module (mod_dionaea) requires these extra, optional packages: libgloox-dev, libxml2-dev and a recent version of LibXML++

Compile Honeybrid


wget http://sourceforge.net/projects/honeybrid/files/honeybrid/beta-0.1/honeybrid-0.1.5.tar.gz/download -O honeybrid-0.1.5.tar.gz
tar xzf honeybrid-0.1.5.tar.gz
cd honeybrid-0.1.5
./configure
make
sudo make install   # optional

Configuration

honeybrid.sh is the start/stop script, installed by default in "/etc/init.d/honeybrid.sh". It also includes "add" and "del" options to automatically add IP addresses to, or remove them from, the iptables queue.

honeybrid.conf is the configuration file for Honeybrid. It is installed by default in "/etc/honeybrid/honeybrid.conf". It includes all parameters for Honeybrid as well as the definitions of detection modules and honeynet targets.

Honeybrid.conf

The configuration file of Honeybrid is made of 3 sections. A commented example of each section is given below:

General parameters

# The configuration of honeybrid is divided into three sections:
# - main:   where the main parameters of honeybrid are defined
# - module: where individual modules to process packets are defined and given a name
# - target: where individual targets are defined, to link modules with a traffic filter
#           expression
# It is important to have modules defined before targets!

## main configuration:
# Where the parameters of honeybrid are defined
config {
    ## output mode
    # 1 = syslog
    # 2 = stdout (do not daemonize)
    # 3 = log files
    # 4 = MySQL
        output = 3;

    ## pid directory
        exec_directory = /var/run/;

    ## log file directory
        log_directory = /var/log/honeybrid/;

    ## enable automatic hourly log rotation (applies only for connection logs, not for debug logs)
        log_rotation = 0;

    ## connection log file (log_directory defines the path)
        log_file = honeybrid.log;

    ## CSV log file format
        #log_format=csv;

    ## MySQL server parameters for logging
        #mysql_host = localhost;
        #mysql_user = my_user;
        #mysql_password = my_password;
        #mysql_db = my_db;
        #mysql_port = 3306;

    ## enable debug logfile?
        debug = 0;
    ## debug log file (detailed internal process, log_directory defines the path)
    #    debug_file = honeybrid.debug;

    ## Number of seconds after which network sessions are expired
        expiration_delay = 120;

    ## 'yes' to send reset to external host when there is an issue, 'no' to remain silent
        reset_ext = no;

    ## Dionaea XMPP listener parameters
        #xmpp_listener = /opt/honeybrid/xmpp_listener;
        #xmpp_fifo = /opt/honeybrid/xmpp.pipe;
        #xmpp_user = honeybrid;
        #xmpp_server = prosody-server;
        #xmpp_password = my_password;
        #xmpp_channel = my_channel;
        #xmpp_domain = my.domain;
        #xmpp_debug = /opt/honeybrid/xmpp.debug;
        #xmpp_db = /opt/honeybrid/xmpp.db;
}

Module definition

# Each section 'module' requires a name (between quotes)
# and a parameter 'function' that defines the type of module that will be used to process packets
# Other parameters are required depending on the type of function used
# 
module "random" {
	function = random;
	# The module random needs a single parameter 'value'. 
	# A value of 10 means that 1 out of 10 packets will be randomly accepted
	value = 10;
}

module "hash" {
        function = hash;
	# The module hash needs a single parameter 'backup', 
	# to know where it should save the database of payload hashes
        backup = /etc/honeybrid/hash.db;
}

module "counter" {
        function = counter;
	# The module counter needs a single parameter 'counter', 
	# that defines after how many packets it should return 'accept'
        counter = 2;
}

module "control" {
        function = control;
	# The module control needs 3 parameters:
	# 'backup' to define where the database of controlled IP addresses should be saved
        backup = /etc/honeybrid/control.db;
	# 'expiration' to know after how many seconds should IP be removed from the database
        expiration = 600;
	# 'max_packet', to give the maximum number of accepted packets sent during 'expiration' before	
	# rejecting packets.
        max_packet = 1000;
}

module "source" {
        function = source;
	# The module source needs 2 parameters:
	# 'backup' to define where the database of known IP addresses should be saved
        backup = /etc/honeybrid/source.db;
	# 'expiration' to know after how many seconds should IP be removed from the database
	expiration = 600;
}
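
To make the role of the 'expiration' parameter of the source module more concrete, here is a minimal C sketch of a "new source" check. It is an illustration under stated assumptions, not the code of mod_source: the fixed-size table, the names source_table and source_is_new(), and the choice to count expiration from the first time an address was seen are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MAX_SOURCES 1024    /* arbitrary table size for this sketch */

/* Hypothetical record of one known source address. */
struct seen_source {
    uint32_t ip;            /* source address, host byte order */
    time_t   first_seen;    /* when the address was recorded */
    int      used;          /* 1 if this slot holds a live entry */
};

struct source_table {
    struct seen_source slot[MAX_SOURCES];
};

/* Returns 1 (accept) if 'ip' is not in the table or its entry expired,
 * 0 (reject) if the address is already known. */
int source_is_new(struct source_table *t, uint32_t ip,
                  time_t now, time_t expiration)
{
    int free_slot = -1;

    assert(t != NULL);
    for (int i = 0; i < MAX_SOURCES; i++) {
        struct seen_source *s = &t->slot[i];

        if (s->used && now - s->first_seen >= expiration)
            s->used = 0;            /* entry expired: forget it */
        if (s->used && s->ip == ip)
            return 0;               /* known source: reject */
        if (!s->used && free_slot < 0)
            free_slot = i;          /* remember where to store 'ip' */
    }
    if (free_slot >= 0) {
        t->slot[free_slot].ip = ip;
        t->slot[free_slot].first_seen = now;
        t->slot[free_slot].used = 1;
    }
    return 1;   /* when the table is full, accept without recording */
}
```

With expiration set to 600, a second packet from the same address 100 seconds after the first is rejected, while the same address is treated as new again after 600 seconds.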

Target definition

# Each section 'target' requires 4 parameters:
#  'filter'   (mandatory) to define the type of traffic covered by this target (using the tcpdump syntax)
#  'frontend' (mandatory) to define a rule to accept traffic (using a boolean equation of modules)
#  'backend'  (optional)  to define a rule to redirect traffic (using a boolean equation of modules)
#  'control'  (optional)  to define a rule to control outgoing traffic initiated by honeypots (using a boolean equation of modules)
# The frontend and backend parameters also require the IP address of the honeypot in charge of the frontend or backend respectively.
# The tcpdump filter and the boolean equations require quotes.

target {
        filter "dst host 192.168.0.10 and port 80";
        frontend 192.168.0.10 "source or random";
        backend 192.168.0.11 "hash";
        control "control";
}
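
A boolean equation of modules such as "source or random" can be evaluated with a very small interpreter. The C sketch below is an assumption about how such an evaluation could work, not how Honeybrid parses its rules: eval_equation(), module_fn and demo_module() are hypothetical names, parentheses are not supported, "and" is given priority over "or", and every module is called even when the outcome is already known.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns 1 if the named module accepts, else 0. */
typedef int (*module_fn)(const char *name);

/* Stand-in for real modules: only "source" accepts in this demo. */
int demo_module(const char *name)
{
    return strcmp(name, "source") == 0;
}

/* Evaluates an equation made of module names joined by "and" / "or".
 * Equations longer than 255 characters are truncated. */
int eval_equation(const char *equation, module_fn run)
{
    char buf[256];
    int or_result = 0;    /* OR of all completed "and" groups */
    int and_result = 1;   /* AND of the modules in the current group */
    char *tok;

    assert(equation != NULL && run != NULL);
    strncpy(buf, equation, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (strcmp(tok, "or") == 0) {
            or_result |= and_result;      /* close the current group */
            and_result = 1;
        } else if (strcmp(tok, "and") != 0) {
            and_result &= (run(tok) != 0);
        }
    }
    return or_result | and_result;
}
```

With the demo callback, eval_equation("source or random", demo_module) returns 1 because the source module accepts, whereas eval_equation("source and random", demo_module) returns 0.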

Case Study

Here is an example of a honeynet deployed using Honeybrid to collect web attacks targeting the service tcp/80. The specifications for this experiment are:

3 Linux machines available:
- a low interaction honeypot (10.0.0.2) running Honeyd (with a script configured to emulate Apache),
- a high interaction honeypot (10.0.0.3) running Apache,
- a gateway running Honeybrid
A /21 subnet available to collect traffic (2,048 IP addresses).
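
The figure of 2,048 addresses follows from the prefix length: a /21 leaves 32 - 21 = 11 host bits, and 2^11 = 2048. The short C sketch below shows that arithmetic and a membership test; subnet_size() and in_subnet() are hypothetical helper names, not part of Honeybrid.

```c
#include <assert.h>
#include <stdint.h>

/* Number of IPv4 addresses covered by a prefix length (0..32). */
uint64_t subnet_size(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    return (uint64_t)1 << (32 - prefix_len);
}

/* Returns 1 if 'addr' falls inside net/prefix_len, else 0.
 * Both addresses are given in host byte order. */
int in_subnet(uint32_t addr, uint32_t net, unsigned prefix_len)
{
    uint32_t mask;

    assert(prefix_len <= 32);
    /* A shift by 32 is undefined in C, so /0 is handled separately. */
    mask = (prefix_len == 0) ? 0 : (~(uint32_t)0 << (32 - prefix_len));
    return (addr & mask) == (net & mask);
}
```

For 10.0.0.0/21, the covered range is 10.0.0.0 to 10.0.7.255, so in_subnet() accepts 10.0.7.255 and rejects 10.0.8.0.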

The strategy to collect statistics about web attacks will be to redirect all web traffic targeting our /21 subnet to the low interaction honeypot using a NAT firewall rule. We will use the source module to filter incoming traffic by attacker, and the hash module to handle new attack requests using the high interaction web server.

Configuration

To run such an experiment, we would start by configuring the following target in honeybrid.conf:

target {
	filter "dst net 10.0.0.2 and proto tcp and port 80";
	frontend 10.0.0.2 "source or random";
	backend  10.0.0.3 "hash or counter";
	control "control";
}

The filter rule tells Honeybrid to accept only TCP packets towards the low interaction honeypot (10.0.0.2) on port 80.
The frontend rule defines that only the first attempt of each attacker should be accepted. Other attempts will be discarded, except if the random sampling module returns 'accept'. This rule makes sure that an attacker won't be able to flood our honeynet with too many requests. The advantage is to reduce the volume of traffic recorded by our honeynet. The disadvantage might be to lose some attack requests.
The backend rule states that if a new payload is sent by an attacker, the connection should be redirected toward the high interaction honeypot for further analysis. A module 'counter' is also used to start redirecting if the connection reaches a given number of packets.
Finally, the control rule applies the module 'control' to limit connections initiated by honeypots, in case an attacker successfully compromises our machines.

We would then configure our five modules as described in the module section of the documentation of honeybrid.conf above.

The next step would be to configure the Netfilter firewall on the gateway so that Honeybrid can receive attack and honeypot traffic. This would be achieved using the following commands:

iptables -t nat -A PREROUTING -d 10.0.0.0/21 -j DNAT --to-destination 10.0.0.2 -m comment --comment "NAT rule for attack traffic to be sent to honeyd"
iptables -I FORWARD -d 10.0.0.2 -j QUEUE -m comment --comment "Incoming attack traffic"
iptables -I FORWARD -s 10.0.0.2 -j QUEUE -m comment --comment "Outgoing traffic from honeyd"
iptables -I FORWARD -s 10.0.0.3 -j QUEUE -m comment --comment "Outgoing traffic from high interaction honeypot"

It is important to notice that Honeybrid currently works only with the FORWARD chain, which means that setting up the QUEUE on INPUT or OUTPUT chains will not work.

The final step would be to start Honeybrid:

/etc/init.d/honeybrid start

Data collection

The attack data collection would be reported in the main log file of honeybrid, as well as in the module backup files.

The main log file of Honeybrid outputs one flow per line. The path for this log file is by default /var/log/honeybrid/honeybrid.log. Here is a sample output of this file:

2009-08-13 12:24:14.507182 1.069 TCP 75.54.75.79:3640 -> 10.0.0.2:80 1 76 INIT
2009-08-13 12:24:17.802673 0.000 UDP 190.24.167.119:65208 -> 10.0.0.2:53 1 55 DROPPED
2009-08-13 12:24:17.418888 0.000 UDP 190.72.164.176:17604 -> 10.0.0.2:53 1 57 DROPPED
2009-08-13 12:24:19.297894 1.181 TCP 80.26.77.71:2220 -> 10.0.0.2:80 6 264 FORWARD
2009-08-13 12:24:19.696378 1.388 TCP 201.92.102.117:2532 -> 10.0.0.2:80 6 264 FORWARD
2009-08-13 12:24:26.177629 0.000 UDP 189.71.12.30:1171 -> 10.0.0.2:53 1 59 DROPPED
2009-08-13 12:24:52.764639 4.047 UDP 93.96.65.14:52771 -> 10.0.0.2:3038 2 138 DROPPED
2009-08-13 12:24:54.829422 3.999 UDP 93.96.65.14:52774 -> 10.0.0.2:2097 2 134 DROPPED
2009-08-13 12:24:56.600800 3.044 TCP 201.8.253.38:60610 -> 10.0.0.2:80 2 156 INIT
2009-08-13 12:25:00.018360 0.000 UDP 64.244.103.70:43723 -> 10.0.0.2:53 1 116 DROPPED

This output is made of 8 fields:

timestamp: 2009-08-13 12:24:14.507182
duration in seconds: 1.069
protocol: TCP
source IP and source port: 75.54.75.79:3640
destination IP and destination port: 10.0.0.2:80
number of packets: 1
number of bytes: 76
status of the connection after being expired by timeout (default: 120 seconds of inactivity): INIT
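
As an illustration of the field layout, the following C sketch splits one log line into the fields listed above using sscanf(). The structure flow_record and the function parse_flow_line() are hypothetical names introduced for this example; the timestamp is stored as two strings (date and time) because it contains a space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for one line of honeybrid.log. */
struct flow_record {
    char date[11];            /* e.g. 2009-08-13 */
    char time[16];            /* e.g. 12:24:14.507182 */
    double duration;          /* seconds */
    char proto[8];            /* TCP or UDP */
    char src[32];             /* source IP:port */
    char dst[32];             /* destination IP:port */
    unsigned long packets;
    unsigned long bytes;
    char status[16];          /* INIT, FORWARD, DROPPED, ... */
};

/* Returns 1 if the line matched the expected layout, 0 otherwise. */
int parse_flow_line(const char *line, struct flow_record *r)
{
    int n;

    assert(line != NULL && r != NULL);
    /* Field widths keep sscanf() from overflowing the buffers. */
    n = sscanf(line, "%10s %15s %lf %7s %31s -> %31s %lu %lu %15s",
               r->date, r->time, &r->duration, r->proto,
               r->src, r->dst, &r->packets, &r->bytes, r->status);
    return n == 9;
}
```

Applied to the first sample line, the sketch yields a duration of 1.069, 1 packet, 76 bytes and the status INIT.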

The last field, status, indicates how Honeybrid processed the flow. There are 7 possible statuses:

INIT: initialization status applied to all new network flows,
DECISION: when the network flow reached the Decision Engine,
REPLAY: when Honeybrid decides to redirect the flow and starts the replay process,
FORWARD: when the replay process has been completed and packets are forwarded between the attacker and the backend,
PROXY: when there is no backend defined, but packets have been accepted so they are forwarded between the attacker and the frontend,
DROPPED: when the flow has been dropped by Honeybrid (based on the output of the decision modules),
CONTROL: when the flow has been initiated by a honeypot and is being controlled by the Control Engine.

Please refer to the redirection diagram (figure) to better understand the first 4 states.

Here is a sample output of the backup file for the hash module:

[80]
d20f64acc6e70b6079845f2fe357732929550ae1=1      1249600127      0       2       7       GET /. 
bba48dc680c4c8307b417eab5d5c34c0e8cb8ca5=2      1249941662      1       2       46      GET /w00tw00t.at.ISC.SANS.DFind
67a7422786a0717dd3aa08c91468ae54812563e6=1      1249997593      0       2       157     GET / HTTP/1.0..Host
686992e5ab00430d7d5add1f64cbec39b681c353=1      1250000907      0       2       98      GET /IDS5web/bin/lang HTTP/1.1..User-Agent
ae28f258147bb6b4ea02e980c8aa8bb0f28f27fc=1      1250000907      0       2       93      GET /IDS5web/bin HTTP/1.1..User-Agent
739b5df5c2c38de0609faa3e6087a85a7ee33d3c=1      1250002021      0       2       180     GET HTTP/1.1 HTTP/1.1..Accept
06f26836f7cbd70f280286b2a254e90ebe0e6ec0=1      1250002024      0       2       201     GET /misc/mantis/login_page.php HTTP/1.1..Accept
4a071f1219b0ea8594ab59763f6c9a00b942cf04=1      1250002024      0       2       196     GET /mantis/login_page.php HTTP/1.1..Accept
dc7b235a91e26895aec6a8354197dba86a994020=1      1250002024      0       2       202     GET /tools/mantis/login_page.php HTTP/1.1..Accept
12f0345fa353b6b181c45896290cb38d74a80431=1      1250002024      0       2       204     GET /support/mantis/login_page.php HTTP/1.1..Accept
adefddcaf925e11f4b2c4f3db42063ef9bcc2d6b=1      1250002024      0       2       200     GET /php/mantis/login_page.php HTTP/1.1..Accept
3ec1ec08a0c94ffe4c24632c041b73edcdde845f=1      1250002024      0       2       202     GET /turbo/mantis/login_page.php HTTP/1.1..Accept
7c454a859b585745fbb8aa42da84f4bf64548d50=1      1250002024      0       2       198     GET /mantisbt/login_page.php HTTP/1.1..Accept

Hashes are grouped per port, so the above example is for port [80]. This log file is made of 6 fields:

the payload hash: d20f64acc6e70b6079845f2fe357732929550ae1=1
the timestamp when this hash was first recorded: 1249600127
the duration in seconds during which this hash was sent by attackers: 0
the packet index in the flow during which this payload was collected: 2
the number of bytes of the payload: 7
the first 64 characters of the ASCII representation of the payload: GET /.

Development

Developers

The development of Honeybrid was initiated in 2007 by Robin Berthier as part of his PhD research. The main developer of Honeybrid in 2007 was Julien Vehent. Julien designed and implemented the architecture based on a Decision Engine and a Redirection Engine. In 2008, Thomas Coquelin took over the development of the project. Thomas consolidated the prototype of Honeybrid and built a robust redirection mechanism. In 2009, Honeybrid was sponsored by the Honeynet Project and the Google Summer of Code to be turned into an open source application. In 2012, Tamas K Lengyel expanded Honeybrid to support additional GRE tunnels and clone routing when used in conjunction with VMI-Honeymon as part of his PhD research at the University of Connecticut. Honeybrid was a central part of the research papers Virtual Machine Introspection in a Hybrid Honeypot Architecture and Towards Hybrid Honeynets via Virtual Machine Introspection and Cloning.

If you would like to join this project and contribute to Honeybrid, please contact the project administrators, as shown on the project summary page.

Adding a module

The decision engine of Honeybrid works with modules. Here is an example based on the module mod_random() to illustrate how to implement a new module:

Create a file: mod_random.c, in which you can put the following functions:
- int init_mod_random() [** this is optional, only if you need some initialization **]
- void mod_random(struct mod_args *args)
mod_random.c must have the following include:
- #include "modules.h"
Modify the Makefile.am to:
- add mod_random.c to the target SRC
Add the following function declarations in modules.h:
- int init_mod_random(); [** this is optional, only if you need some initialization **]
- void mod_random(struct mod_args *args);
Add the following call in the function init_modules() in modules.c:
- init_mod_random(); [** this is optional, only if you need some initialization **]
Add the following to the module enumeration at the top of modules.c:
- MOD_RANDOM,
Add the following to the module definition array at the top of modules.c:
- [MOD_RANDOM] = {.name = "random", .function = mod_random},
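
The enumeration and the module definition array described in the steps above form a name-to-function lookup table. The self-contained C sketch below mimics that pattern outside of Honeybrid. Everything in it is a stand-in: the reduced struct mod_args, the two toy modules, the table name module_table and the helper find_module() are assumptions for illustration and do not reproduce modules.h or modules.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for Honeybrid's argument structure (the real one differs). */
struct mod_args {
    int result;   /* 1 = accept, 0 = reject, -1 = cannot decide */
};

typedef void (*mod_fn)(struct mod_args *args);

/* Two toy modules used to populate the table. */
static void mod_yes(struct mod_args *args) { args->result = 1; }
static void mod_no(struct mod_args *args)  { args->result = 0; }

/* Module enumeration: one identifier per module. */
enum { MOD_YES, MOD_NO, MOD_COUNT };

struct mod_def {
    const char *name;     /* name used in the configuration file */
    mod_fn function;      /* function called to process a packet */
};

/* Module definition array, indexed by the enumeration. */
static const struct mod_def module_table[MOD_COUNT] = {
    [MOD_YES] = { .name = "yes", .function = mod_yes },
    [MOD_NO]  = { .name = "no",  .function = mod_no  },
};

/* Returns the function registered under 'name', or NULL if unknown. */
mod_fn find_module(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < MOD_COUNT; i++) {
        if (strcmp(module_table[i].name, name) == 0)
            return module_table[i].function;
    }
    return NULL;
}
```

Registering a new module in this sketch means adding one enumeration value and one array entry, which is the same pair of changes the steps above ask for in modules.c.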

Now that you have completed these steps, your module is defined and hooked to the system. The last task is to get it to do something! For this you just have to fill the function mod_random() with instructions.
The args structure given in argument of mod_random() has two main variables:

args->pkt is a struct_pkt from which you can read args->pkt->conn to access the connection structure
args->node has two interesting variables:
- args->node->arg is the argument configured for this module and this connection in the rules of honeybrid
- args->node->result is an integer that must be updated at the end of the processing in mod_random(), either with 0 (discard) or 1 (replay); the code below also sets it to -1 when the module cannot decide

To illustrate how everything works together, here is the content of mod_random():

void mod_random(struct mod_args *args)
{
    g_printerr("%s Module called\n", H(args->pkt->conn->id));

    unsigned int probe;
    uint32_t selector = 1;
    const uint32_t *value;

    /*! getting the value provided as parameter */
    if ((value = (const uint32_t *)g_hash_table_lookup(args->node->arg, "value")) == NULL)
    {
        /*! We can't decide (value is NULL here, so it must not be dereferenced) */
        args->node->result = -1;
        g_printerr("%s Missing value parameter\n", H(args->pkt->conn->id));
        return;
    }

    if (*value < selector)
    {
        /*! We can't decide */
        args->node->result = -1;
        g_printerr("%s Incorrect value parameter: %u\n", H(args->pkt->conn->id),
                *value);
        return;
    }

    /*! deciding based on a probability of 1 out of "value": */
    probe = (unsigned int) (((double) *value) * (rand() / (RAND_MAX + 1.0)));

    if (probe == selector)
    {
        /*! We accept this packet */
        args->node->result = 1;
        g_printerr("%s PACKET MATCH RULE for random(%u)\n",
                H(args->pkt->conn->id), *value);
    }
    else
    {
        /*! We reject this packet */
        args->node->result = 0;
        g_printerr("%s PACKET DOES NOT MATCH RULE for random(%u)\n",
                H(args->pkt->conn->id), *value);
    }
}

We can see from the code that mod_random() uses the argument "value". This means that when defining this module in the configuration of Honeybrid, users must set the value parameter. Here is an example:

 module "myrandom" {
        function = random;
        value = 20;
 }
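
The selection arithmetic of mod_random() can be tried outside of Honeybrid. The sketch below reuses the same expression to draw a number between 0 and value - 1 and accepts when that number equals the selector 1; the function names random_accepts() and acceptance_rate() are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same arithmetic as mod_random(): 'probe' is uniformly drawn from
 * 0 .. value-1 and the packet is accepted when probe equals 1. */
int random_accepts(uint32_t value)
{
    const uint32_t selector = 1;
    unsigned int probe;

    assert(value >= 1);
    probe = (unsigned int)(((double)value) * (rand() / (RAND_MAX + 1.0)));
    return probe == selector;
}

/* Fraction of accepted draws over 'trials' attempts. */
double acceptance_rate(uint32_t value, unsigned trials)
{
    unsigned hits = 0;

    for (unsigned i = 0; i < trials; i++)
        hits += (unsigned)random_accepts(value);
    return trials ? (double)hits / trials : 0.0;
}
```

With value = 20, as in the definition above, acceptance_rate() should settle near 1/20. One consequence of comparing against a selector of 1 is that value = 1 never accepts, because the only possible draw is 0.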

Another interesting parameter that can be defined is "backup". It is used by more complex modules that need to save results to an external file periodically (and that can load previously recorded results when Honeybrid starts). Here is an example of module definition for the "hash" module that uses such backup functionality:

 module "myhash" {
        function = hash;
        backup = /etc/honeybrid/hash.tb;
 }

When using the backup parameter, the following function should be called at the end of the module processing function:

 save_backup(backup, backup_file);

Where backup is retrieved through the "backup" parameter (it's a pointer to a GKeyFile), and backup_file is a string that gives the path and filename of the external file where results should be saved. Here is an excerpt from the source code of mod_hash.c that shows how to retrieve these two parameters:

        /*! get the backup file for this module */
        if ( NULL ==    (backup = (GKeyFile *)g_hash_table_lookup(args->node->arg, "backup"))) {
                /*! We can't decide */
                args->node->result = -1;
                g_printerr("%s mandatory argument 'backup' undefined!\n", H(args->pkt->conn->id));
                return;
        }
        /*! get the backup file path for this module */
        if ( NULL ==    (backup_file = (gchar *)g_hash_table_lookup(args->node->arg, "backup_file"))) {
                /*! We can't decide */
                args->node->result = -1;
                g_printerr("%s error, backup file path missing\n", H(args->pkt->conn->id));
                return;
        }

Please note that the parameter "backup_file" is created automatically and should not be added in the configuration file of Honeybrid.

To do list

Implement module mod_load() to accept incoming packets only if the honeynet is not saturated
Implement module mod_sample(), similar to mod_random() but using a fixed counter to decide to accept packets
Implement module mod_snort(), to use the real time output of snort to decide to accept or reject packets
Restore support for automated PCAP saving of the redirected network traffic
Export connection logs using Argus format or Netflow v9 format or IPFIX format
Implement multi-uplink support when using clone-routing

Project web hosted by SourceForge and last updated on 2013-05-16.