Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Application of neural network for use with log file data

I’ve been following Andrew NG’s coursera AI course , specifically the section on neural networks and I’m planning on implementing a neural network on log file data.

My log file contains data of this type :

<IP OF MACHINE INITIATING REQUEST><DATE OF REQUEST><TIME OF REQUEST><NAME OF RESOUCE BEING ACCESSED ON SERVER><RESPONSE CODE><TIME TAKEN FOR SERVER TO SERVE PAGE>

I’m aware there are other classification algorithms that could be used for this task such as naïve bayes and local outlier factor but want to gain exposure with neural networks using a real world applicable problem.

I have read about neural network self-organizing maps and this seems to be more suited to this type of problem as the log file does not have any structure, but seems to be a more advanced topic.

Instead of using a self-organizing map neural network I plan to create the training data from log file data by grouping the data into a key value pair where the key is the <IP OF MACHINE INITIATING REQUEST> and the value for each key is [<NAME OF RESOUCE BEING ACCESSED ON SERVER>, ><TIME TAKEN FOR SERVER TO SERVE PAGE>]

From above log file data I’m aiming to use a neural network(s) :

To classify similar IP behaviors based on what resources are being accessed. 
Classify behavior at specific periods / moments in time, so what IP’s are behaving similarly and specific moment in time. 

I’m not sure where to start with above. I’ve implemented very basic neural networks that perform integer arithmetic but now want to implement a network that is of use based on the data I have.

Based on log data format is this a good use case ?

Any pointers on where to being with this task ?

I hope this question is not too generic , I'm just unsure what questions to consider when beginning implementation of a neural network.

Update :

I would like to output data that is best suited to be generated from a neural network.

For this I think outputting a classification of users based over periods of time based on similarity score.

To generate the similarity score I could generate a count of times each IP address accesses a resource :

e.g :

1.2.3.A,4,3,1
1.2.3.B,0,1,2
1.2.3.C,3,7,3

from this then generate :

<HOUR OF DAY>,<IP ADDRESS X>,<IP ADDRESS Y>,<SIMMILARITY SCORE>

:

1,1.2.3.A,1.2.3.B,.3
1,1.2.3.C,1.2.3.B,.2
1,1.2.3.B,1.2.3.B,0
2,1.2.3.D,1.2.3.B,.764
2,1.2.3.E,1.2.3.B,.332
3,1.2.3.F,1.2.3.B,.631

So then can begin to correlate how users behave over course of day.

Is applicable to neural network?

I realise I'm asking about a neural network looking for a problem, but is this a suitable problem ?

like image 335
blue-sky Avatar asked Nov 02 '15 21:11

blue-sky


1 Answers

Based on log data format is this a good use case ?

You can use it as a dataset to train a neural network to predict future values or classify them in labels (or categories). For some types of neural network (specially, Multi-Layer Perceptron) it depends how you organize your dataset to use during the training of neural network. There are other cases you can group the sample (also known as clustering).

Neural Network Concept

Since you have a historical data separated in fields (or properties), you can create a model of neural network to classify or predict possible future values.

Given a neural network is a mathematical model that is defined by training steps, you have to define input and output sets to use during the training to define this model (neural network). Given that, your qualitative values (texts, chars, letters, etc) have to be converted as quantitative values, for sample:

A you convert to 1
B you convert to 2
C you convert to 3
...
Z you convert to N

After this, you can arrange your dataset in samples in order to separate it in an input list and the ideal output for each sample. For sample, let's suppose you have a dataset that define houses in the real estate market and their prices. You have the task to define a price (suggest) for new future houses, a sample of your training set could be like this:

Input:

Bedrooms ; Bathrooms ; Garage ; Near Subway
1        ; 1         ; 0      ; 1
3        ; 2         ; 2      ; 1
2        ; 2         ; 1      ; 0

Ideal Output (for each sample of input respectively)

Price
100.000
150.000
230.000

And use these sets to train a neural network to suggest a price for a future house providing the characteristics

Your problem

In your case, the IPs fields, could be converted to quantitative values. For sample:

1.2.3 convert to 1
1.2.4 convert to 2
1.2.5 convert to 3

Let's suppose you want to classify the SIMILARITY SCORE field, so, you can use the the columns HOUR OF DAY, IP ADDRESS X and IP ADDRESS Y as input set and the output set you have just SIMILARITY SCORE. The image bellow draws how to led with it (a simple feed-forward neural network).

enter image description here

There are many tools that allows you to work easily with neural networks, you can use arrays of double values to define these sets and the object will be trained for you. I have been using the Encog Framework from Heaton Research and it support Java, C#, C++ and other. There is also another one called Accord Framework but it is just for .Net.

A very sample of how to implement a Feed-forward Neural Network using Encog for Java:

BasicNetwork network = new BasicNetwork();

// add layers in the neural network
network.addLayer(new BasicLayer(null, true, 3));
network.addLayer(new BasicLayer(new ActivationTANH(), true, 4));
network.addLayer(new BasicLayer(new ActivationTANH(), true, 1));

// finalize and randomize the neural network
network.getStructure().finalizeStructure();
network.reset();

// define a random training set.
// You can define using your double arrays here
MLDataSet training = RandomTrainingFactory.generate(1000, 5, network.getInputCount(), network.getOutputCount(), -1, 1);

ResilientPropagation train = new ResilientPropagation(network, training);
double error = 0;
Integer epochs = 0;

//starting training
do 
{
    //train
    train.iteration();

    //count how many iterations the loop has
    epochs++;

    // get the error of neural network in the training set
    error = train.getError();

// condition for stop training
} while (epochs < 1000 && error > 0.01);

Obs: I did not test this code.

If you are starting with neural networks, I recommend you to implement your model and try it out using the datasets from UCI Machine Learning Repository. There are too many datasets for classification, regression and clustering problems you can test your implementation.

like image 180
Felipe Oriani Avatar answered Sep 22 '22 01:09

Felipe Oriani