Split access.log file by dates using command line tools

I have an Apache access.log file that is around 35 GB in size. Grepping through it is no longer practical, because every search takes far too long.

I want to split it into many smaller files, using the date as the splitting criterion.

The date is in the format [15/Oct/2011:12:02:02 +0000]. How can I do this using only bash scripting, standard text-manipulation programs (grep, awk, sed, and the like), piping, and redirection?

The input file name is access.log. I'd like the output files to be named like access.apache.15_Oct_2011.log (that would do the trick, although it doesn't sort nicely).

mr.b asked Jul 27 '12 11:07



1 Answer

One way using awk:

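awk 'BEGIN {
    # Map month abbreviations to zero-padded numbers ("Jan" -> "01", ...)
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", months, " ")
    for (a = 1; a <= 12; a++)
        m[months[a]] = sprintf("%02d", a)
}
{
    # $4 looks like "[15/Oct/2011:12:02:02"; split it on ":" and "/"
    split($4, array, "[:/]")
    year = array[3]
    month = m[array[2]]

    # Parenthesize the file name: unparenthesized concatenation after ">"
    # is ambiguous in some awk implementations
    print > (FILENAME "-" year "_" month ".txt")
}' incendiary.ws-2010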

This will output files like:

incendiary.ws-2010-2010_04.txt
incendiary.ws-2010-2010_05.txt
incendiary.ws-2010-2010_06.txt
incendiary.ws-2010-2010_07.txt

Against a 150 MB log file, the answer by chepner took 70 seconds on a 3.4 GHz 8-core Xeon E31270, while this method took 5 seconds.

Original inspiration: "How to split existing apache logfile by month?"
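
The command above splits by month, whereas the question asks for one file per day named like access.apache.15_Oct_2011.log. A per-day variant might look like the sketch below. It is not part of the original answer: split_by_day is a hypothetical helper name, the output files are written to the current directory, and the script assumes the log is in roughly chronological order (as access logs normally are), so it can close each day's file before opening the next.

```shell
#!/bin/sh
# Sketch only: split an Apache access log into one file per day.
# split_by_day is a hypothetical helper name, not a standard tool.
split_by_day() {
    awk '
    {
        # $4 looks like "[15/Oct/2011:12:02:02".
        # Splitting on "/" and ":" gives d[1]="[15", d[2]="Oct", d[3]="2011".
        split($4, d, "[/:]")
        day = substr(d[1], 2)            # drop the leading "["
        out = "access.apache." day "_" d[2] "_" d[3] ".log"

        # Assumption: lines are in date order. When the day changes,
        # close the previous file so only one output file is open at a time.
        if (out != prev) {
            if (prev != "") close(prev)
            prev = out
        }

        # Append, so a day that reappears later is not truncated.
        # Remove old output files before re-running the script.
        print >> out
    }' "$1"
}

# Usage (commented out so the sketch has no side effects when sourced):
# split_by_day access.log
```

Closing the previous file matters mainly for awk implementations that limit the number of simultaneously open files; a log spanning several years can produce over a thousand daily files. If sortable names are preferred, the month map from the answer above could be reused to build a year_month_day name instead.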

Theodore R. Smith answered Oct 04 '22 01:10