 

Database logs vs file logs

I have created a PHP+MYSQL web app and I am trying to implement now a logging system to store and track some actions of each user.

The purpose of this is the following: track the activity of each user's session by logging the IP+time+action, then see which pages he accessed later on by logging time+pagename; for each user there will be a file in the format: log{userid}_{month}.log

Each log will then be viewed only by the website owner, through a custom admin panel, and the data will be used only for security purposes (as in to show to the user if he logged in from a different IP or if someone else logged in from a different IP and to see which areas of the website the user accessed during his login session).

Currently, I have a MySQL MyISAM table where I store the userid, IP, time and action. The app has not launched yet, but we intend to have very many users (over 100k), and using a database for this feels like suicide.

So what do you suggest? How should the logging be done? Using files, using a table in the current database, using a separate database? Are there any file-logging frameworks available for PHP?

How should the reading of the file be done then? Read the results by row?

Thank you

asked Feb 08 '15 14:02 by NVG



2 Answers

You have many options, so I'll speak from my experience running a startup with about 500k users, 100k active every month, which seems to be in your range.

We logged user actions in a MySQL database.

  1. Querying your data is very easy and fast (provided good indexes)
  2. We ran on Azure, and had a dedicated MySQL (with slaves, etc) for storing all user data, including logs. Space was not an issue.
  3. Logging to MySQL can be slow, depending on how much you are logging, so we pushed each log entry to Redis and had a Python app read it from Redis and insert it into MySQL in the background. This meant logging had essentially no impact on page load times.
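A minimal sketch of that queue-then-drain pattern (names are illustrative; in production the queue would be Redis `LPUSH`/`BRPOP` and the sink a MySQL `INSERT`, here replaced by stand-ins so the shape of the code is clear):

```python
import json
import time
from collections import deque

# Stand-in for Redis: in production this would be redis.Redis().lpush(...)
# on the web side and a worker doing brpop. The queue decouples the web
# request from the (slower) database write.
log_queue = deque()

def log_action(user_id, ip, action):
    """Called from the web request: a cheap push, no DB round-trip."""
    entry = {"user_id": user_id, "ip": ip, "action": action, "ts": time.time()}
    log_queue.append(json.dumps(entry))

def drain_to_db(insert_row):
    """Background worker: pop queued entries and hand them to the DB writer.

    `insert_row` stands in for an INSERT against the log table, e.g. a
    wrapper around a MySQL cursor.execute call.
    """
    written = 0
    while log_queue:
        entry = json.loads(log_queue.popleft())
        insert_row(entry)
        written += 1
    return written

# Usage: the web tier logs, a worker drains in the background.
rows = []
log_action(42, "203.0.113.7", "login")
log_action(42, "203.0.113.7", "view_profile")
drain_to_db(rows.append)
```

The key design point is that the request path only ever does the cheap push; the slow write happens out of band, so a burst of traffic backs up the queue rather than the page loads.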

We decided to log user actions in MySQL because:

  1. We wanted to run queries on anything at any time without much effort. The structured format of the user action logs made that incredibly easy to do.
  2. It also allows you to display certain logs to users, if you would require it.
  3. When we introduced badges, we had no need to parse text logs to award badges to those who performed a specific action X number of times. We simply wrote a query against the user action logs, and the badges were awarded. So adding features based on actions was easy as well.
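To illustrate the badge point: with a structured log table, "award a badge to anyone who did action X at least N times" is a single GROUP BY query. The sketch below uses an in-memory SQLite table as a stand-in for the MySQL user-action log; the table and column names are hypothetical, not from the original app:

```python
import sqlite3

# In-memory SQLite as a stand-in for the MySQL user-action log table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_actions (
        user_id INTEGER,
        ip      TEXT,
        action  TEXT,
        ts      TEXT
    )
""")
conn.executemany(
    "INSERT INTO user_actions VALUES (?, ?, ?, ?)",
    [
        (1, "203.0.113.7",  "post_answer", "2015-02-01"),
        (1, "203.0.113.7",  "post_answer", "2015-02-02"),
        (1, "203.0.113.7",  "post_answer", "2015-02-03"),
        (2, "198.51.100.2", "post_answer", "2015-02-01"),
    ],
)

# Badge rule: users who performed the action at least 3 times.
# With text-file logs this would mean parsing files; here it is one query.
badge_winners = [
    row[0]
    for row in conn.execute(
        """
        SELECT user_id FROM user_actions
        WHERE action = 'post_answer'
        GROUP BY user_id
        HAVING COUNT(*) >= 3
        """
    )
]
```

Here `badge_winners` comes back as `[1]`: only user 1 crossed the threshold.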

We did use file logging for a couple of application logs - things we did not query on a daily basis - such as the logs of the Python app writing to the database, webserver access and error logs, etc.

We used Logstash to process those logs. It can simply hook into a log file and stream it to your Logstash server. Logstash can also query your logs, which is pretty cool.
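For reference, hooking Logstash into a log file takes only a small pipeline config. This is a minimal, illustrative fragment (the path and output host are placeholders, and plugin option names can vary between Logstash versions):

```
input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```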

Advanced uses

We used Slack for team communications and integrated the Python database-writing app with it. This allowed us to send critical errors to a channel (via their API) where someone could action a fix immediately.

Closing

My suggestion would be to not overthink it for now: log to MySQL, query it and see the stats. Make updates, rinse and repeat. You want to keep the cycle between deploy and update quick, and making decisions from a quick SQL query keeps it that way.

Basically, what you want to avoid is logging into a server, finding a log and grepping your way through it to find something; the approach above achieved that.

This is what we did, it is still running like that and we have no plans to change it soon. We haven't had any issues where we could not find anything that we needed. If there is a massive burst of users and we scale to 1mil monthly active users, then we might change it.

Please note: whichever way you decide to log, if you are saving the POST data, be sure never to do that for credit card info unless you are PCI compliant. Better still, use Stripe's JavaScript libraries so card data never touches your server.

answered Oct 15 '22 01:10 by Donovan Solms


If you are sure that reading the log will mainly target one user at a time, you should consider partitioning your log table (http://dev.mysql.com/doc/refman/5.1/en/partitioning-range.html), using your user_id as the partitioning key.

The maximum number of partitions being 1024, each partition would hold roughly 1/1000 of your 100k users (about 100 users per partition), which is reasonable.
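A hypothetical schema sketch of that setup, using RANGE partitioning on user_id as in the linked docs (table and column names are illustrative, and you would generate the full partition list rather than write it by hand):

```sql
-- Illustrative: RANGE partitioning on user_id so reads for one user
-- touch a single partition (MySQL 5.1 syntax, per the linked manual page).
CREATE TABLE user_actions (
    user_id INT NOT NULL,
    ip      VARCHAR(45),
    action  VARCHAR(64),
    ts      DATETIME
)
PARTITION BY RANGE (user_id) (
    PARTITION p0 VALUES LESS THAN (100),
    PARTITION p1 VALUES LESS THAN (200),
    PARTITION p2 VALUES LESS THAN (300)
    -- ... and so on, up to the 1024-partition limit
);
```

With this layout, a query like `SELECT * FROM user_actions WHERE user_id = 42` is pruned to partition p0 instead of scanning the whole table.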

answered Oct 15 '22 01:10 by Adam