 

error reading saved cache and system table while starting Cassandra

Tags:

cassandra

I keep hitting the following exception when starting the Cassandra daemon. I'm running a build from the 1.2 trunk.

WARN 14:47:51,038 error reading saved cache /home/manuzhang/cassandra/saved_caches/system-local-KeyCache-b.db
java.lang.NullPointerException
    at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:141)
    at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:237)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:340)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:312)
    at org.apache.cassandra.db.Table.initCf(Table.java:332)
    at org.apache.cassandra.db.Table.<init>(Table.java:265)
    at org.apache.cassandra.db.Table.open(Table.java:110)
    at org.apache.cassandra.db.Table.open(Table.java:88)
    at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:284)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:168)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:318)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:361)

Here's where the caches are saved:

manuzhang@manuzhang-U24E:~/cassandra/saved_caches$ ls -l
total 12
-rw-rw-r-- 1 manuzhang manuzhang 156 Aug  7 13:09 system-local-KeyCache-b.db
-rw-rw-r-- 1 manuzhang manuzhang  60 Aug  7 13:09 system-schema_columnfamilies-KeyCache-b.db
-rw-rw-r-- 1 manuzhang manuzhang  60 Aug  7 13:09 system-schema_columns-KeyCache-b.db

Cassandra also fails to load the system table files:

ERROR 17:03:16,637 Fatal exception during initialization
org.apache.cassandra.config.ConfigurationException: Found system table files, but they couldn't be loaded!
    at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:303)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:201)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:349)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:392)

I can now reproduce the system table loading failure on every third run of Cassandra (I clean up all the files afterwards). The exception is thrown here:

/**
 * One of three things will happen if you try to read the system table:
 * 1. files are present and you can read them: great
 * 2. no files are there: great (new node is assumed)
 * 3. files are present but you can't read them: bad
 * @throws ConfigurationException
 */
public static void checkHealth() throws ConfigurationException
{
    Table table;
    try
    {
        table = Table.open(Table.SYSTEM_TABLE);
    }
    catch (AssertionError err)
    {
        // this happens when a user switches from OPP to RP.
        ConfigurationException ex = new ConfigurationException("Could not read system table!");
        ex.initCause(err);
        throw ex;
    }
    ColumnFamilyStore cfs = table.getColumnFamilyStore(LOCAL_CF);

    String req = "SELECT cluster_name FROM system.%s WHERE key='%s'";
    UntypedResultSet result = processInternal(String.format(req, LOCAL_CF, LOCAL_KEY));

    if (result.isEmpty() || !result.one().has("cluster_name"))
    {
        // this is a brand new node
        if (!cfs.getSSTables().isEmpty())
            throw new ConfigurationException("Found system table files, but they couldn't be loaded!");

        // no system files.  this is a new node.
        req = "INSERT INTO system.%s (key, cluster_name) VALUES ('%s', '%s')";
        processInternal(String.format(req, LOCAL_CF, LOCAL_KEY, DatabaseDescriptor.getClusterName()));
        return;
    }

    String savedClusterName = result.one().getString("cluster_name");
    if (!DatabaseDescriptor.getClusterName().equals(savedClusterName))
        throw new ConfigurationException("Saved cluster name " + savedClusterName + " != configured name " + DatabaseDescriptor.getClusterName());
}

The three runs correspond exactly with the three conditions in the comment.

"No files are there" in the first run since it is a brand new node.

In the second run, "files are there and you can read them".

In the third run, "files are there but you can't read them", and I've checked that both result.isEmpty() and result.one().has("cluster_name") return false.
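To make the branches of checkHealth() easier to follow, here is a simplified model of its decision logic. It is only a sketch, not Cassandra code: the class SystemTableHealth, the Outcome enum, and the classify method are hypothetical names, and the two booleans stand in for the query result and for cfs.getSSTables().isEmpty().

```java
public class SystemTableHealth {
    /** Possible results of the health check (hypothetical names, for illustration only). */
    public enum Outcome { NEW_NODE, HEALTHY, UNREADABLE_FILES, CLUSTER_NAME_MISMATCH }

    /**
     * Models the branches of checkHealth().
     *
     * @param rowHasClusterName true when the query returned a row that has "cluster_name"
     * @param sstablesOnDisk    true when the column family store reports SSTables
     * @param savedName         cluster name read from the system table (may be null)
     * @param configuredName    cluster name from the configuration
     */
    public static Outcome classify(boolean rowHasClusterName, boolean sstablesOnDisk,
                                   String savedName, String configuredName) {
        if (!rowHasClusterName) {
            // No usable row: either nothing was ever written (new node),
            // or files exist but the row could not be read from them.
            return sstablesOnDisk ? Outcome.UNREADABLE_FILES : Outcome.NEW_NODE;
        }
        return configuredName.equals(savedName)
                ? Outcome.HEALTHY
                : Outcome.CLUSTER_NAME_MISMATCH;
    }

    public static void main(String[] args) {
        // The "couldn't be loaded" case: SSTables exist, but no cluster_name row came back.
        System.out.println(classify(false, true, null, "Test Cluster"));
    }
}
```

In this model, "Found system table files, but they couldn't be loaded!" corresponds to UNREADABLE_FILES: SSTables are present on disk, yet the query did not return a row with cluster_name.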

I'm confused by the exception message "couldn't be loaded". What does it mean? I don't think it's a file system permission issue, since read/write permissions are granted to the current user.

The above problems go away after I delete all the related files, but I don't want to do that every time I run Cassandra.

This has been afflicting me for quite a while.

An unrelated issue is that I don't think Cassandra@stackoverflow has received enough attention from the community. Do you agree?

Any ideas or suggestions would be appreciated.

Thanks.

manuzhang asked Aug 07 '12 06:08



1 Answer

I had this problem in 2 scenarios.

  1. I tried changing the partitioner without removing the cluster's existing data, which can't be done. See the mailing list for an explanation.
  2. I ran the Cassandra process as superuser the first time it was started (sudo ./cassandra), which created the necessary data, log, and cache directories with permissions for the superuser only. I then restarted Cassandra as a regular user, which didn't have permission to use the files in the directories created by the superuser's process.
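To check for the second scenario, something like the following sketch could be run as the same user that starts Cassandra. It walks a directory tree and reports every path that user cannot both read and write. The class name CassandraDirCheck and the method findInaccessible are made up for illustration, and the directories to check are passed as arguments because they depend on your configuration.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class CassandraDirCheck {
    /** Returns every path under root that the current user cannot both read and write. */
    public static List<Path> findInaccessible(Path root) throws IOException {
        final List<Path> bad = new ArrayList<>();
        if (!Files.exists(root)) {
            return bad; // nothing has been created yet, so there is nothing to check
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            private void check(Path p) {
                if (!Files.isReadable(p) || !Files.isWritable(p)) {
                    bad.add(p);
                }
            }

            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                check(dir);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                check(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Reached when a file or directory cannot be opened at all.
                bad.add(file);
                return FileVisitResult.CONTINUE;
            }
        });
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // Pass the data, commit log, and saved_caches directories as arguments.
        for (String dir : args) {
            for (Path p : findInaccessible(Paths.get(dir))) {
                System.out.println(p + " (owner: " + Files.getOwner(p) + ")");
            }
        }
    }
}
```

If it prints paths owned by root, changing their ownership back to the regular user (or removing them so Cassandra recreates them) should address this particular cause.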

I know you solved the problem, but this might be useful for other developers.

Lyuben Todorov answered Jan 04 '23 00:01
