
Create Fuseki in memory server

There is a parameter --mem in Fuseki:

fuseki-server --mem  /DatasetPathName

Could I use it to load full tdb indices into memory to improve query performance?

For example,

fuseki-server --mem  --loc=/tdbpath  /DatasetPathName

/tdbpath is a directory with tdb index and files (I load my data using tdbloader).

I tried it briefly but found that adding --mem doesn't increase memory usage (compared with fuseki-server --loc=/tdbpath /DatasetPathName). Did I do something wrong?

Thanks

asked Apr 26 '26 01:04 by GStone


1 Answer

Looking at the implementation of FusekiCmd#processModulesAndArgs(), Fuseki treats the arguments --mem, --memtdb and --loc=X as mutually exclusive ways of specifying a single dataset. If both --mem and --loc=X appear on the same command line, only --loc=X is used:

        if ( contains(argMem) ) {
            log.info("Dataset: in-memory") ;
            cmdLineDataset = new ServerInitialConfig() ;
            cmdLineDataset.argTemplateFile = Template.templateMemFN ; 
        }

        if ( contains(argFile) ) {
            String filename = getValue(argFile) ;
            log.info("Dataset: in-memory: load file: " + filename) ;
            if ( !FileOps.exists(filename) )
                throw new CmdException("File not found: " + filename) ;

            // Directly populate the dataset.
            cmdLineDataset = new ServerInitialConfig() ;
            cmdLineDataset.dsg = DatasetGraphFactory.createMem() ;

            // INITIAL DATA.
            Lang language = RDFLanguages.filenameToLang(filename) ;
            if ( language == null )
                throw new CmdException("Can't guess language for file: " + filename) ;
            RDFDataMgr.read(cmdLineDataset.dsg, filename) ;
        }

        if ( contains(argMemTDB) ) {
            //log.info("TDB dataset: in-memory") ;
            cmdLineDataset = new ServerInitialConfig() ;
            cmdLineDataset.argTemplateFile = Template.templateTDBMemFN ;
            cmdLineDataset.params.put(Template.DIR, Names.memName) ;
        }

        if ( contains(argTDB) ) {
            cmdLineDataset = new ServerInitialConfig() ;
            cmdLineDataset.argTemplateFile = Template.templateTDBDirFN ;
            String dir = getValue(argTDB) ;
            cmdLineDataset.params.put(Template.DIR, dir) ;
        }

As seen above, each of these options assigns cmdLineDataset, so an option checked later overwrites one checked earlier. At most one of them takes effect. That being said, you can tell Fuseki to use an in-memory TDB dataset with the --memtdb option. As per the documentation, this should only be used for testing.
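To make the overwrite behaviour concrete, here is a minimal, self-contained sketch. It is not Fuseki's code: the class DatasetFlagDemo, the method resolve and the returned strings are hypothetical names made up for illustration, and the --file branch from the excerpt is left out. It only mimics the pattern of independent if-checks that all assign the same variable in a fixed order.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the excerpt above (hypothetical, not Fuseki source):
// every recognised flag assigns the same variable, so the last check wins.
final class DatasetFlagDemo {

    // Returns a description of the dataset that ends up selected, or "none".
    static String resolve(String... args) {
        List<String> argList = Arrays.asList(args);
        String selected = "none";

        // The checks run in a fixed order (--mem, --memtdb, --loc),
        // regardless of where the flags appear on the command line.
        if (argList.contains("--mem")) {
            selected = "in-memory";
        }
        if (argList.contains("--memtdb")) {
            selected = "tdb-in-memory";
        }
        for (String arg : argList) {
            if (arg.startsWith("--loc=")) {
                selected = "tdb-dir:" + arg.substring("--loc=".length());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // --loc is checked last, so it replaces the --mem choice in both orders.
        System.out.println(resolve("--mem", "--loc=/tdbpath", "/DatasetPathName"));
        System.out.println(resolve("--loc=/tdbpath", "--mem", "/DatasetPathName"));
        System.out.println(resolve("--mem", "/DatasetPathName"));
    }
}
```

Under this model, swapping the order of --mem and --loc on the command line makes no difference, which matches the observation in the question that adding --mem did not change memory usage.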

As per @andys, TDB (using the --loc option) should cache values in memory as they are used. If you need persistence and don't want to introduce additional lifecycle stages to your application, TDB is the best way to go. If your dataset fits entirely in memory and you either don't need persistence or can afford a separate save-and-shutdown step in your application, an in-memory dataset can be much faster.

answered Apr 29 '26 12:04 by Rob Hall


