Looking for a C++ implementation of the C4.5 algorithm

Tags: c++, algorithm, machine-learning, decision-tree

I've been looking for a C++ implementation of the C4.5 algorithm, but I haven't been able to find one yet. I found Quinlan's C4.5 Release 8, but it's written in C. Has anybody seen an open source C++ implementation of the C4.5 algorithm?

I'm thinking about porting the J48 source code (or simply writing a wrapper around the C version) if I can't find an open source C++ implementation out there, but I hope I don't have to do that! Please let me know if you have come across a C++ implementation of the algorithm.

Update

I've been considering writing a thin C++ wrapper around the C implementation of the C5.0 algorithm (C5.0 is the improved version of C4.5). I downloaded and compiled the C implementation, but it doesn't look easy to port to C++. It uses a lot of global variables, so a thin C++ wrapper around the C functions will not give an object-oriented design: every class instance would be modifying the same global state. In other words, I would have no encapsulation, and that's a pretty basic thing that I need.

To get encapsulation I would need a full-blown port of the C code to C++, which is about as much work as porting the Java version (J48) to C++.
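
To make the problem concrete, here is a minimal sketch of why a thin wrapper over global state gives no encapsulation. Everything in it is hypothetical: `c_style`, `g_threshold`, `ThinWrapper` and `PortedClassifier` are invented names, and the one-integer "model" is only a stand-in for the real C5.0 globals.

```cpp
#include <cassert>

// Hypothetical stand-in for a C library that keeps its model in a global.
// This is NOT the C5.0 source; the names are invented for illustration.
namespace c_style {
int g_threshold = 0;  // one copy, shared by every caller in the process
void load_model(int threshold) { g_threshold = threshold; }
int classify(int x) { return x > g_threshold ? 1 : 0; }
}  // namespace c_style

// Thin wrapper: looks like an object but owns no state of its own.
class ThinWrapper {
public:
    explicit ThinWrapper(int threshold) { c_style::load_model(threshold); }
    int classify(int x) const { return c_style::classify(x); }
};

// Full port: the model lives inside the instance.
class PortedClassifier {
public:
    explicit PortedClassifier(int threshold) : threshold_(threshold) {}
    int classify(int x) const { return x > threshold_ ? 1 : 0; }
private:
    int threshold_;
};

// Shows the difference between the two designs.
void demonstrate() {
    ThinWrapper a(10);
    ThinWrapper b(100);           // overwrites the global that `a` relies on
    assert(a.classify(50) == 0);  // `a` now answers with b's threshold
    assert(b.classify(50) == 0);

    PortedClassifier c(10);
    PortedClassifier d(100);      // independent of `c`
    assert(c.classify(50) == 1);
    assert(d.classify(50) == 0);
}
```

Constructing the second `ThinWrapper` silently changes what the first one returns, which is exactly the behaviour I want to avoid.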

Update 2.0

Here are some specific requirements:

1. Each classifier instance must encapsulate its own data (i.e. no global variables aside from constant ones).
2. Classifiers must support concurrent training and concurrent evaluation.

Here is a good scenario: suppose I'm doing 10-fold cross-validation and would like to concurrently train 10 decision trees, each on its own slice of the training set. If I just run the C program for each slice, I would have to run 10 processes, which is not horrible. However, if I need to classify thousands of data samples in real time, I would have to start a new process for each sample I want to classify, and that's not very efficient.
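
The usage I have in mind looks roughly like the sketch below. It is only an illustration under assumptions: `Sample`, `StumpClassifier` and `cross_validate` are hypothetical names, and the one-split "stump" (binary labels 0/1, class 1 assumed to have the larger feature values) merely stands in for a real C4.5 tree. The point is that each fold gets its own classifier instance, so the folds can be trained and evaluated concurrently inside one process with `std::async`.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

struct Sample { double feature; int label; };  // label is assumed to be 0 or 1

// Hypothetical classifier: a one-split stump standing in for a full tree.
// All of its state is a member, so instances never interfere with each other.
class StumpClassifier {
public:
    void train(const std::vector<Sample>& data) {
        double sum[2] = {0.0, 0.0};
        std::size_t n[2] = {0, 0};
        for (const Sample& s : data) {
            assert(s.label == 0 || s.label == 1);
            sum[s.label] += s.feature;
            ++n[s.label];
        }
        // Split halfway between the two class means (if both classes occur).
        if (n[0] > 0 && n[1] > 0)
            threshold_ = 0.5 * (sum[0] / n[0] + sum[1] / n[1]);
    }
    int classify(double feature) const { return feature > threshold_ ? 1 : 0; }
private:
    double threshold_ = 0.0;
};

// Trains k classifiers concurrently, one per fold, and returns the mean
// accuracy on the held-out folds.
double cross_validate(const std::vector<Sample>& data, std::size_t k) {
    assert(k > 0);
    std::vector<std::future<double>> folds;
    for (std::size_t fold = 0; fold < k; ++fold) {
        folds.push_back(std::async(std::launch::async, [&data, fold, k] {
            std::vector<Sample> train, test;
            for (std::size_t i = 0; i < data.size(); ++i)
                (i % k == fold ? test : train).push_back(data[i]);
            StumpClassifier clf;  // private to this task
            clf.train(train);
            std::size_t correct = 0;
            for (const Sample& s : test)
                correct += (clf.classify(s.feature) == s.label);
            return test.empty() ? 0.0
                                : static_cast<double>(correct) / test.size();
        }));
    }
    double total = 0.0;
    for (auto& f : folds) total += f.get();  // wait for every fold
    return total / k;
}
```

Calling `cross_validate(data, 10)` runs the 10 folds as 10 tasks in a single process. Once trained, an instance can also classify any number of samples through `classify` without starting a new process per sample. With GCC on Linux this may need to be built with `-pthread`.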

asked May 25 '12 15:05

Kiril

2 Answers

A C++ implementation for C4.5 called YaDT is available here, in the "Decision Trees" section:
http://www.di.unipi.it/~ruggieri/software.html

This is the source code for the most recent version:
http://www.di.unipi.it/~ruggieri/YaDT/YaDT1.2.5.zip

From the paper where the tool is described:

[...] In this paper, we describe a new from-scratch C++ implementation of a decision tree induction algorithm, which yields entropy-based decision trees in the style of C4.5. The implementation is called YaDT, an acronym for Yet another Decision Tree builder. The intended contribution of this paper is to present the design principles of the implementation that allowed for obtaining a highly efficient system. We discuss our choices on memory representation and modelling of data and metadata, on the algorithmic optimizations and their effect on memory and time performances, and on the trade-off between efficiency and accuracy of pruning heuristics. [...]

The paper is available here.
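
For context on what "entropy-based decision trees in the style of C4.5" refers to: C4.5 ranks candidate splits by gain ratio, i.e. information gain divided by the split information. The sketch below is my own minimal illustration of that criterion for a discrete attribute; it is not taken from YaDT, and the function names and the `partitions[v][c]` count layout are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shannon entropy (in bits) of a histogram of counts.
double entropy(const std::vector<std::size_t>& counts) {
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    if (total == 0) return 0.0;
    double h = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;  // 0 * log(0) is taken as 0
        const double p = static_cast<double>(c) / total;
        h -= p * std::log2(p);
    }
    return h;
}

// Gain ratio of a split on a discrete attribute.
// partitions[v][c] = number of samples with attribute value v and class c.
double gain_ratio(const std::vector<std::vector<std::size_t>>& partitions) {
    if (partitions.empty()) return 0.0;
    std::vector<std::size_t> class_totals(partitions[0].size(), 0);
    std::vector<std::size_t> branch_sizes;
    std::size_t n = 0;
    for (const auto& branch : partitions) {
        assert(branch.size() == class_totals.size());  // same classes per branch
        std::size_t size = 0;
        for (std::size_t c = 0; c < branch.size(); ++c) {
            class_totals[c] += branch[c];
            size += branch[c];
        }
        branch_sizes.push_back(size);
        n += size;
    }
    if (n == 0) return 0.0;

    // Expected entropy after the split, weighted by branch size.
    double remainder = 0.0;
    for (std::size_t v = 0; v < partitions.size(); ++v)
        remainder += static_cast<double>(branch_sizes[v]) / n * entropy(partitions[v]);

    const double gain = entropy(class_totals) - remainder;
    const double split_info = entropy(branch_sizes);
    return split_info > 0.0 ? gain / split_info : 0.0;
}
```

A split that separates two balanced classes perfectly, such as `{{5, 0}, {0, 5}}`, gets the maximum ratio, while a split that leaves every branch as mixed as the parent, such as `{{5, 5}, {5, 5}}`, gets zero.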

answered Sep 22 '22 21:09

gRizzlyGR

I may have found a possible C++ "implementation" of C5.0 (See5.0), but I haven't been able to dig into the source code enough to determine if it really works as advertised.

To reiterate my original concerns, the author of the port states the following about the C5.0 algorithm:

Another drawback with See5Sam [C5.0] is the impossibility to have more than one application tree at the same time. An application is read from files each time the executable is run and is stored in global variables here and there.

I will update my answer as soon as I get some time to look into the source code.

Update

It's looking pretty good. Here is the C++ interface:

class CMee5
{
  public:

    /**
      Create a See 5 engine from tree/rules files.
      \param pcFileStem The stem of the See 5 file system. The engine
             initialisation will look for the following files:
              - pcFileStem.names Vanilla See 5 names file (mandatory)
              - pcFileStem.tree or pcFileStem.rules Vanilla See 5 tree or rules
                file (mandatory)
              - pcFileStem.costs Vanilla See 5 costs file (mandatory)
    */
    inline CMee5(const char* pcFileStem, bool bUseRules);

    /**
      Release allocated memory for this engine.
    */
    inline ~CMee5();

    /**
      General classification routine accepting a data record.
    */
    inline unsigned int classifyDataRec(DataRec Case, float* pOutConfidence);

    /**
      Show rules that were used to classify the last case.
      Classify() will have set RulesUsed[] to
      number of active rules for trial 0,
      first active rule, second active rule, ..., last active rule,
      number of active rules for trial 1,
      first active rule, second active rule, ..., last active rule,
      and so on.
    */
    inline void showRules(int Spaces);

    /**
      Open file with given extension for read/write with the actual file stem.
    */
    inline FILE* GetFile(String Extension, String RW);

    /**
      Read a raw case from file Df.

      For each attribute, read the attribute value from the file.
      If it is a discrete valued attribute, find the associated no.
      of this attribute value (if the value is unknown this is 0).

      Returns the array of attribute values.
    */
    inline DataRec GetDataRec(FILE *Df, Boolean Train);
    inline DataRec GetDataRecFromVec(float* pfVals, Boolean Train);
    inline float TranslateStringField(int Att, const char* Name);

    inline void Error(int ErrNo, String S1, String S2);

    inline int getMaxClass() const;
    inline int getClassAtt() const;
    inline int getLabelAtt() const;
    inline int getCWtAtt() const;
    inline unsigned int getMaxAtt() const;
    inline const char* getClassName(int nClassNo) const;
    inline char* getIgnoredVals();

    inline void FreeLastCase(void* DVec);
};

I would say that this is the best alternative I've found so far.

answered Sep 23 '22 21:09

Kiril
