Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why use prefixes on member variables in C++ classes

People also ask

What is the purpose of using the prefix for variable names?

The idea of variable prefixes They are used to tell the developer what kind of variable he is using. The first prefix for example is m_ and the conclusion is this variable is a member. When I learned programming we had to give the developer more information in case of a member.

When referencing user string variables What prefix do you use?

From my experience, C++ code tends to use an underscore or m_ as a prefix for member variables.

Why do variable names start with M?

The m is here to indicate a member variable. It has 2 huge advantages: If you see it, you instantly recognize it as a member variable. Press m and you get all members via the auto completer.

What are member variables in a class?

In object-oriented programming, a member variable (sometimes called a member field) is a variable that is associated with a specific object, and accessible for all its methods (member functions).


I'm all in favour of prefixes done well.

I think (System) Hungarian notation is responsible for most of the "bad rap" that prefixes get.

This notation is largely pointless in strongly typed languages e.g. in C++ "lpsz" to tell you that your string is a long pointer to a nul terminated string, when: segmented architecture is ancient history, C++ strings are by common convention pointers to nul-terminated char arrays, and it's not really all that difficult to know that "customerName" is a string!

However, I do use prefixes to specify the usage of a variable (essentially "Apps Hungarian", although I prefer to avoid the term Hungarian due to it having a bad and unfair association with System Hungarian), and this is a very handy timesaving and bug-reducing approach.

I use:

  • m for members
  • c for constants/readonlys
  • p for pointer (and pp for pointer to pointer)
  • v for volatile
  • s for static
  • i for indexes and iterators
  • e for events

Where I wish to make the type clear, I use standard suffixes (e.g. List, ComboBox, etc).

This makes the programmer aware of the usage of the variable whenever they see/use it. Arguably the most important case is "p" for pointer (because the usage changes from var. to var-> and you have to be much more careful with pointers - NULLs, pointer arithmetic, etc), but all the others are very handy.

For example, you can use the same variable name in multiple ways in a single function: (here a C++ example, but it applies equally to many languages)

MyClass::MyClass(int numItems)
{
    mNumItems = numItems;
    for (int iItem = 0; iItem < mNumItems; iItem++)
    {
        Item *pItem = new Item();
        itemList[iItem] = pItem;
    }
}

You can see here:

  • No confusion between member and parameter
  • No confusion between index/iterator and items
  • Use of a set of clearly related variables (item list, pointer, and index) that avoid the many pitfalls of generic (vague) names like "count", "index".
  • Prefixes reduce typing (shorter, and work better with auto-completion) than alternatives like "itemIndex" and "itemPtr"

Another great point of "iName" iterators is that I never index an array with the wrong index, and if I copy a loop inside another loop I don't have to refactor one of the loop index variables.

Compare this unrealistically simple example:

for (int i = 0; i < 100; i++)
    for (int j = 0; j < 5; j++)
        list[i].score += other[j].score;

(which is hard to read and often leads to use of "i" where "j" was intended)

with:

for (int iCompany = 0; iCompany < numCompanies; iCompany++)
    for (int iUser = 0; iUser < numUsers; iUser++)
       companyList[iCompany].score += userList[iUser].score;

(which is much more readable, and removes all confusion over indexing. With auto-complete in modern IDEs, this is also quick and easy to type)

The next benefit is that code snippets don't require any context to be understood. I can copy two lines of code into an email or a document, and anyone reading that snippet can tell the difference between all the members, constants, pointers, indexes, etc. I don't have to add "oh, and be careful because 'data' is a pointer to a pointer", because it's called 'ppData'.

And for the same reason, I don't have to move my eyes out of a line of code in order to understand it. I don't have to search through the code to find if 'data' is a local, parameter, member, or constant. I don't have to move my hand to the mouse so I can hover the pointer over 'data' and then wait for a tooltip (that sometimes never appears) to pop up. So programmers can read and understand the code significantly faster, because they don't waste time searching up and down or waiting.

(If you don't think you waste time searching up and down to work stuff out, find some code you wrote a year ago and haven't looked at since. Open the file and jump about half way down without reading it. See how far you can read from this point before you don't know if something is a member, parameter or local. Now jump to another random location... This is what we all do all day long when we are single stepping through someone else's code or trying to understand how to call their function)

The 'm' prefix also avoids the (IMHO) ugly and wordy "this->" notation, and the inconsistency that it guarantees (even if you are careful you'll usually end up with a mixture of 'this->data' and 'data' in the same class, because nothing enforces a consistent spelling of the name).

'this' notation is intended to resolve ambiguity - but why would anyone deliberately write code that can be ambiguous? Ambiguity will lead to a bug sooner or later. And in some languages 'this' can't be used for static members, so you have to introduce 'special cases' in your coding style. I prefer to have a single simple coding rule that applies everywhere - explicit, unambiguous and consistent.

The last major benefit is with Intellisense and auto-completion. Try using Intellisense on a Windows Form to find an event - you have to scroll through hundreds of mysterious base class methods that you will never need to call to find the events. But if every event had an "e" prefix, they would automatically be listed in a group under "e". Thus, prefixing works to group the members, consts, events, etc in the intellisense list, making it much quicker and easier to find the names you want. (Usually, a method might have around 20-50 values (locals, params, members, consts, events) that are accessible in its scope. But after typing the prefix (I want to use an index now, so I type 'i...'), I am presented with only 2-5 auto-complete options. The 'extra typing' people attribute to prefixes and meaningful names drastically reduces the search space and measurably accelerates development speed)

I'm a lazy programmer, and the above convention saves me a lot of work. I can code faster and I make far fewer mistakes because I know how every variable should be used.


Arguments against

So, what are the cons? Typical arguments against prefixes are:

  • "Prefix schemes are bad/evil". I agree that "m_lpsz" and its ilk are poorly thought out and wholly useless. That's why I'd advise using a well designed notation designed to support your requirements, rather than copying something that is inappropriate for your context. (Use the right tool for the job).

  • "If I change the usage of something I have to rename it". Yes, of course you do, that's what refactoring is all about, and why IDEs have refactoring tools to do this job quickly and painlessly. Even without prefixes, changing the usage of a variable almost certainly means its name ought to be changed.

  • "Prefixes just confuse me". As does every tool until you learn how to use it. Once your brain has become used to the naming patterns, it will filter the information out automatically and you won't really mind that the prefixes are there any more. But you have to use a scheme like this solidly for a week or two before you'll really become "fluent". And that's when a lot of people look at old code and start to wonder how they ever managed without a good prefix scheme.

  • "I can just look at the code to work this stuff out". Yes, but you don't need to waste time looking elsewhere in the code or remembering every little detail of it when the answer is right on the spot your eye is already focussed on.

  • (Some of) that information can be found by just waiting for a tooltip to pop up on my variable. Yes. Where supported, for some types of prefix, when your code compiles cleanly, after a wait, you can read through a description and find the information the prefix would have conveyed instantly. I feel that the prefix is a simpler, more reliable and more efficient approach.

  • "It's more typing". Really? One whole character more? Or is it - with IDE auto-completion tools, it will often reduce typing, because each prefix character narrows the search space significantly. Press "e" and the three events in your class pop up in intellisense. Press "c" and the five constants are listed.

  • "I can use this-> instead of m". Well, yes, you can. But that's just a much uglier and more verbose prefix! Only it carries a far greater risk (especially in teams) because to the compiler it is optional, and therefore its usage is frequently inconsistent. m on the other hand is brief, clear, explicit and not optional, so it's much harder to make mistakes using it.


I generally don't use a prefix for member variables.

I used to use a m prefix, until someone pointed out that "C++ already has a standard prefix for member access: this->.

So that's what I use now. That is, when there is ambiguity, I add the this-> prefix, but usually, no ambiguity exists, and I can just refer directly to the variable name.

To me, that's the best of both worlds. I have a prefix I can use when I need it, and I'm free to leave it out whenever possible.

Of course, the obvious counter to this is "yes, but then you can't see at a glance whether a variable is a class member or not".

To which I say "so what? If you need to know that, your class probably has too much state. Or the function is too big and complicated".

In practice, I've found that this works extremely well. As an added bonus it allows me to promote a local variable to a class member (or the other way around) easily, without having to rename it.

And best of all, it is consistent! I don't have to do anything special or remember any conventions to maintain consistency.


By the way, you shouldn't use leading underscores for your class members. You get uncomfortably close to names that are reserved by the implementation.

The standard reserves all names starting with double underscore or underscore followed by capital letter. It also reserves all names starting with a single underscore in the global namespace.

So a class member with a leading underscore followed by a lower-case letter is legal, but sooner or late you're going to do the same to an identifier starting with upper-case, or otherwise break one of the above rules.

So it's easier to just avoid leading underscores. Use a postfix underscore, or a m_ or just m prefix if you want to encode scope in the variable name.


You have to be careful with using a leading underscore. A leading underscore before a capital letter in a word is reserved. For example:

_Foo

_L

are all reserved words while

_foo

_l

are not. There are other situations where leading underscores before lowercase letters are not allowed. In my specific case, I found the _L happened to be reserved by Visual C++ 2005 and the clash created some unexpected results.

I am on the fence about how useful it is to mark up local variables.

Here is a link about which identifiers are reserved: What are the rules about using an underscore in a C++ identifier?


I prefer postfix underscores, like such:

class Foo
{
   private:
      int bar_;

   public:
      int bar() { return bar_; }
};

Lately I have been tending to prefer m_ prefix instead of having no prefix at all, the reasons isn't so much that its important to flag member variables, but that it avoids ambiguity, say you have code like:

void set_foo(int foo) { foo = foo; }

That of cause doesn't work, only one foo allowed. So your options are:

  • this->foo = foo;

    I don't like it, as it causes parameter shadowing, you no longer can use g++ -Wshadow warnings, its also longer to type then m_. You also still run into naming conflicts between variables and functions when you have a int foo; and a int foo();.

  • foo = foo_; or foo = arg_foo;

    Been using that for a while, but it makes the argument lists ugly, documentation shouldn't have do deal with name disambiguity in the implementation. Naming conflicts between variables and functions also exist here.

  • m_foo = foo;

    API Documentation stays clean, you don't get ambiguity between member functions and variables and its shorter to type then this->. Only disadvantage is that it makes POD structures ugly, but as POD structures don't suffer from the name ambiguity in the first place, one doesn't need to use it with them. Having a unique prefix also makes a few search&replace operations easier.

  • foo_ = foo;

    Most of the advantages of m_ apply, but I reject it for aesthetic reasons, a trailing or leading underscore just makes the variable look incomplete and unbalanced. m_ just looks better. Using m_ is also more extendable, as you can use g_ for globals and s_ for statics.

PS: The reason why you don't see m_ in Python or Ruby is because both languages enforce the their own prefix, Ruby uses @ for member variables and Python requires self..


When reading through a member function, knowing who "owns" each variable is absolutely essential to understanding the meaning of the variable. In a function like this:

void Foo::bar( int apples )
{
    int bananas = apples + grapes;
    melons = grapes * bananas;
    spuds += melons;
}

...it's easy enough to see where apples and bananas are coming from, but what about grapes, melons, and spuds? Should we look in the global namespace? In the class declaration? Is the variable a member of this object or a member of this object's class? Without knowing the answer to these questions, you can't understand the code. And in a longer function, even the declarations of local variables like apples and bananas can get lost in the shuffle.

Prepending a consistent label for globals, member variables, and static member variables (perhaps g_, m_, and s_ respectively) instantly clarifies the situation.

void Foo::bar( int apples )
{
    int bananas = apples + g_grapes;
    m_melons = g_grapes * bananas;
    s_spuds += m_melons;
}

These may take some getting used to at first—but then, what in programming doesn't? There was a day when even { and } looked weird to you. And once you get used to them, they help you understand the code much more quickly.

(Using "this->" in place of m_ makes sense, but is even more long-winded and visually disruptive. I don't see it as a good alternative for marking up all uses of member variables.)

A possible objection to the above argument would be to extend the argument to types. It might also be true that knowing the type of a variable "is absolutely essential to understanding the meaning of the variable." If that is so, why not add a prefix to each variable name that identifies its type? With that logic, you end up with Hungarian notation. But many people find Hungarian notation laborious, ugly, and unhelpful.

void Foo::bar( int iApples )
{
    int iBananas = iApples + g_fGrapes;
    m_fMelons = g_fGrapes * iBananas;
    s_dSpuds += m_fMelons;
}

Hungarian does tell us something new about the code. We now understand that there are several implicit casts in the Foo::bar() function. The problem with the code now is that the value of the information added by Hungarian prefixes is small relative to the visual cost. The C++ type system includes many features to help types either work well together or to raise a compiler warning or error. The compiler helps us deal with types—we don't need notation to do so. We can infer easily enough that the variables in Foo::bar() are probably numeric, and if that's all we know, that's good enough for gaining a general understanding of the function. Therefore the value of knowing the precise type of each variable is relatively low. Yet the ugliness of a variable like "s_dSpuds" (or even just "dSpuds") is great. So, a cost-benefit analysis rejects Hungarian notation, whereas the benefit of g_, s_, and m_ overwhelms the cost in the eyes of many programmers.