
What does sys.intern() do and when should it be used?



From the Python 3 documentation:

sys.intern(string)

Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

Interned strings are not immortal; you must keep a reference to the return value of intern() around to benefit from it.
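
A minimal sketch of the dictionary case the documentation describes, assuming keys that are built at runtime. The `make_key` helper and the `table` contents are hypothetical names used only for illustration. Note that the return value of `sys.intern` is kept in a variable, as the documentation advises.

```python
import sys

def make_key(*parts):
    # Hypothetical helper: builds a new string object at runtime on every call.
    return " ".join(parts)

# Intern the key once and keep a reference to the returned string.
key = sys.intern(make_key("user", "name"))
table = {key: 42}

# A separately built key with the same value maps to the same interned object.
lookup = sys.intern(make_key("user", "name"))
print(lookup is key)    # True
print(table[lookup])    # 42
```

Because `lookup` and `key` are the same object, the dictionary can confirm the match by identity after hashing, without comparing the characters.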

Clarification:

As the documentation suggests, the sys.intern function is intended to be used for performance optimization.

The sys.intern function maintains a table of interned strings. When you attempt to intern a string, the function looks it up in the table and:

  1. If the string does not exist in the table (it hasn't been interned yet), the function saves it in the table and returns it.

    >>> import sys
    >>> a = sys.intern('why do pangolins dream of quiche')
    >>> a
    'why do pangolins dream of quiche'
    

    In the above example, a holds the interned string. Even though it is not visible, the sys.intern function has saved the 'why do pangolins dream of quiche' string object in the interned strings table.

  2. If the string already exists in the table (it has been interned), the function returns the stored string object.

    >>> b = sys.intern('why do pangolins dream of quiche')
    >>> b
    'why do pangolins dream of quiche'
    

    Even though it is not immediately visible, because the string 'why do pangolins dream of quiche' has been interned before, b now holds the same string object as a.

    >>> b is a
    True
    

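    If we create the same string without using sys.intern, we can end up with two different string objects that have the same value. The result below comes from the interactive interpreter, where each statement is compiled separately. Whether two equal literals share one object is an implementation detail that can vary between Python versions, and equal literals compiled together in one script are usually merged into a single object.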

    >>> c = 'why do pangolins dream of quiche'
    >>> c is a
    False
    >>> c is b
    False
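
The two cases above can also be reproduced in a script. This is a sketch that assumes the text is assembled at runtime, so the result does not depend on how the compiler treats equal literals; `build` is a hypothetical helper, not part of any library.

```python
import sys

def build():
    # Hypothetical helper: assembles the text at runtime, so each call
    # returns a fresh string object rather than a shared compile-time constant.
    return " ".join(["why", "do", "pangolins", "dream", "of", "quiche"])

a = sys.intern(build())   # case 1: not in the table yet, stored and returned
b = sys.intern(build())   # case 2: already in the table, stored object returned
c = build()               # never passed through sys.intern

print(b is a)   # True
print(c == a)   # True
print(c is a)   # False
```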
    

If every string with a given value is passed through sys.intern, all of them end up as references to one string object: interning a second string with the same value as an already interned string returns the pre-existing object. Strings that are not passed through sys.intern are not deduplicated this way. Sharing one object saves memory. Comparing equal interned strings is also very efficient, because they are the same object and the comparison can succeed on identity (the memory address) alone instead of examining the contents.
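
A small sketch of the memory effect, assuming input that repeats a handful of values many times. The `rows` data is made up for illustration. It counts how many distinct string objects sit behind each list.

```python
import sys

# Hypothetical input: many records that repeat a few tag values.
rows = ["red,large", "blue,small", "red,small", "blue,large"] * 1000

# Each split() call produces new string objects for the fields.
plain = [row.split(",")[0] for row in rows]
interned = [sys.intern(row.split(",")[0]) for row in rows]

# Count distinct objects by identity; every string is kept alive by its list.
plain_objects = len({id(s) for s in plain})
interned_objects = len({id(s) for s in interned})

print(plain_objects)      # one object per row on CPython
print(interned_objects)   # 2
```

The interned list holds 4000 references but only two string objects, one for each distinct value.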


Essentially intern looks up (or stores if not present) the string in a collection of interned strings, so all interned instances will share the same identity. You trade the one-time cost of looking up this string for faster comparisons (the compare can return True after just checking for identity, rather than having to compare each character), and reduced memory usage.
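
The trade-off can be observed with a rough timing sketch. The string length and repetition count are arbitrary assumptions, and the measured numbers depend on the machine and the Python version, so no expected timings are shown.

```python
import sys
import timeit

def build(n):
    # Build a long string at runtime so each call returns a new object.
    return "".join(["x"] * n)

n = 100_000
s1, s2 = build(n), build(n)     # equal values, two distinct objects
i1 = sys.intern(s1)             # one-time cost: table lookup and insert
i2 = sys.intern(s2)             # returns the object stored for s1

# Equal but distinct objects: contents must be compared.
t_plain = timeit.timeit(lambda: s1 == s2, number=200)
# Same object on both sides: the comparison can stop at the identity check.
t_interned = timeit.timeit(lambda: i1 == i2, number=200)

print(f"distinct objects: {t_plain:.6f}s")
print(f"interned objects: {t_interned:.6f}s")
```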

However, Python (CPython, as an implementation detail) automatically interns some strings, such as string literals that look like identifiers, so you may find you get no improvement because your strings are already being interned behind the scenes. For example:

>>> a = 'abc'; b = 'abc'
>>> a is b
True
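
To see that this automatic interning applies to literals in source code rather than to every short string, here is a sketch that builds an equal string at runtime. The behaviour noted in the comments is what CPython typically does; it is an implementation detail and not a language guarantee.

```python
import sys

literal = "abc"                       # identifier-like literal in source code
runtime = "".join(["a", "b", "c"])    # equal value, assembled at runtime

print(runtime == literal)               # True: same value
print(runtime is literal)               # typically False on CPython
print(sys.intern(runtime) is literal)   # typically True on CPython
```

If the last line prints True, the literal was already in the interned table, so explicitly interning such strings gains nothing further.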

In the past, one disadvantage was that interned strings were permanent: once interned, the string's memory was never freed, even after all references were dropped. This is no longer the case in more recent versions of Python; the Python 3 documentation states that interned strings are not immortal.


They weren't talking about a keyword named intern, because there is no such keyword in Python. They were talking about the non-essential built-in function intern, which was moved to sys.intern in Python 3 (py3k). The docs have an exhaustive description.


It returns a canonical instance of the string.

Therefore, if you have many string instances that are equal, you save memory. In addition, you can compare canonicalized strings by identity instead of equality, which is faster.
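
The idea of a canonical instance can be sketched with an ordinary dictionary. `StringPool` is a hypothetical class written only to illustrate the concept; it is not how sys.intern is implemented, and unlike the interned-strings table it keeps every string alive for as long as the pool exists.

```python
class StringPool:
    """Hypothetical hand-rolled pool that returns one canonical object per value."""

    def __init__(self):
        self._pool = {}

    def canonical(self, s):
        # Return the first equal string seen so far; store s if there is none.
        return self._pool.setdefault(s, s)


pool = StringPool()
x = pool.canonical(" ".join(["spam", "eggs"]))
y = pool.canonical(" ".join(["spam", "eggs"]))

print(x == y)   # True
print(x is y)   # True: both names refer to the canonical instance
```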