Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are sets in many programming languages not really sets?

In several programming languages there are Set collections which are supposed to be implementations of the mathematical concept of a finite set.

However, this is not necessarily true, for example in C# and Java, both implementations of HashSet<T> allow you to add any HashSet<T> collection as a member of itself. Which by the modern definition of a mathematical set is not allowed.

Background:

According to naive set theory, the definition of a set is:

A set is a collection of distinct objects.

However, this definition lead famously to Russel's Paradox as well as other paradoxes. For convenience, Russel's Paradox is:

Let R be the set of all sets that are not members of themselves. If R is not a member of itself, then its definition dictates that it must contain itself, and if it contains itself, then it contradicts its own definition as the set of all sets that are not members of themselves.

So according to modern set theory (See: ZFC), the definition of a set is:

A set is a collection of distinct objects, none of which is the set itself.

Specifically, this is a result of the axiom of regularity.

Well so what? What are the implications of this? Why is this question on StackOverflow?

One of the implications of Russel's Paradox is that not all collections are sets. Additionally, this was the point where mathematicians dropped the definition of a set as being the usual English definition. So I believe this question has a lot of weight when it comes to programming language design in general.

Question(s):

So why would programming languages, who in some form, use these principles in their very design just ignore it in the implementation of the Set in their languages libraries?

Secondly, is this a common occurrence with other implementations of mathematical concepts?

Perhaps I'm being a bit nit picky, but if these are to be true implementations of Sets, then why would part of the very definition be ignored?

Update

Addition of C# and Java code snippets exemplifying behavior:

Java Snippet:

Set<Object> hashSet = new HashSet<Object>();
hashSet.add(1);
hashSet.add("Tiger");
hashSet.add(hashSet);
hashSet.add('f');

Object[] array = hashSet.toArray();
HashSet<Object> hash = (HashSet<Object>)array[3];

System.out.println("HashSet in HashSet:");       
for (Object obj : hash)
    System.out.println(obj);

System.out.println("\nPrinciple HashSet:");
for (Object obj : hashSet)
    System.out.println(obj);

Which prints out:

HashSet in HashSet:
f
1
Tiger
[f, 1, Tiger, (this Collection)]

Principle HashSet:
f
1
Tiger
[f, 1, Tiger, (this Collection)]

C# Snippet:

HashSet<object> hashSet = new HashSet<object>();
hashSet.Add(1);
hashSet.Add("Tiger");
hashSet.Add(hashSet);
hashSet.Add('f');

object[] array = hashSet.ToArray();
var hash = (HashSet<object>)array[2];

Console.WriteLine("HashSet in HashSet:");
foreach (object obj in hash)
     Console.WriteLine(obj);

Console.WriteLine("\nPrinciple HashSet:");
foreach (object obj in hashSet)
     Console.WriteLine(obj);

Which prints out:

HashSet in HashSet:
1
Tiger
System.Collections.Generic.HashSet`1[System.Object]
f

Principle HashSet:
1
Tiger
System.Collections.Generic.HashSet`1[System.Object]
f

Update 2

In regards to Martijn Courteaux's second point which was that it could be done in the name of computational efficiency:

I made two test collections in C#. They were identical, except in the Add method of one of the them - I added the following check: if (this != obj) where obj is the item being added to the collection.

I clocked both of them separately where they were to add 100,000 random integers:

With Check: ~ 28 milliseconds

Without Check: ~ 21 milliseconds

This is a fairly significant performance jump.

like image 998
Derek W Avatar asked Aug 31 '13 20:08

Derek W


People also ask

What is set in programming language?

In computer science, a set is an abstract data type that can store unique values, without any particular order. It is a computer implementation of the mathematical concept of a finite set.

Are sets used in coding?

In computer science, a set is an abstract data type that will store an unordered collection of unique values. In many programming languages, sets are implemented as built-in data structures, similar to arrays or dictionaries.

Why are sets useful in programming?

A set is a data structure that can store any number of unique values in any order you so wish. Set's are different from arrays in the sense that they only allow non-repeated, unique values within them.

Why do we still need multiple types of programming languages instead of combining them?

Conclusion. To sum it up, the main reason why there are many programming languages out there is that different problems require different tools to solve them. Each programming language has certain features and characteristics that make it suitable for specific tasks.


2 Answers

Programming language sets really aren't like ZFC sets, but for quite different reasons than you suppose:

  1. You can't form a set by comprehension (i.e. set of all objects such that ...). Note that this already blocks all (I believe) naive set theory paradoxes, so they are irrelevant.

  2. They usually can't be infinite.

  3. There exist objects which aren't sets (in ZFC there are only sets).

  4. They are usually mutable (i.e. you can add/remove elements to/from the set).

  5. The objects they contain can be mutable.

So the answer to

So why would programming languages, who in some form, use these principles in their very design just ignore it in the implementation of the Set in their languages libraries?

is that the languages don't use these principles.

like image 192
Alexey Romanov Avatar answered Oct 06 '22 01:10

Alexey Romanov


I can't speak for C#, but as far as Java is concerned a set is a set. If you look at the javadoc for the Set interface, you will see (emphasis mine):

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

Whether the prohibition is actively enforced is unclear (adding a HashSet to itself does not throw any exceptions for example), but at least it is clearly documented that you should not try because the behaviour would then be unspecified.

like image 29
assylias Avatar answered Oct 06 '22 01:10

assylias