Rationale for Ada 2005

John Barnes
Contents   Index   References   Search   Previous   Next 

8.4 Sets

Sets, like maps, come in two forms: hashed and ordered. Sets are of course just collections of values and there is no question of a key (we can perhaps think of the value as being its own key). Thus in the case of an ordered set the values are stored in order whereas in the case of a map, it is the keys that are stored in order. As well as the usual operations of inserting elements into a set and searching and so on, there are also many operations on sets as a whole that do not apply to the other containers – these are the familiar set operations such as union and intersection.
Here is the specification of the ordered sets package giving just those facilities that are common to both kinds of sets.
   type Element_Type is private;
   with function "<" (Left, Right: Element_Type) return Boolean is <>;
   with function "=" (Left, Right: Element_Type) return Boolean is <>;
package Ada.Containers.Ordered_Sets is
   pragma Preelaborate(Ordered_Sets);
   function Equivalent_Elements(Left, Right: Element_Type) return Boolean;
   type Set is tagged private;
   pragma Preelaborable_Initialization(Set);
   type Cursor is private;
   pragma Preelaborable_Initialization(Cursor);
   Empty_Set: constant Set;
   No_Element: constant Cursor;
The only differences from the maps package (apart from the identifiers) are that there is no key type and both "<" and "=" apply to the element type (whereas in the case of maps, the operation "<" applies to the key type). Thus the ordering relationship "<" defined on elements defines equivalence between the elements whereas "=" defines equality.
It is possible for two elements to be equivalent but not equal. For example if they were strings then we might decide that the ordering (and thus equivalence) ignored the case of letters but that equality should take the case into account. (They could also be equal but not equivalent but that is perhaps less likely.)
And as in the case of the maps package, the equality operation on elements is only used by the function "=" for comparing two sets.
Again we have the usual rules as explained for maps. Thus if x < y is true then y < x must be false; x < y and y < z must imply x < z; and x = y and y = x must be the same.
For the convenience of the user the function Equivalent_Elements is declared explicitly. It is equivalent to 
function Equivalent_Elements(Left, Right: Element_Type) return Boolean is
   return not (Left < Right) and not (Right < Left);
end Equivalent_Elements;
This function Equivalent_Elements corresponds to the formal generic parameter of the same name in the hashed sets package discussed below. This should make it easier to convert between the two forms of packages.
function "=" (Left, Right: Set) return Boolean;
function Equivalent_Sets(Left, Right: Set) return Boolean;
function To_Set(New_Item: Element_Type) return Set;
function Length(Container: Set) return Count_Type;
function Is_Empty(Container: Set) return Boolean;
procedure Clear(Container: in out Set);
Note the addition of Equivalent_Sets and To_Set. Two sets are equivalent if they have the same number of elements and the pairs of elements are equivalent. This contrasts with the function "=" where the pairs of elements have to be equal rather than equivalent. Remember that elements might be equivalent but not equal (as in the example of a string mentioned above). The function To_Set takes a single element and creates a set. It is particularly convenient when used in conjunction with operations such as Union described below. The other subprograms are as in the other containers.
function Element(Position: Cursor) return Element_Type;
procedure Replace_Element(
      Container: in out Set;
      Position: in Cursor;
      New_Item: in Element_Type);
procedure Query_Element(
      Position: in Cursor;
      Process: not null access procedure (Element: in Element_Type));
Again these are much as expected except that there is no procedure Update_Element. This is because the elements are arranged in terms of their own value (either by order or through the hash function) and if we just change an element in situ then it might become out of place (this problem does not arise with the other containers). This also means that Replace_Element has to ensure that the value New_Item is not equivalent to an element in a different position; if it is then Program_Error is raised. We will return to the problem of the missing Update_Element later.
procedure Move(Target, Source: in out Set);
This is just as for the other containers.
procedure Insert(
      Container: in out Set;
      New_Item: in Element_Type;
      Position: out Cursor;
      Inserted: out Boolean);
procedure Insert(
      Container: in out Set;
      New_Item: in Element_Type);
These insert a new element into the set unless an equivalent element already exists. If it does exist then the first one returns with Inserted set to False and Position indicating the element whereas the second raises Constraint_Error (the element value is not changed). If an equivalent element is not in the set then it is added and Position is set accordingly.
procedure Include(Container: in out Set; New_Item: in Element_Type);
This is somewhat like the last Insert except that if an equivalent element is already in the set then it is replaced (rather than raising Constraint_Error).
procedure Replace(Container: in out Set; New_Item: in Element_Type);
In this case, Constraint_Error is raised if an equivalent element does not already exist.
procedure Exclude(Container: in out Set; Item: in Element_Type);
If an element equivalent to Item is already in the set, then it is deleted.
procedure Delete(Container: in out Set; Item: in Element_Type);
procedure Delete(Container: in out Set; Position: in out Cursor);
These delete an element. In the first case if there is no such equivalent element then Constraint_Error is raised. In the second case if the cursor is No_Element then again Constraint_Error is also raised – there is also a check to ensure that the cursor otherwise does designate an element in the correct set (remember that cursors designate both an entity and the container); if this check fails then Program_Error is raised.
And now some new stuff, the usual set operations.
procedure Union(Target: in out Set; Source: in Set);
function Union(Left, Right: Set) return Set;
function "or" (Left, Right: Set) return Set renames Union;
procedure Intersection(Target: in out Set; Source: in Set);
function Intersection(Left, Right: Set) return Set;
function "and" (Left, Right: Set) return Set renames Intersection;
procedure Difference(Target: in out Set; Source: in Set);
function Difference(Left, Right: Set) return Set;
function "–" (Left, Right: Set) return Set renames Difference;
procedure Symmetric_Difference(Target: in out Set; Source: in Set);
function Symmetric_Difference (Left, Right: Set) return Set;
function "xor" (Left, Right: Set) return Set renames Symmetric_Difference;
These all do exactly what one would expect using the equivalence relation on the elements.
function Overlap(Left, Right: Set) return Boolean;
function Is_Subset(Subset: Set; Of_Set: Set) return Boolean;
These are self-evident as well.
function First(Container: Set) return Cursor;
function Last(Container: Set) return Cursor;
function Next(Position: Cursor) return Cursor;
procedure Next(Position: in out Cursor);
function Find(Container: Set; Item: Element_Type) return Cursor;
function Contains(Container: Set; Item: Element_Type) return Boolean;
These should be self-evident and are very similar to the corresponding operations on maps. Again unlike the operations on vectors and lists, Find logically searches the whole set and not just starting at some point (there is also no Reverse_Find). Moreover, Find uses the equivalence relation based on the "<" parameter.
function Has_Element(Position: Cursor) return Boolean;
procedure Iterate(
      Container: in Set;
      Process: not null access procedure (Position: in Cursor));
These are also as for other containers.
The sets packages conclude with an internal generic package called Generic_Keys. This package enables some set operations to be performed in terms of keys where the key is a function of the element. Note carefully that in the case of a map, the element is defined in terms of the key whereas here the situation is reversed. An equivalence relationship is defined for these keys as well; this is defined by a generic parameter "<" for ordered sets and Equivalent_Keys for hashed sets.
In the case of ordered sets the formal parameters are 
   type Key_Type(<>) is private;
   with function Key(Element: Element_Type) return Key_Type;
   with function "<" (Left, Right: Key_Type) return Boolean is <>;
package Generic_Keys is
The following are then common to the package Generic_Keys for both hashed and ordered sets.
function Key(Position: Cursor) return Key_Type;
function Element(Container: Set; Key: Key_Type) return Element_Type;
procedure Replace(
      Container: in out Set;
      Key: in Key_Type; New_Item: in Element_Type);
procedure Exclude(
      Container: in out Set; Key: in Key_Type);
procedure Delete(Container: in out Set; Key: in Key_Type);
function Find(Container: Set; Key: Key_Type) return Cursor;
function Contains(Container: Set; Key: Key_Type) return Boolean;
procedure Update_Element_Preserving_Key(
      Container: in out Set; Position: in Cursor;
      Process: not null access procedure (Element: in out Element_Type));
and then finally
end Generic_Keys;
   ... -- not specified by the language
end Ada.Containers.Ordered_Sets;
It is expected that most user of sets will use them in a straightforward manner and that the operations specific to sets such as Union and Intersection will be dominant.
However, sets can be used as sort of economy class maps by using the inner package Generic_Keys. Although this is certainly not for the novice we will illustrate how this might be done by reconsidering the stock problem using sets rather than maps. We declare 
type Part_Type is
      Part_Number: Integer;
      Year: Integer;
      Shelf: Character range 'A' .. 'T';
      Stock: Integer;
   end record;
Here we have put all the information in the one type.
We then declare "<" much as before 
function "<" (Left, Right: Part_Type) return Boolean is
   return Left.Part_Number < Right.Part_Number;
end "<";
and then instantiate the package thus 
package Store_Sets is new Ordered_Sets(Element_Type => Part_Type);
The_Store: Store_Sets.Set;
We have used the default generic parameter mechanism for "<" this time by way of illustration.
In this case we add items to the store by calling 
The_Store.Insert((34618, 1998, 'F', 25));
The_Store.Insert((27134, 2004, 'C', 45));
The procedure for checking the stock could now become 
procedure Request(
       Part: in Integer: OK: out Boolean;
       Year: out Integer; Shelf: out Character) is
   C: Cursor;
   E: Part_Type;
   C := The_Store.Find((Part, others => <>));
   if C = No_Element then
      OK := False; return;    -- no such item
   end if;
   E := Element(C);
   Year := E.Year;
   Shelf := E.Shelf;
   if E.Stock = 0 then
      OK := False; return;    -- out of stock
   end if;
   Replace_Element(C, (E.Part_Number, Year; Shelf, E.Stock–1));
   OK := True;
end Request;
This works but is somewhat unsatisfactory. For one thing we have had to make up dummy components in the call of Find (using <>) and moreover we have had to replace the whole of the element although we only wanted to update the Stock component. Moreover, we cannot use Update_Element because it is not defined for sets at all. Remember that this is because it might make things out of order; that wouldn't be a problem in this case because we don't want to change the part number and our ordering is just by the part number.
A better approach is to use the part number as a key. We define 
type Part_Key is new Integer;
function Part_No(P: Part_Type) return Part_Key is
   return Part_Key(P.Part_Number);
end Part_No;
and then
package Party is new Generic_Keys(Key_Type => Part_Key, Key => Part_No);
use Party;
Note that we do not have to define "<" on the type Part_Key at all because it already exists since Part_Key is an integer type. And the instantiation uses it by default.
And now we can rewrite the Request procedure as follows
procedure Request(
      Part: in Part_Key; OK: out Boolean;
      Year: out Integer; Shelf: out Character) is
   C: Cursor;
   E: Part_Type;
   C := Find(The_Store, Part);
   if C = No_Element then
      OK := False; return;    -- no such item
   end if;
   E := Element(C);
   Year := E.Year;  Shelf := E.Shelf;
   if E.Stock = 0 then
      OK := False; return;    -- out of stock
   end if;
   -- we are now going to update the stock level
      procedure Do_It(E: in out Part_Type) is
         E.Stock := E.Stock – 1;
      end Do_It;
      Update_Element_Preserving_Key(The_Store, C, Do_It'Access);
   OK := True;
end Request;
This seems hard work but has a number of advantages. The first is that the call of Find is more natural and only involves the part number (the key) – note that this is a call of the function Find in the instantiation of Generic_Keys and takes just the part number. And the other is that the update only involves the component being changed.
We mentioned earlier that there was no Update_Element for sets because of the danger of creating a value that was in the wrong place. In the case of the richly named Update_Element_Preserving_Key it also checks to ensure that the element is indeed still in the correct place (by checking that the key is still the same); if it isn't it removes the element and raises Program_Error.
But the user is warned to take care when using the package Generic_Keys. It is absolutely vital that the relational operation and the function (Part_No) used to instantiate Generic_Keys are compatible with the ordering used to instantiate the parent package Containers.Ordered_Sets itself. If this is not the case then the sky might fall in.
Incidentally, the procedure for checking the stock which previously used the maps package now becomes
procedure Check_Stock(Low: in Integer) is
   procedure Check_It(C: in Cursor) is
      if Element(C).Stock < Low then
         -- print a message perhaps
         Put("Low stock of part ");
         Put_Line(Element(C).Part_Number);    -- changed
      end if;
   end Check_It;
end Check_Stock;
The only change is that the call of Key in 
when using the maps package has been replaced by Element. A minor point is that we could avoid calling Element twice by declaring a constant E in Check_It thus 
E: constant Part_Type := Element(C);
and then writing E.Stock < Low and calling Put_Line with E.Part_Number.
A more important point is that if we have instantiated the Generic_Keys inner package as illustrated above then we can leave Check_It unchanged to call Key. But it is important to realise that we are then calling the function Key internal to the instantiation of Generic_Keys (flippantly called Party) and not that from the instantiation of the parent ordered sets package (Store_Sets) because that has no such function. This illustrates the close affinity between the sets and maps packages.
And finally there is a hashed sets package which has strong similarities to both the ordered sets package and the hashed maps package. We can introduce this much as for hashed maps by giving the differences between the two sets packages, the extra facilities in each and the impact on the part number example.
The specification of the hashed sets package starts
   type Element_Type is private;
   with function Hash(Element: Element_Type) return Hash_Type;
   with function Equivalent_Elements(Left, Right: Element_Type)
return Boolean;
   with function "=" (Left, Right: Element_Type) return
Boolean is <>;
package Ada.Containers.Hashed_Sets is
   pragma Preelaborate(Hashed_Sets);
The differences from the ordered sets package are that there is an extra generic parameter Hash and the ordering parameter "<" has been replaced by the function Equivalent_Elements.
So if we have 
function Equivalent_Parts(Left, Right: Part_Type) return Boolean is
   return Left.Part_Number = Right.Part_Number;
end Equivalent_Parts;
function Part_Hash(P: Part_Type) return Hash_Type is
   M31: constant := 2**31–1;    -- a nice Mersenne prime
   return Hash_Type(P.Part_Number) * M31;
end Part_Hash;
(which are very similar to the hashed map example – the only changes are to the parameter type name) then we can instantiate the hashed sets package as follows
package Store_Sets is new Hashed_Sets(
      Element_Type => Part_Type,
      Hash => Part_Hash,
      Equivalent_Elements => Equivalent_Parts);
The_Store: Store_Sets.Set;
and then the rest of our example will be exactly as before. It is thus easy to convert from an ordered set to a hashed set and vice versa provided of course that we only use the facilities common to both.
It should also be mentioned that the inner package Generic_Keys for hashed sets has the following formal parameters 
   type Key_Type(<>) is private;
   with function Key(Element: Element_Type) return Key_Type
   with function Hash(Key: Key_Type) return Hash_Type;
   with function Equivalent_Keys(Left, Right: Key_Type) return Boolean;
package Generic_Keys is
The differences from that for ordered sets are the addition of the function Hash and the replacement of the comparison operator "<" by Equivalent_Keys.
(Incidentally the package Generic_Keys for ordered sets also exports a function Equivalent_Keys for uniformity with the hashed sets package.)
Although our example itself is unchanged we do have to change the instantiation of Generic_Keys thus 
type Part_Key is new Integer;
function Part_No(P: Part_Type) return Part_Key is
   return Part_Key(P.Part_Number);
end Part_No;
function Part_Hash(P: Part_Key) return Hash_Type is
   M31: constant := 2**31–1;    -- a nice Mersenne prime
   return Hash_Type(P) * M31;
end Part_Hash;
function Equivalent_Parts(Left: Right: Part_Key) return Boolean is
   return Left = Right;
end Equivalent_Parts;
and then
package Party is new Generic_Key(
      Key_Type => Part_Key,
      Key => Part_No;
      Hash => Part_Hash
      Equivalent_Keys => Equivalent_Parts);
use Party;
The hash function is similar to that used with hashed maps. The type Part_Key and function Part_No are the same as for ordered sets. We don't really need to declare the function Equivalent_Parts since we could use "=" as the actual parameter for Equivalent_Keys.
We will finish this discussion of sets by briefly considering the additional facilities in the two sets packages (and their inner generic keys packages) just as we did for the two maps packages (the discussion is almost identical).
The ordered sets package has the following additional subprograms 
procedure Delete_First(Container: in out Set);
procedure Delete_Last(Container: in out Set);
function First_Element(Container: Set) return Element_Type;
function Last_Element(Container: Set) return Element_Type;
function Previous(Position: Cursor) return Cursor;
procedure Previous(Position: in out Cursor);
function Floor(Container: Set; Item: Element_Type) return Cursor;
function Ceiling(Container: Set; Item: Element_Type) return Cursor;
function "<" (Left, Right: Cursor) return Boolean;
function ">" (Left, Right: Cursor) return Boolean;
function "<" (Left: Cursor; Right: Element_Type) return Boolean;
function ">" (Left: Cursor; Right: Element_Type) return Boolean;
function "<" (Left: Element_Type; Right: Cursor) return Boolean;
function ">" (Left: Element_Type; Right: Cursor) return Boolean;
procedure Reverse_Iterate(
      Container: in Set;
      Process: not null access procedure (Position: in Cursor));
These are again largely self-evident. The functions Floor and Ceiling are similar to those for ordered maps – Floor searches for the last element which is not greater than Item and Ceiling searches for the first element which is not less than Item – they return No_Element if there is not one.
The functions "<" and ">" are very important for ordered sets. The first is equivalent to 
function "<" (Left, Right: Cursor) return Boolean is
   return Element(Left) < Element(Right);
end "<";
There is a general philosophy that the container packages should work efficiently even if the elements themselves are very large – perhaps even other containers. We should therefore avoid copying elements. (Passing them as parameters is of course no problem since they will be passed by reference if they are large structures.) So in this case the built-in comparison is valuable because it can avoid the copying which would occur if we wrote the function ourselves with the explicit internal calls of the function Element.
On the other hand, there is a general expectation that keys will be small and so there is no corresponding problem with copying keys. Thus such built-in functions are less important for maps than sets but they are provided for maps for uniformity.
The following are additional in the package Generic_Keys for ordered sets
function Equivalent_Keys(Left, Right: Key_Type) return Boolean;
This corresponds to the formal generic parameter of the same name in the package Generic_Keys for hashed sets as mentioned earlier.
function Floor(Container: Set; Key: Key_Type) return Cursor;
function Ceiling(Container: Set; Key: Key_Type) return Cursor;
These are much as the corresponding functions in the parent package except that they use the formal parameter "<" of Generic_Keys for the search.
Hashed sets, like hashed maps also have the facility to specify a capacity as for the vectors package. Thus we have 
procedure Reserve_Capacity(Container: in out Set; Capacity: in Count_Type);
function Capacity(Container: Set) return Count_Type;
The behaviour is much as for vectors and hashed maps. We don't have to set the capacity ourselves since it will be automatically extended as necessary but it might significantly improve performance to do so. Note again that Length(S) cannot exceed Capacity(S) but might be much less.
The other additional subprograms for hashed sets are 
function Equivalent_Elements(Left, Right: Cursor) return Boolean;
function Equivalent_Elements(Left: Cursor; Right: Element_Type) return Boolean;
function Equivalent_Elements(Left: Element_Type; Right: Cursor) return Boolean;
Again, these are very important for sets. The first is equivalent to 
function Equivalent_Elements(Left, Right: Cursor) return Boolean is
   return Equivalent_Elements(Element(Left), Element(Right));
end Equivalent_Elements;
and once more we see that the built-in functions can avoid the copying of the type Element that would occur if we wrote the functions ourselves.

Contents   Index   References   Search   Previous   Next 
© 2005, 2006, 2007 John Barnes Informatics.
Sponsored in part by:
The Ada Resource Association and its member companies: ARA Members AdaCore Polyspace Technologies Praxis Critical Systems IBM Rational Sofcheck and   Ada-Europe: