I'm writing code in a C# library to do clustering on a (two-dimensional) dataset - essentially breaking the data up into groups or clusters. To be useful, the library needs to take in "generic" or "custom" data, cluster it, and return the clustered data.
To do this, I need to assume that each datum in the dataset being passed in has a 2D vector associated with it (in my case Lat, Lng - I'm working with co-ordinates).
My first thought was to use generic types, and pass in two lists, one list of the generic data (i.e. List<T>) and another of the same length specifying the 2D vectors (i.e. List<Coordinate>, where Coordinate is my class for specifying a lat, lng pair), where the lists correspond to each other by index. But this is quite tedious because it means that in the algorithm I have to keep track of these indices somehow.
My next thought was to use interfaces, where I define an interface
public interface IPoint
{
    // Read-only: implementers such as ClusterableStop expose getters only.
    double Lat { get; }
    double Lng { get; }
}
and ensure that the data that I pass in implements this interface (i.e. I can assume that each datum passed in has a Lat and a Lng).
But this isn't really working out for me either. I'm using my C# library to cluster stops in a transit network (in a different project). The class is called Stop, and this class is also from an external library, so I can't implement the interface for that class.
What I did then was inherit from Stop, creating a class called ClusterableStop which looks like this:
public class ClusterableStop : GTFS.Entities.Stop, IPoint
{
    public ClusterableStop(Stop stop)
    {
        Id = stop.Id;
        Code = stop.Code;
        Name = stop.Name;
        Description = stop.Description;
        Latitude = stop.Latitude;
        Longitude = stop.Longitude;
        Zone = stop.Zone;
        Url = stop.Url;
        LocationType = stop.LocationType;
        ParentStation = stop.ParentStation;
        Timezone = stop.Timezone;
        WheelchairBoarding = stop.WheelchairBoarding;
    }

    public double Lat
    {
        get { return this.Latitude; }
    }

    public double Lng
    {
        get { return this.Longitude; }
    }
}
which as you can see implements the IPoint interface. Now I use the constructor for ClusterableStop to first convert all Stops in the dataset to ClusterableStops, then run the algorithm and get the result as ClusterableStops.
This isn't really what I want, because I want to do things to the Stops based on what cluster they fall in. I can't do that because I've actually instantiated new stops, namely ClusterableStops!
I can still achieve what I want to, because e.g. I can retrieve the original objects by Id. But surely there is a much more elegant way to accomplish all of this? Is this the right way to be using interfaces? It seemed like such a simple idea - passing in and getting back custom data - but turned out to be so complicated.
Since all you need is to associate a (latitude, longitude) pair with each element of the dataset, you could make a method that takes a delegate, which produces an associated position for each datum, like this:
ClusterList Cluster<T>(IList<T> data, Func<int,Coordinate> getCoordinate) {
    for (int i = 0 ; i != data.Count ; i++) {
        T item = data[i];
        // Look up the coordinate by position in the list.
        Coordinate coord = getCoordinate(i);
        ...
    }
}
It is now up to the caller to decide how Coordinate is paired with each element of data.
Note that the association by list position is not the only option available to you. Another option is to pass a delegate that takes the item, and returns its coordinate:
ClusterList Cluster<T>(IEnumerable<T> data, Func<T,Coordinate> getCoordinate) {
    foreach (var item in data) {
        // Ask the caller-supplied delegate for this item's coordinate.
        Coordinate coord = getCoordinate(item);
        ...
    }
}
Although this approach is better than the index-based one, in cases when the coordinates are not available on the object itself, it requires the caller to keep some sort of associative container keyed on T, so T must either play well with hash-based containers or be an IComparable<T>. The first approach places no restrictions on T.
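To illustrate the associative-container case mentioned above, here is a minimal sketch of how a caller could turn two parallel lists into a Func<T,Coordinate>. The Coordinate class shown is a hypothetical stand-in for the asker's own class, and LookupExample and FromParallelLists are names invented for this example:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's Coordinate class.
public sealed class Coordinate
{
    public double Lat { get; }
    public double Lng { get; }
    public Coordinate(double lat, double lng) { Lat = lat; Lng = lng; }
}

public static class LookupExample
{
    // Builds a delegate backed by a Dictionary, so T needs usable
    // Equals/GetHashCode (reference equality is the default for classes).
    public static Func<T, Coordinate> FromParallelLists<T>(
        IList<T> data, IList<Coordinate> coordinates) where T : notnull
    {
        if (data.Count != coordinates.Count)
            throw new ArgumentException("Lists must have the same length.");

        var lookup = new Dictionary<T, Coordinate>();
        for (int i = 0; i < data.Count; i++)
            lookup[data[i]] = coordinates[i];

        // Throws KeyNotFoundException for items that were not in 'data'.
        return item => lookup[item];
    }
}
```

The returned delegate can then be passed as the getCoordinate argument of the second Cluster overload. Note that duplicate items in data would overwrite each other in the dictionary, which is one reason the index-based overload places fewer demands on T.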
In your case, the second approach is preferable:
var clustered = Cluster(
    myListOfStops,
    stop => new Coordinate(stop.Latitude, stop.Longitude)
);
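For a fuller picture, here is a rough sketch of what the item-based overload could look like end to end. The clustering step is a deliberately naive grid bucketing used only as a placeholder, not the algorithm from the question, and Coordinate, GridClusterer and the cellSize parameter are all assumptions made for this example:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's Coordinate class.
public sealed class Coordinate
{
    public double Lat { get; }
    public double Lng { get; }
    public Coordinate(double lat, double lng) { Lat = lat; Lng = lng; }
}

public static class GridClusterer
{
    // Placeholder clustering: items whose coordinates fall in the same
    // cellSize-by-cellSize grid cell end up in the same group.
    public static List<List<T>> Cluster<T>(
        IEnumerable<T> data, Func<T, Coordinate> getCoordinate, double cellSize)
    {
        var cells = new Dictionary<(long, long), List<T>>();
        foreach (var item in data)
        {
            Coordinate c = getCoordinate(item);
            var key = ((long)Math.Floor(c.Lat / cellSize),
                       (long)Math.Floor(c.Lng / cellSize));

            if (!cells.TryGetValue(key, out var bucket))
            {
                bucket = new List<T>();
                cells[key] = bucket;
            }
            // The original object is stored, not a copy or a wrapper.
            bucket.Add(item);
        }
        return new List<List<T>>(cells.Values);
    }
}
```

Because each group holds the very objects that were passed in, the caller gets the original Stop instances back and no ClusterableStop copies are needed.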
Have you considered using Tuples to do the work? Sometimes this is a useful way of associating two classes without creating a whole new class. You can create a list of tuples:
List<Tuple<Point, Stop>>
where Point is the thing you cluster on.
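As a small sketch of the tuple idea, the helper below pairs each item with its position up front. It uses a hypothetical Coordinate class in place of Point, and TupleExample and Pair are names made up for this example:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Point/Coordinate type being clustered on.
public sealed class Coordinate
{
    public double Lat { get; }
    public double Lng { get; }
    public Coordinate(double lat, double lng) { Lat = lat; Lng = lng; }
}

public static class TupleExample
{
    // Pairs every item with its coordinate; Item1 is what the algorithm
    // clusters on, Item2 is the untouched original object.
    public static List<Tuple<Coordinate, T>> Pair<T>(
        IEnumerable<T> items, Func<T, Coordinate> getCoordinate)
    {
        var pairs = new List<Tuple<Coordinate, T>>();
        foreach (var item in items)
            pairs.Add(Tuple.Create(getCoordinate(item), item));
        return pairs;
    }
}
```

The clustering code would then read Item1 and hand back Item2, so the caller still ends up with the original stops.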