I've successfully implemented a Java program that uses two common data structures, a Tree and a Stack, along with an interface that lets a user enter a tree node ID and get information about that node in relation to its parent. You can look at the latest version of this program in the src on my GitHub.
Background
I wrote this ad hoc program to study the evolution of gene flow across hundreds of organisms. It compares data from a file that consists of FeatureIDs, which are String values (listed further down in the first column as "ATM-0000011", "ATM-0000012", and so on), and the scores associated with each feature's presence or absence at a particular node in the tree, which are double primitives.
Here is what the data file looks like:
"FeatureID","112","115","120","119","124",...//this line has all tree node IDs
"ATM-0000011",2.213e-03,1.249e-03,7.8e-04,9.32e-04,1.472e-03,... //scores on these lines
"ATM-0000012",2.213e-03,1.249e-03,7.8e-04,9.32e-04,1.472e-03,...//correspond to node ID
"ATM-0000013",0.94,1.249e-03,7.8e-04,9.32e-04,1.472e-03,...//order in the first line
... //~30000 lines later
"ATM-0036186",0.94,0.96,0.97,0.95,0.95,...
The Problem
Previously, it was good enough to build a 2D array of the doubles from the data file (the array excluded the file's first line and the FeatureIDs, because they're Strings) and then use that array to build double stacks. The stacks were built for parent and child nodes as determined by user input and the Tree.
The data in the parent and child stacks would then be popped off at the same time (ensuring that the same FeatureIDs were being compared without having to include that data in the data structure) and the values compared against a defined condition (i.e. whether both values were >= 0.75). If they met it, a counter was incremented. Once the comparisons were finished (the stacks were empty), the program returned the count(s).
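The counting scheme described above can be sketched roughly as follows. This is a minimal illustration and not the code from tlacMain.java: the class name, the `countMatches` helper, and the sample scores are all hypothetical, and `ArrayDeque` stands in for whatever stack type the program actually uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PairedStackCount {
    // Hypothetical helper: pops both stacks in lockstep and counts the
    // positions where parent and child scores both meet the threshold.
    static int countMatches(Deque<Double> parent, Deque<Double> child, double threshold) {
        int count = 0;
        while (!parent.isEmpty() && !child.isEmpty()) {
            double p = parent.pop();
            double c = child.pop();
            if (p >= threshold && c >= threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up scores for three features at a parent node and a child node.
        double[] parentScores = {0.94, 0.002, 0.80};
        double[] childScores = {0.96, 0.90, 0.10};
        Deque<Double> parent = new ArrayDeque<>();
        Deque<Double> child = new ArrayDeque<>();
        for (int i = 0; i < parentScores.length; i++) {
            parent.push(parentScores[i]);
            child.push(childScores[i]);
        }
        System.out.println(countMatches(parent, child, 0.75)); // prints 1
    }
}
```

Because both stacks are filled in the same row order, each pair of pops refers to the same feature, which is why the FeatureIDs never needed to be stored.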
Now, instead of just counting, I want to build a list (or lists) of the FeatureIDs that met the comparison criteria. So instead of returning a counter that says 4100 FeatureIDs met the criteria between node A and node B, I want a list of all 4100 FeatureID Strings that met the criteria. I'm going to save that list to a file later, but that's not a concern here. This means I'll probably have to abandon the double 2D array/double stack scheme that had previously worked so well.
The Question
Knowing what the problem is, is there a clever fix to this problem where I could make a change to the input data file, or somewhere in my code (tlacMain.java), without adding much more data to the process? I just need ideas.
I'm not quite sure if I understand your question correctly, but instead of incrementing a counter you could just add the currently compared FeatureID to an ArrayList and later write that to a file.
If you need a List for every comparison, you could use something like HashMap<Comparison, ArrayList<String>>.
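As a rough sketch of the map-of-lists idea: the example below uses a plain String key of the form "parent->child" in place of a dedicated Comparison class, which is an assumption made only to keep the example short. The class name, the `record` helper, and the node pairings are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MatchesByComparison {
    // Hypothetical key format; a small Comparison class with equals/hashCode would work too.
    static String key(String parentNode, String childNode) {
        return parentNode + "->" + childNode;
    }

    // Adds a FeatureID to the list for one parent/child comparison,
    // creating that list the first time the comparison is seen.
    static void record(Map<String, List<String>> results,
                       String parentNode, String childNode, String featureId) {
        results.computeIfAbsent(key(parentNode, childNode), k -> new ArrayList<>())
               .add(featureId);
    }

    public static void main(String[] args) {
        Map<String, List<String>> results = new HashMap<>();
        // Illustrative values only; the node pairings are not taken from the real tree.
        record(results, "112", "115", "ATM-0000013");
        record(results, "112", "115", "ATM-0036186");
        record(results, "112", "120", "ATM-0036186");
        System.out.println(results.get("112->115")); // prints [ATM-0000013, ATM-0036186]
    }
}
```

Each list can later be written to its own file without touching the comparison logic.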
edit: I read your comment and tried to come up with a solution without changing too much:
String[] firstLine = sc.nextLine().split(regex);
//line is the line of input being read in thru the inputFile
int line = 0;
//array of doubles will hold the data to be put in the stacks
double [][] theData = new double [28420][firstLine.length];
while(sc.hasNext())
{
    String lineIn = sc.nextLine();
    String[] lineInAsString = lineIn.split(regex);
    for(int i = 1; i < lineInAsString.length; i++)
    {
        theData[line][i] = Double.parseDouble(lineInAsString[i]);
    }
    line++;
}
sc.close();
return theData;
In this part of your getFile() function, you read the CSV into a double matrix. For each column i in the matrix we also need the corresponding featureID. To return both the double matrix and a list of featureIDs, you need a container class.
class DataContainer {
    public double[][] matrix;
    public int[] featureIds;

    public DataContainer(double[][] matrix, int[] featureIds) {
        this.matrix = matrix;
        this.featureIds = featureIds;
    }
}
Now we can change the code above to return both.
String[] firstLine = sc.nextLine().split(regex);
// array of ids, one per column of the first line (index 0 is the "FeatureID" label and is skipped)
int[] featureIds = new int[firstLine.length];
for(int i = 1; i < firstLine.length; i++)
{
    featureIds[i] = Integer.parseInt(firstLine[i]);
}
// ... same stuff as before
return new DataContainer(theData, featureIds);
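In the sample file shown in the question, the first line holds the tree node IDs, and the quoted strings in the first column of each data row ("ATM-0000011" and so on) appear to be the FeatureIDs. If that reading is right, the labels to keep are the row labels, and they are Strings rather than ints. Here is a minimal sketch under that assumption. It also assumes a plain comma delimiter (the real `regex` is not shown) and collects rows in a list instead of a fixed-size array. `LabeledMatrix` and `parse` are hypothetical names, not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LabeledMatrixSketch {
    // Hypothetical container: row labels and scores kept in the same row order.
    static class LabeledMatrix {
        final String[] rowIds;
        final double[][] scores;

        LabeledMatrix(String[] rowIds, double[][] scores) {
            this.rowIds = rowIds;
            this.scores = scores;
        }
    }

    static LabeledMatrix parse(Scanner sc) {
        sc.nextLine(); // skip the first line, which holds the node IDs
        List<String> ids = new ArrayList<>();
        List<double[]> rows = new ArrayList<>();
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] cells = line.split(",");
            ids.add(cells[0].replace("\"", "")); // strip the surrounding quotes
            double[] row = new double[cells.length - 1];
            for (int i = 1; i < cells.length; i++) {
                row[i - 1] = Double.parseDouble(cells[i].trim());
            }
            rows.add(row);
        }
        return new LabeledMatrix(ids.toArray(new String[0]), rows.toArray(new double[0][]));
    }

    public static void main(String[] args) {
        String sample = "\"FeatureID\",\"112\",\"115\"\n"
                + "\"ATM-0000011\",2.213e-03,1.249e-03\n"
                + "\"ATM-0000013\",0.94,1.249e-03\n";
        LabeledMatrix m = parse(new Scanner(sample));
        System.out.println(m.rowIds[1] + " " + m.scores[1][0]); // prints ATM-0000013 0.94
    }
}
```

Note that in this sketch the score columns are shifted down by one, so `scores[r][0]` corresponds to the first node ID in the first line.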
In your main function you can now extract both structures. So instead of
double newMatrix[][] = getFile(args);
you can write
DataContainer data = getFile(args);
double[][] newMatrix = data.matrix;
int[] featureIds = data.featureIds;
You can now use the featureIds array to match it up with your matrix columns in your calculations. Instead of incrementing an int inside addedInternal, you can create an ArrayList<Integer> and add(id) for every match. Then return the ArrayList, so you can use it for reporting outside of that function.
ArrayList<Integer> addedFeatureIds = addedInternal(parentStackOne, childStackOne, featureIdStack);
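For illustration, a function of this shape might look like the sketch below. The body of the real addedInternal is not shown in the question, so this is a guess at its structure and not a rewrite of it. It also assumes the IDs are the String row labels from the data file (such as "ATM-0000011"), pushed onto a third stack in the same order as the scores. The `matchingIds` name and the sample values are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class CollectMatchingIds {
    // Hypothetical stand-in for addedInternal: pops the three stacks in
    // lockstep and keeps the ID whenever both scores meet the threshold.
    static List<String> matchingIds(Deque<Double> parent, Deque<Double> child,
                                    Deque<String> ids, double threshold) {
        List<String> matched = new ArrayList<>();
        while (!parent.isEmpty() && !child.isEmpty() && !ids.isEmpty()) {
            double p = parent.pop();
            double c = child.pop();
            String id = ids.pop();
            if (p >= threshold && c >= threshold) {
                matched.add(id);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Made-up scores for three features at a parent node and a child node.
        String[] featureIds = {"ATM-0000011", "ATM-0000012", "ATM-0000013"};
        double[] parentScores = {0.94, 0.001, 0.94};
        double[] childScores = {0.96, 0.90, 0.80};
        Deque<Double> parent = new ArrayDeque<>();
        Deque<Double> child = new ArrayDeque<>();
        Deque<String> ids = new ArrayDeque<>();
        for (int i = 0; i < featureIds.length; i++) {
            parent.push(parentScores[i]);
            child.push(childScores[i]);
            ids.push(featureIds[i]);
        }
        // Stacks pop in reverse push order, so the last row comes out first.
        System.out.println(matchingIds(parent, child, ids, 0.75)); // prints [ATM-0000013, ATM-0000011]
    }
}
```

The size of the returned list equals the old counter value, so the count is still available without a separate pass.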