Perl, generating new data (new hash) using two different hash tables

I've bumped into a very complicated problem (in my perspective as a newbie) and I'm not sure how to solve it. I can think of the workflow but not the script.

I have file A that looks like the following: Teacher (tab) Student1(space)Student2(space)..

Fiona       Nicole Sherry 
James       Alan Nicole
Michelle    Crystal 
Racheal     Bobby Dan Nicole

They sometimes have numbers right next to their names when there are two people with the same name (e.g. John1, John2). Students may also overlap if they have more than one advisor.
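A minimal sketch of step 1 for the file A format described above. It assumes the lines are already in memory, and the helper name `parse_file_a` is hypothetical. Splitting on `' '` treats the tab and the spaces the same way and ignores a trailing space. Numbered names such as `John1` remain distinct hash keys.

```perl
use strict;
use warnings;

# Hypothetical helper: parse file A lines ("Teacher<TAB>Student Student ...")
# into a hash of array refs, teacher => [students].
sub parse_file_a {
    my %students_of;
    for my $line (@_) {
        # split ' ' splits on any whitespace run and ignores leading/trailing blanks
        my ($teacher, @students) = split ' ', $line;
        next unless defined $teacher;    # skip blank lines
        $students_of{$teacher} = \@students;
    }
    return %students_of;
}

# Sample rows from the question, inlined instead of read from disk.
my %students = parse_file_a(
    "Fiona\tNicole Sherry ",
    "James\tAlan Nicole",
    "Michelle\tCrystal",
    "Racheal\tBobby Dan Nicole",
);
print "$_: @{ $students{$_} }\n" for sort keys %students;
```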

File B is a file that has groups of teachers together. It looks similar but the values are comma-delimited.

Fiona       Racheal,Jack
Michelle    Racheal
Racheal     Fiona,Michelle
Jack        Fiona

The pattern in file B is that a key can have multiple values, and each value also appears as a key. This makes it easy to find who is grouped with whom.
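Each value in file B should also appear as a key that names the original teacher back, so you can sketch a quick consistency check for the file. The helper names `parse_file_b` and `one_way_pairs` are hypothetical, and the sketch assumes that names never contain commas.

```perl
use strict;
use warnings;

# Hypothetical helper: parse file B lines ("Teacher<TAB>Name,Name,...")
# into teacher => [group members].
sub parse_file_b {
    my %group_of;
    for my $line (@_) {
        my ($teacher, @rest) = split ' ', $line;
        next unless defined $teacher;
        # splitting every remaining field on commas also tolerates "Racheal, Jack"
        $group_of{$teacher} = [ grep { length } map { split /,/, $_ } @rest ];
    }
    return %group_of;
}

# List every "A -> B" where B's own entry does not name A back.
sub one_way_pairs {
    my %group_of = @_;
    my @missing;
    for my $teacher (sort keys %group_of) {
        for my $other (@{ $group_of{$teacher} }) {
            my $back = $group_of{$other} || [];
            push @missing, "$teacher -> $other"
                unless grep { $_ eq $teacher } @$back;
        }
    }
    return @missing;
}

my %group_of = parse_file_b(
    "Fiona\tRacheal,Jack",
    "Michelle\tRacheal",
    "Racheal\tFiona,Michelle",
    "Jack\tFiona",
);
my @missing = one_way_pairs(%group_of);
if (@missing) { print "$_\n" for @missing }
else          { print "every group is listed both ways\n" }
```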

The output I would like shows which students are likely to receive a similar education based on their teachers/groups. So I would like the script to do the following:

  1. Store file A in a hash and close the file.
  2. Open file B and go through each teacher to see if they have students (some may not; the actual list is quite big). For example, for the first teacher, Fiona, the script looks in the stored file A hash table to see if there is a Fiona. If there is (in this case, with students Nicole and Sherry), add each of those students as a new key in a new hash table.

    while (<Group>) {
        chomp;
        my $data = $_;
        $data =~ /^(\S+)\s+(.*)$/;
        my $TeacherA = $1;
        my $group    = $2;
        # ... steps 3-5 continue inside this loop
    }

  3. Then, look at the group of teachers who are grouped with Fiona (Racheal, Jack), and take one person at a time (Racheal).

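    if (exists $students{$TeacherA}) {    # %students = file A hash (hypothetical name)
        while ($group =~ /(\w+)(.*)/) {
            $TeacherB = $1;
            $group    = $2;
            # ... look up $TeacherB's students here (step 4)
        }
    }
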
  4. Look at file A for Racheal's students.
  5. Store them as comma-delimited values for the student keys created in step 2.
  6. Print the student-student and teacher-teacher groups:

    Nicole  Bobby,Dan,Nicole    Fiona   Racheal
    Sherry  Bobby,Dan,Nicole    Fiona   Racheal
    

    Jack is the next teacher in Fiona's group, but he doesn't have any students, so he would not appear in these results. If he had a student, for example David, the results would be:

    Nicole  Bobby,Dan,Nicole    Fiona   Racheal
    Sherry  Bobby,Dan,Nicole    Fiona   Racheal
    Nicole  David               Fiona   Jack
    Sherry  David               Fiona   Jack
    
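Here is a minimal sketch of steps 1-6. It assumes that both files fit in memory and that each file is given as a list of lines. The sketch skips the intermediate student-keyed hash from step 5. Instead, it emits one output line for each combination of teacher, grouped teacher, and student. This keeps Racheal's students and Jack's students on separate lines, as in the expected output above. `shared_education` is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch of steps 1-6 from the question. The sample files are inlined as
# arrays of lines here; read them from disk in practice.
my @file_a = ("Fiona\tNicole Sherry", "James\tAlan Nicole",
              "Michelle\tCrystal",    "Racheal\tBobby Dan Nicole");
my @file_b = ("Fiona\tRacheal,Jack",     "Michelle\tRacheal",
              "Racheal\tFiona,Michelle", "Jack\tFiona");

sub shared_education {
    my ($a_lines, $b_lines) = @_;

    # Step 1: teacher => [students] from file A
    my %students_of;
    for my $line (@$a_lines) {
        my ($teacher, @students) = split ' ', $line;
        $students_of{$teacher} = \@students if defined $teacher;
    }

    my @out;
    # Step 2: walk file B in file order, skipping teachers with no students
    for my $line (@$b_lines) {
        my ($teacher, @rest) = split ' ', $line;
        next unless defined $teacher && $students_of{$teacher};

        # Step 3: take the grouped teachers one at a time
        for my $other (map { split /,/, $_ } @rest) {
            # Step 4: a grouped teacher with no students (Jack) adds nothing
            my $theirs = $students_of{$other};
            next unless $theirs && @$theirs;

            # Steps 5-6: one line per student of the current teacher
            for my $student (@{ $students_of{$teacher} }) {
                push @out, join("\t", $student, join(',', @$theirs),
                                $teacher, $other);
            }
        }
    }
    return @out;
}

print "$_\n" for shared_education(\@file_a, \@file_b);
```

Add a hypothetical `Jack\tDavid` row to file A. The result then contains the `Nicole David Fiona Jack` and `Sherry David Fiona Jack` lines from the second example, directly after Fiona's Racheal lines.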

I'm so sorry for asking such a complicated and specific question. I hope other people who are doing something like this by any chance may benefit from the answers. Thank you so much for your help and reply. You are my only source of help.

asked Nov 23 '25 14:11 by absolutenewbie


2 Answers

This is a rather strange way to look at the data, but I think I got it to work the way you tried. It would be interesting to know why you want the data in that form. Maybe provide column headings next time. Knowing why you do something in a certain way often makes it a lot easier to think of ways to achieve it, imo.

So here's what I did. Don't get confused: I put your values from file A and file B into scalars and changed the part that reads them accordingly.

my $file_a = qq~Fiona\tNicole Sherry
James\tAlan Nicole
Michelle\tCrystal
Racheal\tBobby Dan Nicole
~;

my $file_b = qq~Fiona\tRacheal,Jack
Michelle\tRacheal
Racheal\tFiona,Michelle
Jack\tFiona
~;

After that, proceed to read the 'files'.

# 1: Store file A in a hash
my (%file_a);
foreach my $row (split /\n/, $file_a) {
  my @temp = split /\t/, $row;
  $file_a{$temp[0]} = $temp[1];
}

# 2: Go through file B
foreach my $row (split /\n/, $file_b) {
  my @line_b = split /\t/, $row;
  # Look in stored file A if the teacher is there
  if (exists $file_a{$line_b[0]}) {
    my (%new_hash_table, @teachers);
    # Put all the students of this teacher into a new hash
    $new_hash_table{$_} = '' foreach split / /, $file_a{$line_b[0]};

    # 3: Take one of the group of teachers who are grouped with the 
    # current teacher at a time
    foreach my $teacher (split /,/, $line_b[1]) {
      if (exists $file_a{$teacher}) {
        # 4: This teacher from the group has students listed in file A
        push @teachers, $teacher; # Store the teacher's name for print later
        foreach (keys %new_hash_table) {
          # 5: Fill the students as csv for the student keys from step 2
          $new_hash_table{$_} = join(',', split(/ /, $file_a{$teacher}));
        }
      }
    }
    foreach my $student (keys %new_hash_table) {
      # 6: Print...        
      print join("\t", 
        # Student-student relation
        $student, $new_hash_table{$student}, 
        # Teacher-teacher relation
        $line_b[0], @teachers);
      print "\n";
    }
  }
}

For me that provides the following output:

Sherry  Bobby,Dan,Nicole    Fiona   Racheal
Nicole  Bobby,Dan,Nicole    Fiona   Racheal
Crystal Bobby,Dan,Nicole    Michelle    Racheal
Bobby   Crystal Racheal Fiona   Michelle
Nicole  Crystal Racheal Fiona   Michelle
Dan Crystal Racheal Fiona   Michelle

This is probably weird since I don't have all the values.

Anyway, there are a few more things to say about this.

In your example code you used a regex like $data=~/^(\S+)\s+(.*)$/; to get to the values of a simple two-column list. It is a lot easier to use the split operator to do that.
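This example shows the `split` suggestion in practice. With a LIMIT of 2, `split` returns the first field and the rest of the line, which matches what the regex captures. The first sample line below comes from file B. The second sample line comes from file A, where the second column itself contains spaces.

```perl
use strict;
use warnings;

# The question's regex versus split with a LIMIT of 2: both return the
# first field and everything after the first run of whitespace.
my $line = "Fiona       Racheal,Jack";

my ($teacher_re, $group_re) = $line =~ /^(\S+)\s+(.*)$/;
my ($teacher, $group)       = split /\s+/, $line, 2;

print "$teacher | $group\n";    # prints "Fiona | Racheal,Jack"

# The limit matters for file A, whose second column contains spaces.
my ($t, $students) = split /\s+/, "Fiona\tNicole Sherry", 2;
print "$students\n";            # prints "Nicole Sherry"
```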

When you read from a file with the <FILEHANDLE> syntax, you can assign each line to a named scalar directly in the while loop's condition, like so:

while (my $data = <GROUP>) {
  chomp $data;
  # ... process $data ...
}

Also, it is conventional to write bareword filehandle names in all caps.
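Here is a small, self-contained sketch of that while-loop idiom. It assumes an in-memory filehandle in place of the real file, created by opening a reference to a string (supported since Perl 5.8). It keeps the all-caps bareword `GROUP` to follow the naming convention.

```perl
use strict;
use warnings;

# The while-loop idiom, made runnable: an in-memory filehandle
# (open on a reference to a string) stands in for the real file B.
my $file_b = "Fiona\tRacheal,Jack\nMichelle\tRacheal\n";

open(GROUP, '<', \$file_b) or die "Cannot open: $!";
my %group_of;
while (my $data = <GROUP>) {
    chomp $data;
    my ($teacher, $group) = split /\s+/, $data, 2;
    $group_of{$teacher} = $group;
}
close GROUP;

print "$_ => $group_of{$_}\n" for sort keys %group_of;
```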

I'd suggest you take a look at 'Learning Perl'. The basic concepts of hashes and arrays covered there should be enough to tackle tasks like this one. Hope this helps.

answered Nov 25 '25 11:11 by simbabque


I can't imagine why you would want this redundant data when you could just look at file A to get a good idea of who is getting a similar education ... but here is a way of doing it in Perl all the same.

use strict;
use warnings;

my $data = {};
# pull in students (file A: teacher, then whitespace-separated students)
open(my $in, '<', 'students.txt') or die "Cannot open students.txt: $!";
while (my $line = <$in>) {
  chomp($line);
  my ($teacher, @students) = split(/\s+/, $line);
  $data->{$teacher}->{students} = \@students;
}
close $in;
# pull in teachers (file B: teacher, then a comma-separated group)
open($in, '<', 'teachers.txt') or die "Cannot open teachers.txt: $!";
while (my $line = <$in>) {
  chomp($line);
  my ($teacher, $supporters) = split(/\s+/, $line);
  my @supporters = split(/,/, $supporters // '');
  $data->{$teacher}->{supporters} = \@supporters;
}
close $in;
# make the output
foreach my $teacher (keys %{$data}) {
  foreach my $teacher_student (@{$data->{$teacher}->{students}}) {
    foreach my $supporter (@{$data->{$teacher}->{supporters}}) {
      # a supporter missing from file A (e.g. Jack) has no student list
      my $supporter_students = $data->{$supporter}->{students} || [];
      if (@$supporter_students) {
        print "$teacher_student\t" .
              join(",", @$supporter_students) .
              "\t$teacher\t$supporter\n";
      }
    }
  }
}

When run on the data listed in the question it returns:

Crystal Bobby,Dan,Nicole    Michelle    Racheal
Nicole  Bobby,Dan,Nicole    Fiona   Racheal
Sherry  Bobby,Dan,Nicole    Fiona   Racheal
Bobby   Nicole,Sherry   Racheal Fiona
Bobby   Crystal Racheal Michelle
Dan Nicole,Sherry   Racheal Fiona
Dan Crystal Racheal Michelle
Nicole  Nicole,Sherry   Racheal Fiona
Nicole  Crystal Racheal Michelle

answered Nov 25 '25 09:11 by zortacon


