MATLAB: Find abbreviated version of matrix that minimises sum of matrix elements

Tags: optimization, matlab, quadratic-programming

I have a 151-by-151 matrix A. It's a correlation matrix, so there are 1s on the main diagonal and repeated values above and below the main diagonal. Each row/column represents a person.

For a given integer n, I want to reduce the size of the matrix by kicking people out, such that I am left with an n-by-n correlation matrix that minimises the total sum of its elements. In addition to obtaining the abbreviated matrix, I also need to know the row numbers of the people who should be booted out of the original matrix (or their column numbers, which will be the same).
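One way to write this goal down formally is sketched below. The indicator vector x is notation introduced here for illustration (x_i = 1 if person i is kept, 0 if kicked out); it does not appear in the original post.

```latex
\min_{x \in \{0,1\}^{151}} \; x^{\top} A \, x
\quad \text{subject to} \quad \sum_{i=1}^{151} x_i = n
```

Here x^T A x is exactly the sum of the elements of the sub-matrix formed by the kept rows and columns.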

As a starting point I take A = tril(A), which will remove redundant off-diagonal elements from the correlation matrix.
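As a quick check that working with the lower triangle does not change which subset is best: because the diagonal is all ones, the full sum of any sub-matrix equals twice its lower-triangle sum minus the number of people kept. The sketch below uses a small made-up matrix (the values are hypothetical, not taken from the question).

```matlab
% Hypothetical 4-by-4 correlation matrix (values invented for illustration).
A = [1.0  0.6  0.6  0.0;
     0.6  1.0  0.1  0.4;
     0.6  0.1  1.0  0.4;
     0.0  0.4  0.4  1.0];

keep = [1 2 3];                  % any subset of people
S = A(keep, keep);               % sub-matrix for that subset
full_sum = sum(S(:));            % sum over the full symmetric sub-matrix
tril_sum = sum(sum(tril(S)));    % sum over the lower triangle, diagonal included

% With ones on the diagonal: full_sum = 2*tril_sum - numel(keep)
gap = abs(full_sum - (2*tril_sum - numel(keep)));
```

Since the two sums differ only by a fixed affine map for a given n, minimising one minimises the other.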

[Image: hypothetical 5-by-5 correlation matrix]

So, if n = 4 and we have the hypothetical 5-by-5 matrix above, it's very clear that person 5 should be kicked out of the matrix, since that person is contributing a lot of very high correlations.

It's also clear that person 1 should not be kicked out, since that person contributes a lot of negative correlations, and thus brings down the sum of the matrix elements.

I understand that sum(A(:)) will sum everything in the matrix. However, I'm very unclear about how to search for the minimum possible answer.

I noticed a similar question, "Finding sub-matrix with minimum elementwise sum", which has a brute force solution as the accepted answer. While that answer works fine there, it's impractical for a 151-by-151 matrix.
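To get a feel for why brute force is impractical here, the number of candidate sub-matrices can be estimated. This is only a rough sketch: it works with logarithms via gammaln so that the count does not lose precision as a double, and the variable names are made up for the example.

```matlab
N = 151;   % people in the original matrix
n = 75;    % people to keep (the size mentioned in the question)

% log10 of nchoosek(N, n), computed through log-gamma to stay within double range
log10_count = (gammaln(N + 1) - gammaln(n + 1) - gammaln(N - n + 1)) / log(10);
fprintf('Roughly 10^%.1f candidate sub-matrices\n', log10_count);
```

The count comes out on the order of 10^44, so enumerating every subset is out of reach regardless of how fast each sum is evaluated.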

EDIT: I had thought of iterating, but I don't think that truly minimizes the sum of elements in the reduced matrix. Below I have a 4-by-4 correlation matrix in bold, with sums of rows and columns on the edges. It's apparent that with n = 2 the optimal matrix is the 2-by-2 identity matrix involving Persons 1 and 4, but under the iterative scheme I would have kicked out Person 1 in the first iteration, so the algorithm produces a solution that is not optimal.

I also wrote a program that always generates optimal solutions, and it works well when n or k (in the n-choose-k sense) is small, but when trying to extract an optimal 75-by-75 matrix from a 151-by-151 matrix I realised my program would take billions of years to terminate.
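The failure of the iterative idea can be reproduced on a tiny case. The sketch below assumes "iterating" means repeatedly dropping the person with the largest row sum, and it uses an invented 4-by-4 matrix (not the one from the post) chosen so that Person 1 has the largest row sum yet belongs to the optimal pair.

```matlab
% Hypothetical 4-by-4 correlation matrix (invented values, not the poster's).
A = [1.0  0.6  0.6  0.0;
     0.6  1.0  0.1  0.4;
     0.6  0.1  1.0  0.4;
     0.0  0.4  0.4  1.0];
n = 2;                                   % number of people to keep

% Greedy scheme (assumed): repeatedly drop the person with the largest row sum.
greedy_keep = 1:size(A, 1);
while numel(greedy_keep) > n
    row_sums = sum(A(greedy_keep, greedy_keep), 2);
    [~, worst] = max(row_sums);
    greedy_keep(worst) = [];
end
greedy_sum = sum(sum(A(greedy_keep, greedy_keep)));

% Exhaustive search over all subsets of size n (only viable for tiny cases).
subsets = nchoosek(1:size(A, 1), n);
sums = zeros(size(subsets, 1), 1);
for k = 1:size(subsets, 1)
    idx = subsets(k, :);
    sums(k) = sum(sum(A(idx, idx)));
end
[best_sum, best_row] = min(sums);
best_keep = subsets(best_row, :);
```

In this made-up example the greedy scheme removes Person 1 first and ends with Persons 2 and 3, while the exhaustive search keeps Persons 1 and 4 with a smaller sum.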

I vaguely recalled that sometimes these n choose k problems can be resolved with dynamic programming approaches that avoid recomputing things, but I can't work out how to solve this, and nor did googling enlighten me.

I'm willing to sacrifice precision for speed if there's no other option, or if the best program would take more than a week to generate a precise solution. However, I'm happy to let a program run for up to a week if it will generate a precise solution.

If it's not possible for a program to optimise the matrix within a reasonable timeframe, then I would accept an answer that explains why n choose k tasks of this particular sort can't be resolved within reasonable timeframes.

[Image: 4-by-4 correlation matrix with row and column sums on the edges]


asked Nov 16 '15 15:11

user1205901 - Слава Україні

1 Answer

This is an approximate solution using a genetic algorithm.

I started with your test case:

data_points = 10; % How many data points will be generated for each person, in order to create the correlation matrix.
num_people = 25; % Number of people initially.
to_keep = 13; % Number of people to be kept in the correlation matrix.
to_drop = num_people - to_keep; % Number of people to drop from the correlation matrix.
num_comparisons = 100; % Number of times to compare the iterative and optimization techniques.
% Generate random data: one row per data point, one column per person.
rand_dat = 1 + 2.*randn(data_points, num_people);
A = corr(rand_dat); % corr is part of the Statistics and Machine Learning Toolbox.

Then I defined the functions you need to evolve the genetic algorithm:

function individuals = user1205901individuals(nvars, FitnessFcn, gaoptions, num_people)

individuals = zeros(num_people,gaoptions.PopulationSize);
for cnt=1:gaoptions.PopulationSize
    individuals(:,cnt)=randperm(num_people);
end

individuals = individuals(1:nvars,:)';

This is the individual generation function.
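To see what such an initial population looks like without running ga, here is a standalone sketch of the same idea. The name pop_size is a hypothetical stand-in for the PopulationSize option, and the sizes mirror the test case only for illustration.

```matlab
num_people = 25;   % people in the full matrix
nvars = 13;        % people kept by each individual
pop_size = 40;     % hypothetical stand-in for options.PopulationSize

population = zeros(pop_size, nvars);
for r = 1:pop_size
    % randperm(n, k) returns k distinct integers drawn from 1..n
    population(r, :) = randperm(num_people, nvars);
end
```

Each row is one candidate set of people to keep, with no person listed twice.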

function fitness = user1205901fitness(ind, A)

fitness = sum(sum(A(ind,ind)));

This is the fitness evaluation function.

function offspring = user1205901mutations(parents, options, nvars, FitnessFcn, state, thisScore, thisPopulation, num_people)

offspring=zeros(length(parents),nvars);
for cnt=1:length(parents)
    original = thisPopulation(parents(cnt),:);
    extraneous = setdiff(1:num_people, original); % people not in this individual
    original(randi(nvars)) = extraneous(randi(num_people - nvars)); % swap one kept person for an outsider
    offspring(cnt,:)=original;
end

This is the function to mutate an individual.
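The mutation step can be tried in isolation. The following is a minimal sketch of a single swap under the same sizes as the test case; the variable names are invented for the example.

```matlab
num_people = 25;
original = randperm(num_people, 13);           % a current individual (people kept)
outsiders = setdiff(1:num_people, original);   % people not currently kept

mutant = original;
% Replace one randomly chosen kept person with one randomly chosen outsider.
mutant(randi(numel(mutant))) = outsiders(randi(numel(outsiders)));
```

Because the replacement always comes from outside the individual, the mutant still contains 13 distinct people and differs from its parent in exactly one position.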

function children = user1205901crossover(parents, options, nvars, FitnessFcn, unused, thisPopulation)

children=zeros(length(parents)/2,nvars);
cnt = 1;
for cnt1=1:2:length(parents)
    cnt2=cnt1+1;
    male = thisPopulation(parents(cnt1),:);
    female = thisPopulation(parents(cnt2),:);
    child = union(male, female);            % pool the indices of both parents, without duplicates
    child = child(randperm(length(child))); % shuffle the pool
    child = child(1:nvars);                 % keep the first nvars indices
    children(cnt,:)=child;
    cnt = cnt + 1;
end

This is the function to generate a new individual by coupling two parents.
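The crossover can likewise be checked on its own. This sketch builds two random parents (hypothetical data) and combines them the same way: pool, shuffle, truncate.

```matlab
num_people = 25;
nvars = 13;
male = randperm(num_people, nvars);     % first parent
female = randperm(num_people, nvars);   % second parent

pool = union(male, female);             % sorted and duplicate-free
pool = pool(randperm(numel(pool)));     % shuffle the pooled indices
child = pool(1:nvars);                  % keep as many people as each parent had
```

The pool always holds at least nvars people, so the child is a valid individual made up only of people present in one of its parents.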

At this point you can define your problem:

gaproblem2.fitnessfcn = @(idx)user1205901fitness(idx,A);
gaproblem2.nvars = to_keep;
gaproblem2.options = gaoptimset(); % gaoptimset returns the default GA options structure
gaproblem2.options.PopulationSize = 40;
gaproblem2.options.EliteCount = 10;
gaproblem2.options.CrossoverFraction = 0.1;
gaproblem2.options.StallGenLimit = inf;
gaproblem2.options.CreationFcn = @(nvars,FitnessFcn,gaoptions)user1205901individuals(nvars,FitnessFcn,gaoptions,num_people);
gaproblem2.options.CrossoverFcn = @(parents,options,nvars,FitnessFcn,unused,thisPopulation)user1205901crossover(parents,options,nvars,FitnessFcn,unused,thisPopulation);
gaproblem2.options.MutationFcn = @(parents, options, nvars, FitnessFcn, state, thisScore, thisPopulation) user1205901mutations(parents, options, nvars, FitnessFcn, state, thisScore, thisPopulation, num_people);
gaproblem2.options.Vectorized = 'off';

Open the genetic algorithm tool:

gatool

From the File menu, select Import Problem... and choose gaproblem2 in the window that opens.

Now, run the tool and wait for the iterations to stop.

The gatool enables you to change hundreds of parameters, so you can trade speed for precision in the selected output.

The resulting vector is the list of indices that you have to keep in the original matrix, so A(garesults.x,garesults.x) is the matrix with only the desired persons.
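Since the question also asks which people are booted out, the complement of that vector gives them. The sketch below uses a small invented matrix and a hypothetical vector kept that plays the role of garesults.x.

```matlab
% Hypothetical inputs standing in for the real data and the GA result.
A = [1.0  0.6  0.6  0.0;
     0.6  1.0  0.1  0.4;
     0.6  0.1  1.0  0.4;
     0.0  0.4  0.4  1.0];
kept = [4 1];                            % plays the role of garesults.x

kept = sort(kept);                       % keep rows/columns in their original order
dropped = setdiff(1:size(A, 1), kept);   % row (= column) numbers of the people removed
A_reduced = A(kept, kept);               % abbreviated correlation matrix
total = sum(A_reduced(:));               % the quantity being minimised
```

Sorting is optional; it only keeps the reduced matrix in the same person order as the original.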

answered Sep 19 '22 02:09

NicolaSysnet
