
TypeError: Improper input: N=2 must not exceed M=1

I am writing a function to do non-linear curve fitting and am running into this error:

TypeError: Improper input: N=2 must not exceed M=1. 

I don't know why it thinks I am trying to use too large of an array when I am only reading in columns from a csv file.

import math

import pandas
from numpy import diag, sqrt
from scipy.optimize import curve_fit

#stolen sig-fig function <--trust but verify
def round_figures(x, n): 
    return round(x, int(n - math.ceil(math.log10(abs(x))))) 

def try_michaelis_menten_fit( df, pretty=False ):

    # auto-guess
    p0 = ( df['productFinal'].max(), df['substrateConcentration'].mean() )

    popt, pcov = curve_fit( v, df['substrateConcentration'], df['productFinal'], p0=p0 )
    perr = sqrt( diag( pcov ) )

    kcat_km = popt[0] / popt[1]
    # error propagation: relative (fractional) error of the ratio kcat/km
    kcat_km_err = sqrt( (perr[0] / popt[0])**2 + (perr[1] / popt[1])**2 )

    kcat = ( popt[0] )
    kcat_std_err = ( perr[0] )

    km_uM = ( popt[1] * 1000000 )
    km_std_err = ( perr[1] *1000000)


    if pretty:
        results = {
        'kcat': round_figures(kcat, 3),
        'kcat_std_err': round_figures(kcat_std_err, 3),

        'km_uM': round_figures(km_uM, 5),
        'km_std_err': round_figures(km_std_err, 3),

        'kcat/km': round_figures(kcat_km, 2),
        'kcat/km_err': round_figures(kcat_km_err, 2),

        }

        return pandas.Series( results )
    else: 
        return popt, perr 

df = pandas.read_csv( 'PNP_Raw2Fittr.csv' ) 



fits = df.groupby('sample').apply( try_michaelis_menten_fit, pretty=True ) 
fits.to_csv( 'fits_pretty_output.csv' )
print( fits ) 

I am reading in a data frame that is an expanded version of something like this:

   sample   yield    dilution  time  productAbsorbance  substrateConcentration  internalStandard  
0  PNPH_I_4  2.604     10000  2400              269.6                0.007000   2364.0
1  PNPH_I_4  2.604     10000  2400              215.3                0.002333   2515.7
2  PNPH_I_4  2.604     10000  2400              160.3                0.000778   2252.2
3  PNPH_I_4  2.604     10000  2400              104.1                0.000259   2302.4
4  PNPH_I_4  2.604     10000  2400               60.9                0.000086   2323.5
5  PNPH_I_4  2.604     10000  2400               35.4                0.000029   2367.9
6  PNPH_I_4  2.604     10000  2400                0.0                0.000000   2165.3

When I call this function on this smaller version of my data frame it seems to work, but when I use it on the large one I get this error. The error first appeared when I added the internalStandard column; everything worked perfectly before that. To make matters even more confusing, when I revert to the old code with an old version of the data frame it works fine, and if I add that line I get the error, as would be expected. However, when I delete the same line from my data frame and run the code again, I still get the same error!

I have figured out that if I pass in method='trf' instead of lm as my optimization method, I instead get the error OverflowError: cannot convert float infinity to integer. However, I do use df.dropna(inplace=True). Is there a similar method that is specific to infinity?

Asked by rjboyd00, Feb 08 '23 09:02


1 Answer

I believe this error refers to the fact that the number of data points in your x and y inputs (e.g. df['substrateConcentration'] and df['productFinal']), reported as M, is smaller than the number of fitting parameters given to curve_fit, reported as N, as defined by your fitting function v. This is a consequence of the mathematics: you are attempting to perform curve fitting (optimization) with too few constraints. In your case, N=2 and M=1 would mean that at least one call to curve_fit received only a single data point for a two-parameter fit.

I reproduced the same error with scipy.optimize.curve_fit by providing a fit function that expects 4 fitting parameters with an array of shape (2,).

e.g.

import numpy as np
from scipy.optimize import curve_fit

x, y = np.array([0.5, 4.0]), np.array([1.5, 0.6])

def func(x, a, b, c, d):
    return a*x**3. + b*x**2. - c/x + d

popt, pcov = curve_fit(func, x, y)

TypeError: Improper input: N=4 must not exceed M=2

However, since you have not provided your fit function v in the question it is not possible to confirm that this is the specific cause of your problem.

Maybe your input data is not being formatted exactly the way you think it is. I suggest that you check how your arrays look when they are being passed to curve_fit. You might be parsing the data wrongly so that the number of rows ends up being very small.
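One way to check this is to count, for each sample, how many usable points would actually reach curve_fit. The following is only a minimal sketch: the helper name describe_groups, the n_params default of 2 and the toy data frame are assumptions for illustration, not part of the question's code.

```python
import numpy as np
import pandas as pd

def describe_groups(df, x_col, y_col, group_col="sample", n_params=2):
    """Hypothetical helper: per group, count rows and finite (x, y) pairs."""
    rows = []
    for name, group in df.groupby(group_col):
        # Coerce to numbers so that badly parsed strings become NaN
        x = pd.to_numeric(group[x_col], errors="coerce")
        y = pd.to_numeric(group[y_col], errors="coerce")
        finite = np.isfinite(x) & np.isfinite(y)
        n_finite = int(finite.sum())
        rows.append({
            group_col: name,
            "n_rows": len(group),
            "n_finite": n_finite,
            # Fewer points than parameters cannot be fitted
            "too_few": n_finite < n_params,
        })
    return pd.DataFrame(rows)

# Toy data (made up): sample B has a single row, sample A has one infinite value
demo = pd.DataFrame({
    "sample": ["A", "A", "A", "B"],
    "substrateConcentration": [0.007, 0.002333, 0.000778, 0.007],
    "productFinal": [269.6, 215.3, np.inf, 269.6],
})
report = describe_groups(demo, "substrateConcentration", "productFinal")
print(report)
```

Any sample flagged in the too_few column would be a candidate for the group that triggers the error when groupby('sample').apply(...) reaches it.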

I have figured out that if I pass in method='trf' instead of lm as my optimization method, I instead get the error OverflowError: cannot convert float infinity to integer. However, I do use df.dropna(inplace=True). Is there a similar method that is specific to infinity?

Yes. The different optimization methods check the input data differently and therefore throw different errors. This suggests, again, that there is some kind of problem with your input data. The 'lm' method is probably rejecting (ignoring) the rows that 'trf' throws this error for, and perhaps ending up with no rows at all.
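As for infinities specifically: dropna only removes NaN, so a common pandas idiom is to convert infinities to NaN first and then drop them. A minimal sketch, assuming only the two fitted columns matter; the helper name drop_non_finite and the toy values are made up for illustration.

```python
import numpy as np
import pandas as pd

def drop_non_finite(df, columns):
    """Hypothetical helper: drop rows where any given column is NaN or +/-inf."""
    cleaned = df.copy()
    # Turn +inf / -inf into NaN so that dropna can remove them
    cleaned[columns] = cleaned[columns].replace([np.inf, -np.inf], np.nan)
    return cleaned.dropna(subset=columns)

# Toy data (made up): one infinite and one missing value
demo = pd.DataFrame({
    "substrateConcentration": [0.007, 0.002333, 0.000778, 0.000259],
    "productFinal": [269.6, np.inf, 160.3, np.nan],
})
cols = ["substrateConcentration", "productFinal"]
cleaned = drop_non_finite(demo, cols)
print(cleaned)
```

Note that removing rows this way can leave a sample with fewer points than fitting parameters, which would bring back the original N must not exceed M error for that sample.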

Answered by feedMe, Feb 12 '23 11:02