
Backward Propagation - Gradient error [Python]

I am working through Andrew Ng's new deep learning Coursera course, week 2.

We are supposed to implement a logistic regression algorithm.
I am stuck at the gradient code ( dw ) - it is giving me a syntax error.

The algorithm is as follows:

import numpy as np

def propagate(w, b, X, Y):
    m = X.shape[1]

    A = sigmoid(np.dot(w.T,X) + b )  # compute activation
    cost = -(1/m)*(np.sum(np.multiply(Y,np.log(A)) + np.multiply((1-Y),np.log(1-A)), axis=1)    

    dw =(1/m)*np.dot(X,(A-Y).T)
    db = (1/m)*(np.sum(A-Y))
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost

Any ideas why I keep on getting this syntax error?

File "<ipython-input-1-d104f7763626>", line 32
    dw =(1/m)*np.dot(X,(A-Y).T)
     ^
SyntaxError: invalid syntax

asked Aug 15 '17 by kyttcar

1 Answer

Andrew Ng is inspiring, no question about that;
however, one may take a few steps towards a better code design:

Hint 1:
if you are serious about maintaining larger code-bases, start using some better IDE, where both
(a) parenthesis-matching with GUI-highlighting, and
(b) a jump-to-matching-parenthesis keyboard shortcut are supported

    cost = - ( 1 / m ) * ( np.sum(   np.multiply(       Y,   np.log(     A ) )
                                   + np.multiply( ( 1 - Y ), np.log( 1 - A ) ),
                                   axis = 1
                                   )
                           ) # missing-parenthesis
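
For reference, once that closing parenthesis is added, the whole propagate() compiles. A minimal corrected sketch ( assuming a sigmoid() helper is already defined elsewhere in the notebook ) could read:

    import numpy as np

    def propagate( w, b, X, Y ):                      # a sketch, not the graded solution
        # sigmoid() is assumed to exist, e.g. lambda z: 1. / ( 1. + np.exp( -z ) )
        m    = X.shape[1]
        A    = sigmoid( np.dot( w.T, X ) + b )        # compute activation
        cost = -( 1 / m ) * np.sum(   np.multiply(       Y,   np.log(     A ) )
                                    + np.multiply( ( 1 - Y ), np.log( 1 - A ) ),
                                      axis = 1
                                      )               # <- the parenthesis that was missing
        dw   = ( 1 / m ) * np.dot( X, ( A - Y ).T )
        db   = ( 1 / m ) * np.sum( A - Y )
        cost = np.squeeze( cost )
        return { "dw": dw, "db": db }, cost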

Hint 2:
once all the syllabus tasks have been auto-graded, try to improve your code's performance - not all steps here are performance-optimised, which is forgivable for small-scale learning tasks, yet which may kill your approach once scaled to a larger N in O( N^k ), in both the [PTIME,PSPACE] dimensions.

What seemed to work tolerably enough for 1E+3 examples fails to serve 1E+6 or 1E+9 examples to train on, even less so if some ML-pipeline iterates over ML-models' HyperPARAMETERs' [EXPTIME,EXPSPACE]-search domains. That hurts. Then one starts to craft code more carefully for tradeoffs, paid in excessive [PTIME]- and [EXPTIME]-costs, once a [PSPACE]-problem size does not fit into the computing infrastructure's in-RAM handling.

Where?

-- avoid duplicate calculations of the same thing, all the more so if arrays are involved
( in all iterative methods, and even more so in ML-pipelines + ML-model-HyperPARAMETERs' vast, indeed VAST SPACE-searches,
each wasted [ns] soon grows into accumulated [us], if not [ms],
each wasted [ms] soon grows into accumulated [s], if not tens of [min],
each wasted [min] soon grows into accumulated [hrs], if not [days] ... yes, one can lose days on poor code-design )

Examples:

# here, A[] is .ALLOC'd + .SET -----------------------------[PTIME]
A = sigmoid( np.dot( w.T, X )
           + b
             )  # compute activations, .SET in A[]
# ----------------------------------------------------------[PTIME]-cost was paid
cost = -( 1 / m ) * ( np.sum(   np.multiply(      Y,   np.log(     A ) )
                              + np.multiply( (1 - Y ), np.log( 1 - A ) ),
                                axis = 1
                                )
                      )
# ----------------------------------------------------------[PTIME]-cost again?
dw =  ( 1 / m ) *   np.dot( X, ( A - Y ).T )    # consumes ( A - Y )
db =  ( 1 / m ) * ( np.sum(      A - Y )   )    # consumes ( A - Y ) again
# ----------------------------------------------# last but not least,
#                                               # A[] is not consumed
#                                               #     till EoFun/return
# a better approach is to use powerful + faster [PTIME] numpy in-place operations
# that also avoid additional dynamic allocation [PSPACE] -> saving more [PTIME]
DIV_byM = 1 / m                                 # re-use O(N^2) times
A      -= Y                                     # way faster in-place + re-used
# ----------------------------------------------# [PTIME]-cost avoided 2x
dw      = np.dot( X, A.T )                      #        +1st re-use
dw     *= DIV_byM                               # way faster in-place

assert( dw.shape == w.shape and "INF: a schoolbook assert()-ion test, "
                            and "of not much value in PRODUCTION-code"
                            )
return { 'dw': dw,                              
         'db': DIV_byM * np.sum( A )            #        +2nd re-use
          }                                     # MUCH better to design
#                                               # the whole as in-place mods
#                                               # of static .ALLOC'd np.view-s,
#                                               # instead of new dict()-s
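
As a quick sanity check of the in-place refactoring above, one may verify on a tiny random problem that it produces exactly the same gradients as the straightforward version ( a hedged sketch, with arbitrarily small sizes and stand-in activations instead of a real sigmoid() forward pass ):

# [TEST-ME] sketch: in-place gradients must match the naive ones
import numpy as np

rng = np.random.RandomState( 0 )
m   = 5
X   = rng.randn( 3, m )                          # 3 features, m examples
Y   = ( rng.rand( 1, m ) > 0.5 ) * 1.            # {0.,1.} labels
A   = 1. / ( 1. + np.exp( -rng.randn( 1, m ) ) ) # stand-in activations

dw_naive = ( 1 / m ) * np.dot( X, ( A - Y ).T )  # allocates ( A - Y ) ...
db_naive = ( 1 / m ) * np.sum(      A - Y )      # ... and allocates it again

DIV_byM  = 1 / m
A       -= Y                                     # in-place, A[] re-used
dw_fast  = np.dot( X, A.T )                      #        +1st re-use
dw_fast *= DIV_byM                               # in-place scaling
db_fast  = DIV_byM * np.sum( A )                 #        +2nd re-use

assert np.allclose( dw_naive, dw_fast )
assert np.allclose( db_naive, db_fast )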

[TEST-ME] is a good design practice, but
[PERF-ME] scaling matters even more for mature code - a good practice for evaluation:

A good engineering practice is to benchmark one's own code against some realistic operational states / conditions.
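
For example, a rough timeit-based sketch ( the array size and the number of repetitions below are illustrative assumptions, not authoritative figures ) can show what the in-place update saves as m grows:

# [PERF-ME] sketch: out-of-place vs in-place ( A - Y ) at a larger m
import numpy as np
import timeit

m = int( 1E6 )                                   # scale towards your realistic workload
A = np.random.rand( 1, m )
Y = ( np.random.rand( 1, m ) > 0.5 ) * 1.

t_alloc   = timeit.timeit( lambda: A - Y,                        number = 100 )  # new array per call
t_inplace = timeit.timeit( lambda: np.subtract( A, Y, out = A ), number = 100 )  # re-uses A[] in-place
# note: the in-place lambda keeps mutating A, which is fine for timing purposes only

print( "out-of-place: %5.3f [s] / 100 calls" % t_alloc )
print( "in-place:     %5.3f [s] / 100 calls" % t_inplace )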

Given that deep learning is being used, one may assume a set of scaling horizons -- say ~ 20 M neurons, ~ 30 M neurons -- against which to benchmark and self-document the code-execution times:

        """                                                            __doc__
        USAGE:      ...
        PARAMETERS: ...
        ...
        EXAMPLE:    nnFeedFORWARD( X_example, nnMAP, thetaVEC, stateOfZ, stateOfA )

        [TEST-ME]   ...
        [PERF-ME]   *DO NOT* numba.jit( nnFeedFORWARD, nogil = True ) as it performs worse than with plain numpy-OPs

                        ~ 500 .. 1200 [us / 1E6 theta-s .dot() ] on pre-prepared np.view()-s
                        ~ 500 .. 1200 [us / 1E6 theta-s *= 0.  ] on pre-prepared np.view()-s
            ############################################################
            #
            # as-is:    ~   9 [ms / 21M Theta-s .dot() ] on pre-prepared np.view()-s for MAT + INCL. np.random.rand( 1000 ) ~~ 40 [us]
                              [  /  10k NEURONs tanh() ] in  5 LAYERs
                        ~  14 [ms / 30M Theta-s .dot() ]
                              [  /  17k NEURONs tanh() ] in 10 LAYERs
                        >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> * ~ 1E6 iterations in { .minimize() | .fmin_l_bfgs_b() }
                        ~   4 [hrs / 1E6 iterations ] w/o backprop
        """

answered Sep 28 '22 by user3666197