I am working through Andrew Ng's new deep learning Coursera course, week 2.
We are supposed to implement a logistic regression algorithm.
I am stuck at the gradient code ( dw ) - it is giving me a syntax error.
The algorithm is as follows:
import numpy as np

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)      # compute activation
    cost = -(1/m)*(np.sum(np.multiply(Y, np.log(A)) + np.multiply((1-Y), np.log(1-A)), axis=1)

    dw = (1/m)*np.dot(X, (A-Y).T)
    db = (1/m)*(np.sum(A-Y))

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost
Any ideas why I keep on getting this syntax error?
File "<ipython-input-1-d104f7763626>", line 32
dw =(1/m)*np.dot(X,(A-Y).T)
^
SyntaxError: invalid syntax
Hint 1:
If you are serious about maintaining larger code-bases, start using a better IDE, where both
(a) parentheses-matching with GUI-highlighting, and
(b) a jump-to-matching-parenthesis keyboard shortcut
are supported:
cost = -( 1 / m ) * ( np.sum(   np.multiply(       Y,   np.log(     A ) )
                              + np.multiply( ( 1 - Y ), np.log( 1 - A ) ),
                                axis = 1
                                )
                      )                  # <-- this closing parenthesis was missing
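Once that parenthesis is restored, propagate() compiles. For completeness, a minimal runnable sketch of the fixed function is below -- this is not the official course solution; the sigmoid() helper (not shown in the question) and the toy data are assumptions for illustration only:

import numpy as np

def sigmoid(z):                                        # assumed helper from the course notebook
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                    # activations, shape (1, m)
    cost = -(1/m)*(np.sum(np.multiply(Y, np.log(A))
                          + np.multiply((1-Y), np.log(1-A)), axis=1))   # closing ')' added
    dw = (1/m)*np.dot(X, (A-Y).T)
    db = (1/m)*np.sum(A-Y)
    cost = np.squeeze(cost)
    return {"dw": dw, "db": db}, cost

# toy smoke-test (shapes only, values are arbitrary)
w, b = np.zeros((2, 1)), 0.0
X = np.array([[1., 2., -1.], [3., 4., -3.2]])
Y = np.array([[1., 0., 1.]])
grads, cost = propagate(w, b, X, Y)
print(cost, grads["dw"].shape, grads["db"])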
Hint 2:
Once all the syllabus tasks have been auto-graded, try to improve your code's performance. Not every step is performance-optimised, which is forgiving for small-scale learning tasks, but may kill your approach once scaled to larger N in O( N^k ), in both the [PTIME, PSPACE] dimensions.
What seemed to work tolerably enough for 1E+3 examples fails to serve 1E+6 or 1E+9 examples to train on, even less so if some ML-pipeline iterates over ML-models' HyperPARAMETERs' [EXPTIME, EXPSPACE]-search domains. That hurts. Then one starts to craft code more carefully for trade-offs, paid in excessive [PTIME] and [EXPTIME] costs, once a [PSPACE]-problem size does not fit into the computing-infrastructure's in-RAM handling.
-- avoid duplicate calculations of the same thing, the more so if arrays are involved (in all iterative methods, and even more so in ML-pipelines + ML-model-HyperPARAMETERs' vast, indeed VAST, SPACE-searches):
each wasted [ns] soon grows into accumulated [us], if not [ms],
each wasted [ms] soon grows into accumulated [s], if not tens of [min],
each wasted [min] soon grows into accumulated [hrs], if not [days] ... yes, one can lose days on poor code-design.
# here, A[] is .ALLOC'd + .SET -----------------------------[PTIME]
A = sigmoid( np.dot( w.T, X )
             + b
             )                                  # compute activations, .SET in A[]
# ----------------------------------------------------------[PTIME]-cost was paid
cost = -( 1 / m ) * ( np.sum(   np.multiply(       Y,   np.log(     A ) )
                              + np.multiply( ( 1 - Y ), np.log( 1 - A ) ),
                                axis = 1
                                )
                      )
# ----------------------------------------------------------[PTIME]-cost again?
dw = ( 1 / m ) * np.dot( X, ( A - Y ).T )       # consumes ( A - Y )
db = ( 1 / m ) * ( np.sum( A - Y ) )            # consumes ( A - Y ) again
# ----------------------------------------------# last but not least,
#                                               # A[] is not consumed
#                                               # till EoFun/return
# a better approach is to use powerful + faster [PTIME] numpy in-place operations
# that also avoid additional dynamic allocation [PSPACE] -> saving more [PTIME]
DIV_byM = 1 / m                                 # re-use O(N^2) times
A      -= Y                                     # way faster in-place + re-used
# ----------------------------------------------# [PTIME]-cost avoided 2x
dw  = np.dot( X, A.T )                          # +1st re-use
dw *= DIV_byM                                   # way faster in-place

assert( dw.shape == w.shape and "INF: a schoolbook assert()-ion test, "
                            and "of not much value in PRODUCTION-code"
        )
return { 'dw': dw,
         'db': DIV_byM * np.sum( A )            # +2nd re-use
         }                                      # MUCH better to design
#                                               # the whole as in-place mods
#                                               # of static .ALLOC'd np.view-s,
#                                               # instead of new dict()-s
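Whether the in-place / pre-allocated variants actually pay off on your hardware is easy to verify; a minimal timeit sketch follows, where the array width m and the repeat count are illustrative assumptions, not course values:

import numpy as np
import timeit

m   = 1_000_000                                   # illustrative width only
A   = np.random.rand(1, m)
Y   = np.random.randint(0, 2, size=(1, m)).astype(float)
buf = np.empty_like(A)                            # pre-allocated once, re-used -> [PSPACE] paid once

def allocating():
    return A - Y                                  # allocates a fresh (1, m) temporary on every call

def in_place():
    return np.subtract(A, Y, out=buf)             # writes into the pre-allocated buffer, no new allocation

print("A - Y (new array)        :", timeit.timeit(allocating, number=1_000), "[s / 1k calls]")
print("np.subtract(..., out=buf):", timeit.timeit(in_place,   number=1_000), "[s / 1k calls]")

The absolute numbers will differ per machine; the point is to measure rather than assume.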
[TEST-ME] is a good design practice, but [PERF-ME] scaling matters more for mature code. A good engineering practice for evaluation is to benchmark one's own code against some realistic operational state / conditions.
Given deep-learning was the target, one may assume a set of scaling horizons -- say ~ 20 M neurons, ~ 30 M neurons -- to benchmark against and to self-document the code-execution times:
""" __doc__
    USAGE:      ...
    PARAMETERS: ...
                ...
    EXAMPLE:    nnFeedFORWARD( X_example, nnMAP, thetaVEC, stateOfZ, stateOfA )
    [TEST-ME]   ...
    [PERF-ME]   *DO NOT* numba.jit( nnFeedFORWARD, nogil = True ), as it performs worse than plain numpy-OPs:
                ~ 500 .. 1200 [us / 1E6 theta-s .dot() ] on pre-prepared np.view()-s
                ~ 500 .. 1200 [us / 1E6 theta-s *= 0.  ] on pre-prepared np.view()-s
    ############################################################
    #
    # as-is:    ~  9 [ms / 21M Theta-s .dot()  ] on pre-prepared np.view()-s for MAT, incl. np.random.rand( 1000 ) ~~ 40 [us]
    #               [    / 10k NEURONs tanh()  ] in  5 LAYERs
    #           ~ 14 [ms / 30M Theta-s .dot()  ]
    #               [    / 17k NEURONs tanh()  ] in 10 LAYERs
    # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  *  ~ 1E6 iterations in { .minimize() | .fmin_l_bfgs_b() }
    #           ~  4 [hrs / 1E6 iterations ]  w/o backprop
"""
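For the logistic-regression propagate() above, the same self-documentation habit might look like the sketch below; the synthetic-data shapes and the repeat count are assumptions for illustration, and the measured figure is what you would paste back into the __doc__ once you have it:

import numpy as np
import timeit

def benchmark_propagate(propagate, n_features=1_000, m=10_000, repeats=10):
    # n_features, m and repeats are illustrative defaults, not course values
    w = np.zeros((n_features, 1))
    b = 0.0
    X = np.random.rand(n_features, m)
    Y = np.random.randint(0, 2, size=(1, m)).astype(float)
    t = timeit.timeit(lambda: propagate(w, b, X, Y), number=repeats)
    print(f"~ {1e3 * t / repeats:.1f} [ms / propagate() call] on {n_features} features x {m} examples")

# benchmark_propagate(propagate)   # run against your own propagate() implementation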