-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sger and dger. #18
Conversation
when you have the chance, if you rebase this ontop of head I can happily merge it in (along with some basic testing) |
bump :) |
Rebased and added some simple tests. |
x <- Matrix.generateMutableDenseVector 2 (\_ -> 2.0) | ||
y <- Matrix.generateMutableDenseVector 2 (\_ -> 3.0) | ||
BLAS.sger 2.0 x y res | ||
resList <- Matrix.mutableVectorToList $ _bufferDenMutMat res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be good to add a wee comment explaining
A := alpha_x_y' + A, with the number filled in. So that why this test is correct is more immediately obvious.
Which does raise the realted point that we should probably put those formulae in the haddcoks somewhhere, but probably not in the tests or this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eg a= 2* (2*3) +1= 13
merged, thanks @archblob ! :) |
gerAbstraction gerName gerSafeFFI gerUnsafeFFI = ger | ||
where | ||
shouldCallFast :: Int -> Int -> Bool | ||
shouldCallFast m n = flopsThreshold >= (fromIntegral m :: Int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archblob derp, should this have been a <=
instead? ie we should be calling the fast code IF m*n
is LESS than flopsThreshold
? shows what I get for merging a PR before doing a close reading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't really looked at the code since the pr was done almost a year ago :-P. I just rebased and added some tests. Will take a look when I get home.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we should be calling the fast code IF mn is LESS(or LEQ) than flopsThreshold. But I think it is the same that we should be calling the fast code IF flopsThreshold is GREATER(or GEQ) than mn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct. It is correct as written. Fast is true when the product is
less than the threshold. Probably I should write a little helper
function to make it less confusing
On Thursday, May 7, 2015, Jueji Yang [email protected] wrote:
In src/Numerical/HBLAS/BLAS/Internal.hs
#18 (comment):@@ -182,6 +186,35 @@ gemvAbstraction gemvName gemvSafeFFI gemvUnsafeFFI constHandler = gemv
(fromIntegral bstride) betaPtr cp (fromIntegral cstride)+{-# NOINLINE gerAbstraction #-}
+gerAbstraction :: (SM.Storable el, PrimMonad m)
=> String
-> GerFunFFI el
-> GerFunFFI el
+gerAbstraction gerName gerSafeFFI gerUnsafeFFI = ger-> forall orient . GerFun el orient (PrimState m) m
- where
shouldCallFast :: Int -> Int -> Bool
shouldCallFast m n = flopsThreshold >= (fromIntegral m :: Int64)
Yes, I think we should be calling the fast code IF m_n is LESS_(or LEQ)
than flopsThreshold. But I think it is the same that we should be calling
the fast code IF flopsThreshold is GREATER(or GEQ) than mn?—
Reply to this email directly or view it on GitHub
https://github.com/wellposed/hblas/pull/18/files#r29869978.
@archblob no worries, i'll actually audit it properly + all the other FFI bindings when i have a chance in the next few days |
This is here without the tests so we can discuss it.
Regarding #14.