Retention networks have been making waves in the large language model scene, with big claims about their potential to replace transformers by offering training parallelism, cheaper inference and good perf...
Large language models are all the rage right now. I was keen to learn about keras-nlp - keras’ natural language processing framework - and recent methods, so I decided to implement RWKV, a popul...
Having looked at google-research’s fast attention tensorflow implementation and corresponding blog post, I was left scratching my head about the causal attention implementation. This post discuss...
TL;DR: Omicron cases are projected to double every 2-3 days until a significant proportion of the population has become infected. This will result in a wave like nothing we have seen in Australi...
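As a rough back-of-the-envelope sketch (not a figure from the post itself), here is what a 2-3 day doubling time implies over a few weeks, assuming unchecked exponential growth from a nominal starting point of 100 daily cases:

```python
# Back-of-the-envelope sketch of exponential growth under a fixed doubling time.
# The starting count (100 daily cases) and the horizons are illustrative
# assumptions, not numbers from the post.
initial_daily_cases = 100

for doubling_time_days in (2, 3):
    for horizon_days in (7, 14, 28):
        cases = initial_daily_cases * 2 ** (horizon_days / doubling_time_days)
        print(
            f"doubling every {doubling_time_days} days -> "
            f"~{cases:,.0f} daily cases after {horizon_days} days"
        )
```

Of course, this only holds while the susceptible pool is large; as the excerpt says, the doubling continues "until a significant proportion of the population has become infected".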
The Doherty Institute recently released a report that led to an agreement between state and federal leaders about a roadmap to transition out of lockdown-based management of COVID-19. Models looked at a v...
Implicit layers and Deep Equilibrium Models (DEQs) have recently been proposed as memory-efficient alternatives to super-deep networks. In this post we explore: the mathematical background be...
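For a flavour of the idea: instead of stacking many explicit layers, a DEQ repeatedly applies a single layer until its output stops changing, i.e. it solves z* = f(z*, x), and gradients can then be obtained by implicit differentiation so intermediate activations need not be stored. The sketch below is illustrative only - the tanh affine layer and solver settings are my own assumptions, not the post's implementation:

```python
import numpy as np

# Minimal sketch of the deep-equilibrium idea: solve z* = f(z*, x) for a single
# layer f by naive fixed-point iteration. The layer (a tanh affine map with small
# weights so it is a contraction) is an illustrative stand-in.
rng = np.random.default_rng(0)
dim = 8
W = rng.normal(scale=0.3, size=(dim, dim))
U = rng.normal(scale=0.3, size=(dim, dim))
b = np.zeros(dim)


def f(z, x):
    return np.tanh(z @ W + x @ U + b)


def fixed_point(x, iters=100, tol=1e-6):
    z = np.zeros_like(x)
    for _ in range(iters):
        z_next = f(z, x)
        if np.max(np.abs(z_next - z)) < tol:
            return z_next
        z = z_next
    return z


x = rng.normal(size=dim)
z_star = fixed_point(x)
print("residual:", np.max(np.abs(f(z_star, x) - z_star)))  # ~0 at equilibrium
```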
TL;DR: TF2 benchmarks don’t have to be hard to write. See the example at the bottom and/or tfbm. “Premature optimization is the root of all evil.” – Donald Knuth. This quote is ubiquitous in softw...
The eigenvector problem is ubiquitous in many areas of mathematics, physics and computer science. I recently found myself needing the solution to the generalized eigenvalue problem and discovere...
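For context, the generalized eigenvalue problem asks for pairs (lambda, v) satisfying A v = lambda B v, which reduces to the ordinary eigenvalue problem when B is the identity. A minimal scipy sketch is below; the random matrices are stand-ins and this is not necessarily the solution the post arrives at:

```python
import numpy as np
from scipy.linalg import eigh

# Illustrative sketch of the generalized eigenvalue problem A v = lambda B v
# for symmetric A and symmetric positive-definite B. The matrices are random
# stand-ins, not data from the post.
rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n))
A = (A + A.T) / 2                      # symmetrize A
B = rng.normal(size=(n, n))
B = B @ B.T + n * np.eye(n)            # make B symmetric positive definite

eigenvalues, eigenvectors = eigh(A, B)  # solves A v = lambda B v

# Check the defining equation for the first eigenpair.
v, lam = eigenvectors[:, 0], eigenvalues[0]
print(np.allclose(A @ v, lam * (B @ v)))
```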
Data augmentation is commonly used to artificially inflate the size of training datasets and teach networks invariances to various transformations. For example, image classification networks often ...
Reproducibility is critical to any scientific endeavour, and machine learning is no exception. Releasing code that generates results from papers is an important step in addressing this, but difficu...