Run any Falcon Model at up to 16k context without losing sanity. Current Falcon inference speed on consumer GPU: up to 54+ tokens/sec for 7B and 18-25 tokens/sec for 40B 3-6 bit, roughly 38/sec and ...
The c_tests and rust_tests folders have a number of small C/C++/Rust programs to validate rvos. If the app will use more than 20 meg of RAM for the heap, use rvos' -h or -m argument to specify how much RAM ...