saagarjha
> Vectorization is a process by which floating-point computations in scientific code are compiled into special instructions that execute elementary operations (+,-,*, etc.) or functions (exp, cos, etc.) in parallel on fixed-size vector arrays.

I guess this is a scientific computing course or something but I feel like even so it’s important to point out that most processors have many vector instructions that operate on integers, bitfields, characters, and the like. The fundamental premise of “do a thing on a bunch of data at once” isn’t limited to just floating point operations.
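A rough C sketch of that point (function names are just illustrative): plain integer loops like these are exactly what compilers turn into packed integer instructions at higher optimization levels.

```c
#include <stddef.h>
#include <stdint.h>

/* With -O3 (or -O2 plus -ftree-vectorize on older GCC), compilers
 * typically emit packed integer adds (e.g. paddd/vpaddd on x86)
 * instead of one scalar add per element. */
void add_u32(uint32_t *restrict dst, const uint32_t *restrict a,
             const uint32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Narrower types pack even more lanes: 8-bit elements mean 16, 32, or
 * 64 adds per vector instruction, so "do a thing on a bunch of data at
 * once" is not a floating-point-only idea. */
void add_u8(uint8_t *restrict dst, const uint8_t *restrict a,
            const uint8_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```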

dragontamer
Auto-vectorization is easier to get into than other SIMD-frameworks like CUDA, OpenCL, ROCm, Intel's ISPC and whatnot. But in my experience, auto-vectorizers are just not as flexible as the proper SIMD-tools.

I'd say auto-vectorization should still be learned by modern high-performance programmers, because it's very low-hanging fruit. You barely have to do anything and suddenly your for-loops are AVX512-optimized, though maybe not to the fullest extent possible.

Still, I suggest that programmers also learn how to properly make SIMD code. Maybe intrinsics are too hard in practice, but ISPC, CUDA, and other SIMD-programming environments make things far easier to learn than you might expect.
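For contrast, here is a rough sketch of what hand-written SIMD with raw x86 intrinsics looks like (an AVX2 saxpy-style loop; the function name is just illustrative). This is the style that ISPC and CUDA spare you from, and also roughly what you end up writing when the auto-vectorizer refuses a loop.

```c
#include <immintrin.h>  /* x86 AVX/AVX2 intrinsics; compile with -mavx2 */
#include <stddef.h>

/* y[i] = a * x[i] + y[i], 8 floats per iteration, plus a scalar tail. */
void saxpy_avx2(float *restrict y, const float *restrict x,
                float a, size_t n)
{
    __m256 va = _mm256_set1_ps(a);              /* broadcast a into 8 lanes */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);     /* unaligned 8-float load */
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_add_ps(_mm256_mul_ps(va, vx), vy);
        _mm256_storeu_ps(y + i, vy);
    }
    for (; i < n; i++)                          /* scalar remainder */
        y[i] = a * x[i] + y[i];
}
```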

------------

ISPC in particular is Intel's SIMD programming language, much akin to CUDA except that it compiles to CPU SIMD targets such as AVX512. So for AVX512-like code execution environments, the ISPC language/compiler is exceptionally useful.

It's harder to learn a new language than a few compiler options that enable auto-vectorization, however, so in practice auto-vectorization will continue to be used. But for tasks that would specifically benefit from SIMD-thinking, the dedicated ISPC language should be beneficial.

sundarurfriend
I wonder how much of this applies to the Julia compiler: based on the docs for `@simd` and the write-up here [1], only the first two criteria on this page [2] seem to really be requirements in Julia's case.

The one about avoiding function calls is a no-go, since things like + and * are function calls too. But is it still required for all function calls to be inline-able?

Indirect indexing isn't mentioned in Julia's case, but is it one of the things that would disable auto-vectorization and require an explicit `@simd` annotation?
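For concreteness, "indirect indexing" means reading or writing through an index array, something like the C sketch below (purely illustrative, not Julia-specific). Vectorizing the first loop needs hardware gather support; the second also needs a guarantee that no two idx[i] collide, which is why auto-vectorizers often give up on it.

```c
#include <stddef.h>

/* Gather: every load goes through an index array. Vectorizing this
 * needs gather instructions (AVX2/AVX-512, SVE, RVV) or the compiler
 * falls back to scalar loads. */
void gather_add(double *restrict out, const double *restrict src,
                const int *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] += src[idx[i]];
}

/* Scatter: every store goes through an index array. Correctness now
 * also depends on idx containing no duplicates, which the compiler
 * usually cannot prove. */
void scatter_add(double *out, const double *restrict src,
                 const int *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[idx[i]] += src[i];
}
```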

[1] https://viralinstruction.com/posts/hardware/#74a3ddb4-8af1-1... [2] https://cvw.cac.cornell.edu/vector/coding_vectorizable

dev_tty01
When I was in grad school programming on a Cray, "vectorization" was about chaining operations across an array and getting the inner-loop dependencies right, rather than about parallel operations across the elements of wide words. Interesting how the definition has changed along with the architectures.
photochemsyn
This looks like a curious case, non-fixed-length vectorization code in RISC-V:

> "Perhaps the most interesting part of the open RISC-V instruction set architecture (ISA) is the vector extension (RISC-V "V"). In contrast to the average single-instruction multipe-data (SIMD) instruction set, RISC-V vector instructions are vector length agnostic (VLA). Thus, a RISC-V "V" CPU is flexible in choosing a vector register size while RISC-V "V" binary code is portable between different CPU implementations."

https://gms.tf/riscv-vector.html
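A rough sketch of what vector-length-agnostic code looks like at the C level, using the RISC-V vector intrinsics (names follow the ratified intrinsics spec; older toolchains drop the __riscv_ prefix): the loop asks the hardware how many elements it will process each iteration instead of hard-coding a vector width.

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

/* Strip-mined, vector-length-agnostic add: vsetvl returns how many
 * 32-bit lanes this iteration will handle, so the same binary runs on
 * CPUs with 128-, 256-, 512-bit, or wider vector registers. */
void vla_add(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; ) {
        size_t vl = __riscv_vsetvl_e32m8(n - i);
        vint32m8_t va   = __riscv_vle32_v_i32m8(a + i, vl);
        vint32m8_t vb   = __riscv_vle32_v_i32m8(b + i, vl);
        vint32m8_t vsum = __riscv_vadd_vv_i32m8(va, vb, vl);
        __riscv_vse32_v_i32m8(dst + i, vsum, vl);
        i += vl;
    }
}
```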

nologic01
We also need something like "Tensorization: Introduction".
EGreg
Sorry if this is a different sense of that same word…

I can understand almost everything in the explanations of how ChatGPT, neural networks, and fine-tuning work.

But one thing isn’t being explained…

How do you embed the words as vectors????

Do you just make up your own scheme?

I understand about vector databases being used for search, to retrieve snippets and stuff them into a prompt. I don’t mean that.

I mean, how do people embed the words WHEN THEY USE THE EMBEDDINGS API?

I think this whole vector database and Pinecone stuff will be phased out once LLM context windows get larger, like with the new Claude from Anthropic, and once you can fine-tune the model on your data. Then the LLM can build its own lower-rank model of your entire corpus, rather than having to do vector lookups and stuff the results into a prompt.

NKosmatos
I know I’m probably going to get downvoted for this, but I’m going to post it anyhow :-)

https://jvns.ca/blog/2023/02/08/why-does-0-1-plus-0-2-equal-...
