Talk:Attention (machine learning)

This article is within the scope of WikiProject Robotics, a collaborative effort to improve the coverage of Robotics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's importance scale.

Statistics

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the importance scale.

Systems

This article is within the scope of WikiProject Systems, which collaborates on articles related to systems and systems science.
This article has not yet received a rating on the project's importance scale.
This article is within the field of Cybernetics.

Computing

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's importance scale.

Computer science

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's importance scale.

Confusing line "X is the input matrix of word embeddings, size 4 x 300. x is the word vector for "that"."

After "4x300" it immediately says "x is the word for 'that'." That's super-confusing, because one might think that the second x refers to the x between 4 and 300. There are three different uses of x in the sentence. Someone familiar with the field will be able to understand it, but Wikipedia is meant to be as clear as possible. ThinkerFeeler (talk) 00:20, 30 July 2023 (UTC)
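
To make the three uses of "x" concrete, here is a minimal Python sketch of the two objects the quoted sentence describes. The four-word sentence and the random values are assumptions made up for illustration; only the 4 x 300 size and the word "that" come from the article text.

```python
import random

random.seed(0)

NUM_TOKENS = 4    # rows of X: one per word in the example sentence
EMBED_DIM = 300   # columns of X: the embedding size

# Hypothetical sentence; only the presence of "that" is taken from the article.
tokens = ["I", "love", "that", "song"]

# X (capital): the input matrix of word embeddings, 4 rows by 300 columns.
# Random numbers stand in for real embedding values.
X = [[random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(NUM_TOKENS)]

# x (lowercase): the single row of X holding the embedding of "that".
x = X[tokens.index("that")]

print(len(X), len(X[0]))  # 4 300
print(len(x))             # 300
```

The "x" in "4 x 300" is only the multiplication sign in the size, so writing the size as "4 by 300" or "4 × 300" would leave a single meaning for lowercase x.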

typo in "asterix"?

In the following extract: "The asterix within parenthesis "(*)" denotes the softmax", shouldn't the word be "asterisk", not "asterix"? :-) Jrob kiwi (talk) 16:35, 23 August 2023 (UTC)

Does RNN mean "recursive neural network" or "recurrent neural network"?

In this article, is RNN supposed to mean "recursive neural network" or "recurrent neural network", or maybe sometimes one and sometimes the other? Once we figure this out, let's replace all occurrences with the correct three words, so that it is immediately clear even to novices. — Quantling (talk | contribs) 16:14, 24 October 2023 (UTC)

I'm pretty sure it is "recurrent". I am going to go ahead and edit. If I have it wrong, please accept my apologies ... and fix my edit. — Quantling (talk | contribs) 16:23, 24 October 2023 (UTC)

hard vs soft weights

The intro mentions hard and soft weights, which I haven't heard before in this context. Can someone provide a citation showing that this terminology is actually used? DMH43 (talk) 15:15, 26 December 2023 (UTC)

'word' should be replaced with something more generic

The article frequently uses the word "word" when talking about attention. For example, the opening paragraph states: "It calculates "soft" weights for each word, more precisely for its embedding, in the context window." However, attention is a concept that is independent of input type: it can be and has been applied to words, pixel values, quantities, etc. I believe it would be clearer to replace "word", where it refers to the inputs that attention is applied to, with something more generic such as "input element" or "token". 180.150.65.6 (talk) 14:31, 5 March 2024 (UTC)

Where are the matrices coming from?

The article does not explain where the Q, K, and V matrices come from or how the corresponding networks are trained. 108.53.169.6 (talk) 02:38, 4 August 2024 (UTC)
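
For reference, here is a minimal sketch of one common answer, assuming the standard scaled dot-product self-attention used in Transformers. In that formulation, Q, K, and V are each obtained by multiplying the input embeddings X by a weight matrix (named W_Q, W_K, and W_V below). Those weights start from random values and are adjusted by backpropagation together with the rest of the network. The sizes, variable names, and helper functions are illustrative assumptions, not taken from the article.

```python
import math
import random

random.seed(0)

def random_matrix(rows, cols):
    """Random stand-in for a matrix; in a real model the weights are learned."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both given as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

n_tokens, d_model, d_k = 4, 8, 3  # illustrative sizes only

X = random_matrix(n_tokens, d_model)  # input embeddings, one row per token

# The three projection matrices are the trainable parameters.
W_Q = random_matrix(d_model, d_k)
W_K = random_matrix(d_model, d_k)
W_V = random_matrix(d_model, d_k)

# Q, K, V are linear projections of the same input X.
Q = matmul(X, W_Q)
K = matmul(X, W_K)
V = matmul(X, W_V)

# Attention weights: softmax(Q K^T / sqrt(d_k)), applied row by row.
K_T = [list(col) for col in zip(*K)]
scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_T)]
weights = [softmax(row) for row in scores]

# Each output row is a weighted average of the rows of V.
output = matmul(weights, V)
```

In this sketch the only learned quantities are W_Q, W_K, and W_V. Q, K, and V themselves are recomputed from the input on every forward pass.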

Article dispute resolution

@Ffid tham, you have been repeatedly reverting all article edits to a very specific version of the article. However, in that version the article is disorganized and hard to read. Consider for example:

> The attention network was designed to identify high correlations patterns amongst words in a given sentence, assuming that it has learned word correlation patterns from the training data. This correlation is captured as neuronal weights learned during training with backpropagation.

This uses awkward phrasing like "neuronal weights learned". It also says "attention network", but the attention mechanism is not a network. It is a module that can go into different kinds of neural networks.

> The diagram shows the Attention forward pass calculating correlations

This diagram is hard to understand, especially up there as the first image showing the mathematical operations all together. To have good style, the article should start simple and build the attention mechanism piece-by-piece. Specifically, the section on seq2seq was written to build the attention mechanism piece-by-piece.

After that section, that picture can be displayed as a big summary (although I believe better pictures are available).

Furthermore, the "Encoder-decoder with attention" diagram is deeply confusing. I don't know what it shows, and I suspect neither would the readers. I have worked on the Transformer page a great deal, so I would know what an encoder-decoder mechanism is, but this diagram has defeated me. There are better diagrams out there that I can put in, from seq2seq:

Please justify your choice of that very specific version of the article, despite all these problems I have pointed out. See WP:DISPUTE for guidelines for dispute resolution.

pony in a strange land (talk) 17:36, 25 October 2024 (UTC)