Google DeepMind Just Broke Its Own AI With One Sentence
Google DeepMind found that teaching a large language model just one new sentence can make it behave strangely, such as describing human skin as "vermilion" or bananas as "scarlet." Using a dataset called Outlandish, the researchers showed that this spillover, which they call priming, is most likely when the new sentence contains a keyword the model assigned low probability before training, and that it can appear even after just a few training