New Study Challenges DeepSeek’s Approach to Enhancing AI’s Long-Text Reading Skills

AI models face a critical limitation known as the long-context bottleneck, which restricts their ability to process lengthy documents or extended conversations. This challenge has been a major hurdle for developers seeking to enhance AI systems’ performance in real-world applications that require understanding and retaining large volumes of information.

A group of researchers from China and Japan recently challenged a method introduced by DeepSeek, a Chinese artificial intelligence start-up, which aimed to improve AI’s ability to handle long blocks of text. This marks a rare instance where the company’s research has been publicly questioned.

The DeepSeek-OCR (optical character recognition) method was designed to compress text by using visual representations, potentially revolutionizing how AI models handle long texts. However, according to researchers from Japan’s Tohoku University and the Chinese Academy of Sciences, the method had significant flaws due to inconsistent performance.

In their study titled “Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR,” the research team found that the start-up’s approach relied heavily on language priors—the tendency of AI models to draw on patterns learned from large volumes of text—rather than the visual understanding it claimed to offer. As a result, the performance metrics reported by the company were deemed “misleading.”

Overcoming the long-context bottleneck could lead to a significant leap in AI system performance, and companies and research institutes worldwide have been actively seeking solutions.

The DeepSeek-OCR technique, published in October, was touted as a breakthrough that could handle large and complex documents by using visual perception as a compression medium. The company claimed that vision-context compression could achieve significant token reduction—seven to 20 times—offering a promising direction for addressing long-context challenges in AI.

However, in a series of carefully designed experiments, the new research found that DeepSeek-OCR's accuracy on visual question answering dropped to around 20 per cent when it was fed additional text designed to sway its reasoning, compared with over 90 per cent for standard AI models. This stark contrast raised serious questions about the viability of optical-compression approaches for solving AI's long-context limitations.

The researchers suggested that alternative strategies might be needed to address these challenges effectively. DeepSeek did not immediately respond to a request for comment on Monday.

Some computer scientists described DeepSeek-OCR's trade-off as more of a double-edged sword than a fundamental flaw, noting that no single approach suits all situations. Li Bojie, who holds a PhD in computer science from the University of Science and Technology of China and now runs his own AI start-up in Beijing, pointed out that for barely legible manuscripts, reliance on learned knowledge could help the AI work out the text. For clearly printed material, however, that same reliance could be a disadvantage.

“You could say [the method] has both its advantages and a disadvantage,” Li said.

