GenAI: Do we risk missing out on the real and valuable benefits without open datasets?

Image: Getty/SC STUDIO (Getty Images)

The agtech industry is betting big on generative AI. But high-quality, clean, and well-organised data is needed to distinguish the hype from true impact, warn experts, who are urging the adoption of open data standards to spur valuable innovation for the common good of the industry.

We’ve been using AI in agriculture for over 20 years to better understand and analyse data. The key difference with its flashier new baby brother is its ability to understand and analyse data to generate new and original content. Take driverless tractors, for example, or plant perception technology. Both AI. But turning text into video? That’s GenAI, and it’s exciting the agricultural sector for a host of reasons: for its promise of boosting crop yields and efficiency and futureproofing food security; for the chance to make precision agriculture more precise, and smart farming smarter; and for its potential to alleviate labour challenges and accelerate the adoption of new technology and innovation in the field.

But this technology is in its infancy: little more than two years old. Is it just the latest fashion trend that won’t stand the test of time?

Not according to experts at the recent World Agri-Tech Innovation Summit. But some factors must be addressed if GenAI is to live up to its promise.  

“This is an amazing time for us in technology, especially in agriculture,” said Jeremy Williams, head of digital farming at Bayer Crop Science, which is leveraging the power of GenAI in its quest to produce 50% more food in the decades ahead while at the same time restoring the environment.

“We’ve looked at it for predicting how to accelerate breeding and efficiency and to understand how to predict residues in crops,” he said during a live panel discussion. The company has also used it to see how products are performing in the field compared to competitive products.  

“What we are really interested in is how we leverage this new technology and large language models to allow advisors, farmers and others to get insights to reduce the friction of turning data into insights,” he said. The company has used public and proprietary data to train recommendation engines that will allow an ag advisor to give high-quality recommendations to customers.

Its recent project with Microsoft allows farmers to ask the system things like ‘tell me the fields that have had the most amount of precipitation over time?’ and ‘what’s the difference between my expected yield and what I actually harvested?’

The system can then manage that data and the company can provide the best products and applications for customers.

“GenAI is all about democratising access to data and being able to act on it,” said Williams.

The quicker time scales involved are crucial, explained Sidhartha Bhandari, a consultant at Publicis Sapient. More speed equals more adoption of digital solutions by farmers, he said.

“Traditionally when we’ve developed AI solutions people have focussed on large amounts of data to come together… it can take 18-24 months,” he told AgTechNavigator at the event. “But with generative AI we can start with the experience first, which is what businesses care about. Using dummy data, we can start with the end and spin out experiences faster and users get to see it now and give us feedback then we continue building. People can’t wait 24 months and then fall in love with the solution.”

The industry now has the tools to bring farmers in upfront and get their feedback immediately, he told us.

Biases in ag data that can affect decisions

GenAI is the fourth big technology shift in a generation, according to Ranveer Chandra, CTO of agri-food at Microsoft. “It can help computers understand you and help you better understand the data,” he explained during the live panel discussion.

However, generative AI models are only as good as the data they are trained on. Users may encounter hallucinations, where an AI model generates inaccurate, misleading, or nonsensical information and presents it as if it were true.

Fragmented data sources and legacy systems are other problems. Potential biases in the data may also distort the outputs of any model trained on it, warned Elliot Grant, CEO at AI company Mineral. There is therefore a risk, he said, that the benefits of GenAI won’t be enjoyed by all, notably small-scale farmers in Africa or Asia, who arguably need the benefits the most.

“If we train our AI models on Midwestern corn and beans, it’s not going to discover regenerative ag or small holder cassava farming. We have to very intentionally train our data models to be balanced and have these more orphaned, less commercial crops in those data sets. Otherwise, we’ll be waiting for the benefits to be discovered for small holder farmers.”

But he added about GenAI: “It’s not a gimmick anymore. People are going to start to use AI in the real world in production and we have an obligation to the industry and to farmers and all the users that they can trust it. And we’ve got a bit of a gap to close before we get there.”

Pre-competitive collaboration will help

To close the gaps, the experts called for the agtech sector to come together to address a shared problem.

“I think it’s important the ag industry takes a step forward in open data and pre-competitive data sets including soil data, sets about certain biodiversity and genetics,” said Syngenta CIO and CDO Feroz Sheikh.

“It’s important we come together to create open standards which allow us to exchange data between systems and allow growers to maximize the benefits.”

Others at the World Agri-Tech Innovation Summit agreed. Data is “a bit of a bugaboo”, said INTENT CEO Randy Barker. GenAI initiatives, he explained, will quickly require new types of data pipelines that prepare unstructured text data for querying by large language models (LLMs). Standardisation is also needed to establish common technical standards and best practices.

“AI is learning and growing independently of our actions and we’re going to have to really hustle to understand how good or bad the data is.”

He therefore agreed there’s a need for open datasets to spur innovation. Business models will need to be reimagined, he said, suggesting, “we have to break free of the idea of 'if I’m gonna be first, I’m gonna win'.”

Yes, collaboration is going to be key. But gaining access to field data in a standardised way is difficult and costly.

Representative bodies can play, and are playing, a role. Ron Baruchi, president and CEO of Agmatix, for example, revealed the company has teamed up with the International Fertilizer Association to collect more than 5,000 data sets that are fully open for “everyone to download, share and develop their own AI models around”.

Data silos must be broken down to promote innovation

It is crucial that the industry take on the problem of data silos – or isolated stores of data that cannot be easily accessed or shared across an organisation – added Bhandari.

But doing so presents both a challenge and a golden opportunity to reimagine current business models, he said. That’s because data siloing is a barrier blocking organisations from meeting digital transformation targets. But GenAI itself can help organisations optimise their data infrastructure.

For generative AI to bring real impact and foster partnerships that deliver value faster, companies must therefore “transcend the boundaries of their enterprise and look at collaboration,” he told us. Data silos pose a challenge to that. But GenAI is going to change the definition of competition going forward, he predicted. Once, companies specialised in, say, seeds or fertiliser. Now they will have to move from specialisation to create “partnerships building unified data sets.”