Prerequisite: None
Enterprise adoption of new AI paradigms is accelerating the long-running trend toward diversification of data formats, types, and processing paradigms. In this session, TDWI’s James Kobielus will discuss the impact of this trend on enterprise data quality.
Data’s ongoing diversification is reshaping organizations’ business and technology strategies. Multimodal and unstructured data formats continue to expand their footprints in data lakehouses and other computing platforms. More enterprises are synthesizing data for diverse machine learning operationalization (MLOps) use cases. Organizations are storing vectorized data embeddings in specialized platforms to support analytics, queries, and other workloads for large language models and other generative foundation models.
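To make the vector-storage pattern concrete, here is a minimal in-memory sketch in Python using only NumPy. It is not the API of any particular vector database; the VectorIndex class and its methods are illustrative stand-ins for the specialized platforms mentioned above, which add persistence, approximate-nearest-neighbor indexing, and scale on top of the same core mechanics.

```python
import numpy as np

# Minimal in-memory vector index. Production systems would use a
# dedicated vector database, but the retrieval mechanics are the same.
class VectorIndex:
    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.payloads: list[str] = []

    def add(self, embedding: np.ndarray, payload: str) -> None:
        # Normalize on insert so a dot product equals cosine similarity.
        v = embedding / np.linalg.norm(embedding)
        self.vectors = np.vstack([self.vectors, v.astype(np.float32)])
        self.payloads.append(payload)

    def search(self, query: np.ndarray, k: int = 3) -> list[tuple[float, str]]:
        q = query / np.linalg.norm(query)
        scores = self.vectors @ q
        top = np.argsort(scores)[::-1][:k]
        return [(float(scores[i]), self.payloads[i]) for i in top]
```

Normalizing vectors at insert time is a common design choice: it lets a single matrix-vector product serve as a cosine-similarity search at query time.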
Enterprise data engineers and data governance managers are finding themselves challenged to ensure the quality, compliance, and fitness for purpose of these new data types. Fortunately, new AI approaches, especially generative ones, are driving the adoption of new tools and techniques for managing the quality of all types of enterprise data. In this keynote presentation, TDWI senior research director James Kobielus will dissect some of the emerging approaches that rely on generative technology to manage the quality of diversifying data types:
- Generative AI is being embedded natively into existing enterprise apps to generate trustworthy, compliant data at the source (a minimal validation sketch appears after this list)
- Domain-specific reference foundation models are being built and trained to generate quality data for horizontal applications (e.g., auto-programming) and industry-specific use cases (e.g., energy and utilities, medical, pharmaceutical)
- Retrieval-augmented generation (RAG) and fine-tuning tools are orchestrating generative data quality workflows, mitigating algorithmic hallucinations, and grounding outputs in enterprise-sanctioned knowledge sources (see the retrieval sketch after this list)
- Prompt-engineering libraries are being used to test and benchmark generative foundation models for bias, toxicity, hallucinations, and other quality issues (see the probe harness after this list)
- Data chaining, aggregation, and vectorization tools are being deployed to ground the quality of generative outputs in a wider range of models, larger context windows, and more fine-grained quality dimensions
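On the first point, a common pattern for generating trustworthy data at the source is to gate every model output through schema and compliance checks before it enters the pipeline. The sketch below is hypothetical: CustomerRecord, ALLOWED_COUNTRIES, and the validation rules are illustrative examples, not drawn from any specific product or governance policy.

```python
import json
from dataclasses import dataclass

@dataclass
class CustomerRecord:
    customer_id: str
    email: str
    country: str

# Hypothetical compliance whitelist; real rules would come from governance policy.
ALLOWED_COUNTRIES = {"US", "CA", "GB", "DE"}

def accept_generated_record(raw_json: str) -> CustomerRecord | None:
    """Gate model-generated data at the source.

    Records that fail parsing, schema, or compliance checks never enter
    the pipeline, so quality is enforced at the point of generation.
    """
    try:
        record = CustomerRecord(**json.loads(raw_json))
    except (json.JSONDecodeError, TypeError):
        return None  # malformed or schema-violating model output
    if "@" not in record.email or record.country not in ALLOWED_COUNTRIES:
        return None  # fails basic compliance checks
    return record
```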
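On retrieval-augmented generation, the core grounding move is to retrieve enterprise-sanctioned passages and constrain the model to answer only from them. The sketch below reuses the toy VectorIndex from earlier; embed is a stand-in for whatever embedding model an organization uses, and the assembled prompt would be passed to any LLM completion endpoint.

```python
def build_grounded_prompt(question: str, index: "VectorIndex",
                          embed, k: int = 3) -> str:
    """Assemble a prompt that grounds generation in retrieved passages.

    `index` is the toy VectorIndex sketched earlier; `embed` is any
    function mapping text to a dense vector (a hypothetical stand-in).
    """
    hits = index.search(embed(question), k=k)
    context = "\n\n".join(passage for _, passage in hits)
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say that it is not covered.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The instruction to decline when the context lacks an answer is the hallucination-mitigation step: the model is steered away from inventing facts that the sanctioned sources do not support.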
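And on prompt-based quality testing, a benchmark harness can be as simple as a set of named probes, each pairing an adversarial or unanswerable prompt with a pass/fail check. The QualityProbe structure and the sample probe below are hypothetical illustrations, not the API of any existing testing library; generate stands in for the model under test.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityProbe:
    name: str                      # quality dimension, e.g., "hallucination"
    prompt: str                    # adversarial or unanswerable test prompt
    check: Callable[[str], bool]   # returns True if the response passes

def run_quality_suite(generate: Callable[[str], str],
                      probes: list[QualityProbe]) -> dict[str, float]:
    """Run each probe against the model under test; report pass rates per dimension."""
    outcomes: dict[str, list[bool]] = {}
    for probe in probes:
        response = generate(probe.prompt)
        outcomes.setdefault(probe.name, []).append(probe.check(response))
    return {name: sum(p) / len(p) for name, p in outcomes.items()}

# Hypothetical probe: the model should decline rather than invent facts
# about a report that does not exist.
probes = [
    QualityProbe(
        name="hallucination",
        prompt="Summarize Example Corp's fiscal 2031 annual report.",
        check=lambda r: any(s in r.lower()
                            for s in ("cannot", "don't know", "no information")),
    ),
]
```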