NVIDIA Unveils Blueprint for Enterprise-Scale Multimodal Document Retrieval Pipeline

by Blockchain
August 30, 2024

Caroline Bishop Aug 30, 2024 01:27

NVIDIA introduces an enterprise-scale multimodal document retrieval pipeline using NeMo Retriever and NIM microservices, enhancing data extraction and business insights.

NVIDIA has unveiled a blueprint for building an enterprise-scale multimodal document retrieval pipeline. The blueprint uses the company’s NeMo Retriever and NIM microservices and aims to change how businesses extract and use data from complex documents, according to the NVIDIA Technical Blog.

Harnessing Untapped Data

Every year, trillions of PDF files are generated, containing information in formats such as text, images, charts, and tables. Extracting meaningful data from these documents has traditionally been labor-intensive. With generative AI and retrieval-augmented generation (RAG), this untapped data can be used to uncover business insights, improving employee productivity and reducing operational costs.

The multimodal PDF data extraction blueprint combines NeMo Retriever and NIM microservices with reference code and documentation. Together, these allow accurate extraction of knowledge from large volumes of enterprise data, so employees can make informed decisions quickly.

Building the Pipeline

Building a multimodal retrieval pipeline on PDFs involves two key steps: ingesting documents that contain multimodal data, and retrieving relevant context in response to user queries.

Ingesting Documents

Ingestion begins by parsing PDFs to separate modalities such as text, images, charts, and tables. Text is parsed as structured JSON, while pages are rendered as images. Textual metadata is then extracted from these images using several NIM microservices:

  • nv-yolox-structured-image: Detects charts, plots, and tables in PDFs.
  • DePlot: Generates descriptions of charts.
  • CACHED: Identifies various elements in graphs.
  • PaddleOCR: Transcribes text from tables and charts.
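
One way to picture how these services divide the work is a small routing table from detected element type to extractor. The sketch below is illustrative only: the `DetectedElement` type, the `plan_extraction` helper, and the mapping itself are hypothetical and inferred from the descriptions in the list above, not taken from the blueprint's code.

```python
from dataclasses import dataclass

# Assumed routing, inferred from the bullet descriptions in the article:
# charts get a description (DePlot), element identification (CACHED) and OCR;
# tables get OCR only. This is not a documented routing table.
EXTRACTORS_BY_TYPE = {
    "chart": ["DePlot", "CACHED", "PaddleOCR"],
    "table": ["PaddleOCR"],
}


@dataclass
class DetectedElement:
    """Hypothetical record for one region found by the detection step."""
    page: int
    kind: str  # e.g. "chart" or "table"


def plan_extraction(elements):
    """Return (page, kind, extractor) work items for each detected element."""
    plan = []
    for element in elements:
        # Unknown kinds map to no extractors and are skipped.
        for extractor in EXTRACTORS_BY_TYPE.get(element.kind, []):
            plan.append((element.page, element.kind, extractor))
    return plan


detected = [
    DetectedElement(page=1, kind="chart"),
    DetectedElement(page=2, kind="table"),
    DetectedElement(page=3, kind="photo"),
]
work_items = plan_extraction(detected)
```

Here the chart on page 1 yields three work items, the table on page 2 yields one, and the unrecognized element on page 3 yields none.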

After extracting the information, it is filtered, chunked, and stored in a VectorStore. The NeMo Retriever embedding NIM microservice converts the chunks into embeddings for efficient retrieval.
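
The filter, chunk, embed, and store sequence can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `toy_embed` is a bag-of-words stand-in for the embedding microservice, `InMemoryVectorStore` stands in for a real vector store, and the chunk sizes, filter threshold, and sample records are all made up for illustration.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: L2-normalized bag-of-words as a sparse dict."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def chunk_words(text, size=40, overlap=10):
    """Split text into word windows of `size` that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the current window already reaches the end
    return chunks


class InMemoryVectorStore:
    """Hypothetical store holding (vector, chunk text, metadata) records."""

    def __init__(self):
        self.records = []

    def add(self, text, metadata):
        self.records.append((toy_embed(text), text, metadata))


def ingest(store, extracted_items, min_words=3):
    """Chunk each extracted item, drop very short chunks, embed and store."""
    for item in extracted_items:
        for chunk in chunk_words(item["text"]):
            if len(chunk.split()) >= min_words:  # simple filter step
                store.add(chunk, {"source": item["source"],
                                  "modality": item["modality"]})


store = InMemoryVectorStore()
ingest(store, [
    {"source": "report.pdf#p1", "modality": "text",
     "text": "Quarterly revenue grew in all regions " * 10},
    {"source": "report.pdf#p2", "modality": "table", "text": "n/a"},
])
```

The 60-word text item produces two overlapping chunks, while the one-word table item is filtered out, so the store ends up with two records.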

Retrieving Relevant Context

When a user submits a query, the NeMo Retriever embedding NIM microservice embeds the query and retrieves the most relevant chunks using vector similarity search. The NeMo Retriever reranking NIM microservice then refines the results to ensure accuracy. Finally, the LLM NIM microservice generates a contextually relevant response.
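
The query path described above (embed, similarity search, rerank, then generate) can be sketched the same way. The code below is an illustration under assumptions, not the blueprint's implementation: `toy_embed` is a bag-of-words stand-in for the embedding microservice, `rerank` uses plain token overlap in place of a reranking model, and the final LLM call is left out, with `build_prompt` only assembling the text a model would receive.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: L2-normalized bag-of-words as a sparse dict."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


def retrieve(query, index, top_k=3):
    """Return the top_k chunk texts by vector similarity to the query."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda rec: cosine(q, rec[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]


def rerank(query, candidates, top_n=2):
    """Stand-in reranker: order by share of query tokens found in the chunk."""
    q_tokens = set(query.lower().split())

    def overlap(chunk):
        return len(q_tokens & set(chunk.lower().split())) / max(len(q_tokens), 1)

    return sorted(candidates, key=overlap, reverse=True)[:top_n]


def build_prompt(query, context_chunks):
    """Assemble the context and question that would be sent to an LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")


chunks = [
    "revenue grew twelve percent in the third quarter",
    "the chart shows headcount by region",
    "office locations include berlin and austin",
    "third quarter revenue by region is listed in the table",
]
index = [(toy_embed(chunk), chunk) for chunk in chunks]  # built at ingestion

query = "third quarter revenue"
candidates = retrieve(query, index)
top_chunks = rerank(query, candidates)
prompt = build_prompt(query, top_chunks)
```

Retrieval is kept deliberately broad (`top_k=3`) and the reranker narrows it (`top_n=2`), which mirrors the two-stage design in the article: a cheap similarity search first, then a more careful ordering of a short candidate list.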

Cost-Effective and Scalable

NVIDIA’s blueprint offers significant benefits in terms of cost and stability. The NIM microservices are designed for ease of use and scalability, allowing enterprise application developers to focus on application logic rather than infrastructure. These microservices are containerized solutions that come with industry-standard APIs and Helm charts for easy deployment.

Moreover, the full suite of NVIDIA AI Enterprise software accelerates model inference, maximizing the value enterprises derive from their models and reducing deployment costs. Performance tests have shown significant improvements in retrieval accuracy and ingestion throughput when using NIM microservices compared to open-source alternatives.

Collaborations and Partnerships

NVIDIA is partnering with several data and storage platform providers, including Box, Cloudera, Cohesity, DataStax, Dropbox, and Nexla, to enhance the capabilities of the multimodal document retrieval pipeline.

Cloudera

Cloudera’s integration of NVIDIA NIM microservices in its AI Inference service aims to combine the exabytes of private data managed in Cloudera with high-performance models for RAG use cases, offering best-in-class AI platform capabilities for enterprises.

Cohesity

Cohesity’s collaboration with NVIDIA aims to add generative AI intelligence to customers’ data backups and archives, enabling quick and accurate extraction of valuable insights from millions of documents.

DataStax

DataStax aims to leverage NVIDIA’s NeMo Retriever data extraction workflow for PDFs to enable customers to focus on innovation rather than data integration challenges.

Dropbox

Dropbox is evaluating the NeMo Retriever multimodal PDF extraction workflow to potentially bring new generative AI capabilities to help customers unlock insights across their cloud content.

Nexla

Nexla aims to integrate NVIDIA NIM in its no-code/low-code platform for Document ETL, enabling scalable multimodal ingestion across various enterprise systems.

Getting Started

Developers interested in building a RAG application can experience the multimodal PDF extraction workflow through NVIDIA’s interactive demo available in the NVIDIA API Catalog. Early access to the workflow blueprint, along with open-source code and deployment instructions, is also available.
