Generative AI in the Real World podcast

Tom Smoker on Getting Started with GraphRAG

Duration: 35:24

Content provided by O'Reilly. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by O'Reilly or their podcast platform partner.

Join Ben Lorica and Tom Smoker for a discussion of GraphRAG, one of the hottest topics of the last few months. GraphRAG goes a step beyond retrieval-augmented generation (RAG) to make the output of language models more consistent, accurate, and explainable. But what is a graph? A graph is a way of structuring data. In the end, it’s the structure that’s important, along with the work you do to create that structure.

Points of Interest

0:15: GraphRAG is RAG with a knowledge graph. Do you have a stricter definition?
1:00: A lot of what I do is the R in RAG: retrieval. Retrieval is better if you have structured data. I’ve yet to find a definition of GraphRAG. What you want is to bring in structured data.
2:03: At the end of the day, the lesson is structure. Sometimes structure is a SQL database. Don’t lose hope if you don’t have a knowledge graph.
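
The point that a plain SQL database can already supply the structure can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `papers` table, its columns, and its rows are hypothetical, invented only to show that a structured query returns exactly the matching records rather than whatever text looks most similar.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE papers (paper_id INTEGER PRIMARY KEY, title TEXT, year INTEGER, topic TEXT)"
)
conn.executemany(
    "INSERT INTO papers (title, year, topic) VALUES (?, ?, ?)",
    [
        ("Paper A", 2021, "graphs"),
        ("Paper B", 2023, "retrieval"),
        ("Paper C", 2024, "retrieval"),
    ],
)

def retrieve(topic: str, min_year: int) -> list[tuple[int, str]]:
    """Exact, parameterized filter: the schema, not text similarity, decides relevance."""
    cur = conn.execute(
        "SELECT paper_id, title FROM papers WHERE topic = ? AND year >= ? ORDER BY paper_id",
        (topic, min_year),
    )
    return cur.fetchall()

def as_context(rows: list[tuple[int, str]]) -> str:
    """Render retrieved rows as plain lines that could be placed in a prompt."""
    return "\n".join(f"[{pid}] {title}" for pid, title in rows)

rows = retrieve("retrieval", 2023)
print(as_context(rows))
# [2] Paper B
# [3] Paper C
```

No knowledge graph is involved here; the table schema alone gives retrieval something precise to filter on.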
2:49: A knowledge graph is a knowledge base plus a list of axioms (rules). The knowledge base is just one term connected to another term through a third term that names the relationship: a triple. Fundamentally, the benefit comes from the list of triples. The value is in having extracted and defined those triples.
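
To make the "knowledge base plus axioms" description concrete, here is a minimal Python sketch, assuming triples are stored as plain `(subject, predicate, object)` tuples. The entities, the `is_a` and `prone_to` predicates, and the transitivity rule are hypothetical examples; production systems may instead express axioms in an ontology or rule language such as OWL.

```python
# A knowledge base as (subject, predicate, object) triples. All names are hypothetical.
Triple = tuple[str, str, str]

kb: set[Triple] = {
    ("bulldog", "is_a", "dog_breed"),
    ("dog_breed", "is_a", "animal"),
    ("bulldog", "prone_to", "condition_x"),
}

def apply_transitive_is_a(triples: set[Triple]) -> set[Triple]:
    """Axiom (rule): if A is_a B and B is_a C, then A is_a C. Repeat until nothing new appears."""
    closed = set(triples)
    while True:
        derived = {
            (a, "is_a", c)
            for (a, p1, b) in closed if p1 == "is_a"
            for (b2, p2, c) in closed if p2 == "is_a" and b2 == b
        }
        if derived <= closed:  # fixed point: the rule adds no new triples
            return closed
        closed |= derived

closed = apply_transitive_is_a(kb)
print(("bulldog", "is_a", "animal") in closed)  # True
```

The stored triples are the knowledge base; the rule derives a fact (`bulldog is_a animal`) that nobody had to write down explicitly.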
4:01: Knowledge graphs are cool again. What are your two favorite examples of GraphRAG in production?
4:57: My examples are people who are structuring their data so that it’s consistent. Then you can bring it into a context window and do something with it.
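
Structuring data "so that it’s consistent" often starts with making sure the same entity always gets the same identifier. A minimal sketch, assuming a hand-maintained alias table; the aliases, canonical IDs, and the `seen_in` predicate are hypothetical.

```python
# Hypothetical alias table mapping surface forms to one canonical ID.
ALIASES = {
    "labrador": "labrador_retriever",
    "lab": "labrador_retriever",
    "labrador retriever": "labrador_retriever",
    "english bulldog": "bulldog",
    "bulldog": "bulldog",
}

def canonical(name: str) -> str:
    """Lowercase and collapse whitespace, then map known aliases; unknown names pass through."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, key)

def normalize(triples: list[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Rewrite subjects and objects to canonical IDs so equal entities collapse together."""
    return {(canonical(s), p, canonical(o)) for (s, p, o) in triples}

raw = [
    ("Labrador Retriever", "seen_in", "report_1"),
    ("lab", "seen_in", "report_2"),
    ("English  Bulldog", "seen_in", "report_3"),
]
print(sorted(normalize(raw)))
```

Once three spellings of "Labrador" resolve to one node, everything known about that breed can be pulled into a context window together.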
5:18: LinkedIn and Pinterest are the best examples of existing graph structures that work.
5:35: A new application is a veterinary radiology example. Without GraphRAG, the LLM kept recommending conditions specific to Labradors rather than to bulldogs. GraphRAG brought the problem under control.
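
The podcast doesn’t describe how the veterinary system was built. One plausible mechanism, sketched here purely as an assumption, is that the graph links each breed to its conditions, so retrieval can only surface conditions connected to the breed being asked about. The breed IDs and `condition_a`/`condition_b`/`condition_c` are placeholders, not clinical data.

```python
from collections import defaultdict

# Hypothetical edges: breed -prone_to-> condition. Placeholder names, not clinical data.
edges = [
    ("labrador_retriever", "prone_to", "condition_a"),
    ("labrador_retriever", "prone_to", "condition_b"),
    ("bulldog", "prone_to", "condition_c"),
]

# Index conditions by breed so lookup follows graph edges, not text similarity.
index: dict[str, list[str]] = defaultdict(list)
for s, p, o in edges:
    if p == "prone_to":
        index[s].append(o)

def breed_context(breed: str) -> str:
    """Only conditions linked to this breed can enter the prompt."""
    conditions = index.get(breed, [])  # .get does not insert missing keys
    return "\n".join(f"{breed} prone_to {c}" for c in conditions)

print(breed_context("bulldog"))  # bulldog prone_to condition_c
```

Because Labrador conditions are never connected to the `bulldog` node, they cannot leak into a bulldog prompt, whereas a similarity search over text might return them.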
6:37: The underlying data was almost exclusively text. It’s difficult to build up a consistent dataset for veterinary radiology because animals move.
7:12: My favorite examples: Google uses its Data Commons to build a Q&A application. Metaphor Data starts from structured data, then creates a second graph from the first graph that maps technical terms to business terms. Then it constructs a social graph based on who is using the data.
9:41: Structured data can be the basis for a graph.
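
The layered approach described for Metaphor Data (structured metadata first, then a graph mapping technical terms to business terms, then a usage-based social graph) can be sketched in a few lines of Python. This is not Metaphor’s actual data model; the tables, columns, glossary, and usage log are all hypothetical.

```python
from collections import Counter

# Hypothetical catalog rows, glossary, and query log.
columns = [("orders", "cust_id"), ("orders", "amt_usd"), ("customers", "cust_id")]
glossary = {"cust_id": "Customer", "amt_usd": "Order revenue"}
usage_log = [("alice", "orders"), ("bob", "orders"), ("alice", "customers"), ("alice", "orders")]

# Graph 1: structural triples derived directly from the structured catalog rows.
structural = {(table, "has_column", col) for table, col in columns}

# Graph 2: technical column names mapped to business terms.
business = {(col, "means", glossary[col]) for _, col in columns if col in glossary}

# Graph 3: a social graph weighted by how often each user queried each table.
social = Counter(usage_log)

def tables_for_term(term: str) -> set[str]:
    """Walk business term -> column (graph 2) -> table (graph 1)."""
    cols = {c for (c, _, t) in business if t == term}
    return {tbl for (tbl, _, c) in structural if c in cols}

print(sorted(tables_for_term("Customer")))  # ['customers', 'orders']
print(social.most_common(1))  # [(('alice', 'orders'), 2)]
```

A business user can ask about "Customer" without knowing the column is called `cust_id`, and the usage graph suggests who to ask about a table.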
10:06: Unstructured data is valuable, but you need a way to navigate and categorize it.
11:04: Where are we on GraphRAG? Do you still have to explain what GraphRAG is?
11:28: More people know about it, but I have to explain it more than I did previously. Exactly what are we referring to? Most people want accuracy in the beginning; the value is often that it is more explainable. People may have seen a fantastic example, but what they haven’t seen is the iterative process in schema design. The upfront cost of these systems is nontrivial.
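
Part of why schema design is iterative is that each round of extraction surfaces triples the current schema doesn’t allow. A minimal sketch, assuming a schema that lists, for each predicate, the allowed subject and object types; every type, entity, and predicate name here is hypothetical.

```python
# Hypothetical schema: predicate -> (allowed subject type, allowed object type).
SCHEMA = {
    "prone_to": ("Breed", "Condition"),
    "shows": ("Image", "Finding"),
}
# Hypothetical entity typing, normally produced during extraction.
ENTITY_TYPES = {
    "bulldog": "Breed",
    "condition_c": "Condition",
    "image_17": "Image",
    "finding_9": "Finding",
}

def violations(triples: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Return triples whose predicate is unknown or whose endpoint types don't match the schema."""
    bad = []
    for s, p, o in triples:
        expected = SCHEMA.get(p)
        actual = (ENTITY_TYPES.get(s), ENTITY_TYPES.get(o))
        if expected is None or actual != expected:
            bad.append((s, p, o))
    return bad

extracted = [
    ("bulldog", "prone_to", "condition_c"),
    ("image_17", "shows", "finding_9"),
    ("image_17", "suggests", "condition_c"),  # predicate not in the schema yet
    ("condition_c", "prone_to", "bulldog"),   # direction reversed
]
print(violations(extracted))
```

Each flagged triple forces a decision: extend the schema (add `suggests`) or fix the extraction (the reversed edge). Repeating that loop is the nontrivial upfront cost.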
13:13: What are the key bottlenecks? How do I get a knowledge graph?
13:23: The biggest question is: Do you need a graph in the first place? There’s a whole spectrum. It’s in most people's interest to stop before they get to the end.
14:01: For people who come to us brand-new, we say, “You should try vector RAG first. If that doesn’t work, there’s a lot of good that structuring data can provide.”
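
For comparison, the core of the "try vector RAG first" baseline is ranking chunks by embedding similarity to the query. A toy sketch, assuming hand-written 3-dimensional vectors and hypothetical chunk IDs; a real system would get embeddings from an embedding model and usually use a vector index.

```python
import math

# Toy "embeddings" written by hand; a real system would compute these with a model.
chunks = {
    "chunk_1": [0.9, 0.1, 0.0],
    "chunk_2": [0.1, 0.9, 0.1],
    "chunk_3": [0.7, 0.3, 0.1],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: list[float], k: int = 2) -> list[str]:
    """Rank chunk IDs by similarity to the query vector and keep the best k."""
    return sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)[:k]

print(top_k([1.0, 0.0, 0.0]))  # ['chunk_1', 'chunk_3']
```

Nothing here knows what a chunk is about beyond its vector, which is why this is the cheap first step, and why adding structure helps when it falls short.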
15:01: If the chunks are structured, and a lot of the work is done up front, then it’s possible to navigate through structured information. At that point, you get value out of vector RAG. Academic papers, for example, have to follow a certain structure. If you spend time making sure you know what the chunks are, where they’re split and why, and that they’re labeled, you can get a lot of value.
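
Knowing "where they’re split and why" can be as simple as splitting at section headings and recording the section as a label. A minimal sketch, assuming headings sit alone on their own line and match a fixed list; the paper text and section names are hypothetical.

```python
# Hypothetical paper text; assumes each heading sits alone on its own line.
paper = """Abstract
We study retrieval.
Methods
We split documents into labeled chunks.
Results
Labeled chunks were easier to filter.
"""

SECTIONS = ("Abstract", "Introduction", "Methods", "Results", "Conclusion")

def labeled_chunks(text: str) -> list[dict]:
    """Split at known headings; each chunk records which section it came from.

    Text appearing before the first heading is discarded in this sketch.
    """
    chunks, label, buf = [], None, []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in SECTIONS:
            if label is not None:
                chunks.append({"section": label, "text": " ".join(buf)})
            label, buf = stripped, []
        elif stripped:
            buf.append(stripped)
    if label is not None:
        chunks.append({"section": label, "text": " ".join(buf)})
    return chunks

methods = [c["text"] for c in labeled_chunks(paper) if c["section"] == "Methods"]
print(methods)  # ['We split documents into labeled chunks.']
```

With a `section` label on every chunk, retrieval can filter to, say, Methods before running any similarity search.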
16:43: What are some of your pointers about how to get started?
16:47: The knowledge base is often a compressed representation. That means fewer tokens, which means less pressure on rate limits and lower cost. So some people want a graph to help scale. That’s one starting point. Another is the desire for a system to be explainable. Getting that information into a structured representation and tracing back through that structured representation can be very useful.
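
Both motivations (compression and explainability) can be sketched together by keeping each extracted triple alongside the ID of the passage it came from. A minimal sketch under stated assumptions: the passage text, the `doc1#p3` ID, and the triple are hypothetical placeholders, and character counts stand in as a rough proxy for tokens.

```python
# Hypothetical source passage and extracted triple; "doc1#p3" is an invented ID.
sources = {
    "doc1#p3": (
        "Placeholder passage: a full paragraph of the original document in which "
        "the relationship between bulldog and condition_c is described in prose, "
        "along with surrounding detail that the triple leaves out."
    ),
}
facts = [
    {"triple": ("bulldog", "prone_to", "condition_c"), "source": "doc1#p3"},
]

def compact_context(items: list[dict]) -> str:
    """Render only the triples, one per line, dropping the surrounding prose."""
    return "\n".join(" ".join(item["triple"]) for item in items)

def trace(triple: tuple[str, str, str]) -> list[str]:
    """Return every source passage the given triple was extracted from."""
    return [sources[item["source"]] for item in facts if item["triple"] == triple]

ctx = compact_context(facts)
# Character counts are only a rough proxy for tokens.
print(len(ctx), "chars of triples vs", sum(len(p) for p in sources.values()), "chars of source")
print(trace(("bulldog", "prone_to", "condition_c")) == [sources["doc1#p3"]])
```

The prompt carries the short triple, while the stored `source` link lets any answer built on it be traced back to the original passage.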
