
# Extract: Turning Unstructured Data into Structured Entities

Raw emails and documents are unstructured: free-form text with inconsistent formats and implicit relationships. The extraction layer transforms this content into typed, structured entities that the graph can reason over.

<figure><img src="/files/II5LuYZ0AxF5yrC93mIP" alt=""><figcaption></figcaption></figure>

Our proprietary AI extraction engine classifies content for relevance, parses text from attachments, and extracts structured data into typed entity schemas. Every extracted entity is validated against its schema before it enters the graph, so no untyped or malformed data gets through.
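
As a rough picture of that gate, the sketch below turns raw extractor output into a typed record and sets aside anything that does not fit. It is a minimal sketch under assumed names: the field names, the `ALLOWED_TYPES` whitelist, and the `validate`/`admit` helpers are hypothetical, since the actual entity schemas are not published on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedEntity:
    """A typed entity that has passed validation (hypothetical layout)."""
    entity_type: str
    value: str
    source: str  # where it was found, e.g. "email-body" or an attachment name


# Hypothetical type whitelist, abbreviating the identifier types in the table below.
ALLOWED_TYPES = {"hbl", "booking", "mbl", "po", "invoice", "container"}
REQUIRED_FIELDS = {"entity_type", "value", "source"}


class SchemaError(ValueError):
    """Raised when a candidate does not match the entity schema."""


def validate(raw: dict) -> ExtractedEntity:
    """Convert one raw extractor output into a typed entity, or raise SchemaError."""
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise SchemaError(f"missing fields: {sorted(missing)}")
    if raw["entity_type"] not in ALLOWED_TYPES:
        raise SchemaError(f"unknown entity type: {raw['entity_type']!r}")
    value = raw["value"]
    if not isinstance(value, str) or not value.strip():
        raise SchemaError("value must be a non-empty string")
    return ExtractedEntity(raw["entity_type"], value.strip(), str(raw["source"]))


def admit(candidates: list) -> tuple:
    """Split candidates into validated entities (graph-bound) and rejection reasons."""
    accepted, rejected = [], []
    for raw in candidates:
        try:
            accepted.append(validate(raw))
        except SchemaError as err:
            rejected.append(str(err))
    return accepted, rejected
```

Only the `accepted` list would be written to the graph; keeping the rejection reasons makes it possible to see why a candidate was dropped.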

#### What Gets Extracted

Extraction is source-agnostic. Whether a House Bill of Lading (HBL) number appears in a subject line, an email body, or a scanned PDF, it enters the graph as the same entity type. When the same identifier appears in multiple sources, it resolves to a single entity rather than creating duplicates, so each additional source reinforces that entity instead of fragmenting it.
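
One way to picture this merge is to key every identifier on its type plus a normalized value, so a repeated sighting adds a source to an existing node instead of creating a new one. This is a minimal sketch; the normalization rule (uppercase, whitespace removed) and the `resolve` helper are assumptions, not the engine's documented resolution logic.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Canonical merge key. Assumed rule: drop all whitespace and uppercase."""
    return "".join(value.split()).upper()


def resolve(mentions: list) -> dict:
    """Merge (entity_type, raw_value, source) mentions into one node per identifier.

    Returns a mapping from (entity_type, normalized_value) to the set of sources
    that mentioned it, so repeated sightings add provenance instead of new nodes.
    """
    nodes = defaultdict(set)
    for entity_type, raw_value, source in mentions:
        nodes[(entity_type, normalize(raw_value))].add(source)
    return dict(nodes)
```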

A single piece of content can yield many kinds of logistics entities:

**Identifiers:** The reference numbers that tie logistics operations together. These are the most critical extractions, because they are how shipments are tracked across systems. Below are some of the identifiers we extract:

| Type                  | Example       | Purpose                       |
| --------------------- | ------------- | ----------------------------- |
| House Bill of Lading  | TPEB1234567   | Primary shipment reference    |
| Booking Number        | BKG-2024-0892 | Carrier booking reference     |
| Master Bill of Lading | MAEU123456789 | Consolidation-level reference |
| Purchase Order        | PO-44210      | Commercial reference          |
| Invoice Number        | INV-2024-3391 | Financial reference           |
| Container Number      | MRKU4512870   | Physical unit tracking        |
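
Some identifiers share a surface shape: `TPEB1234567` and `MRKU4512870` are both four letters followed by seven digits, so format alone rarely settles the type. Container numbers are an exception, because ISO 6346 fixes their layout. The hypothetical `plausible` check below applies that standard's shape rule to containers and only a loose non-empty rule to the other types; it does not verify the ISO 6346 check digit, and whether the engine runs any such check is an assumption.

```python
import re

# ISO 6346 container number shape: 3-letter owner code, a category letter
# (U = freight container, J = detachable equipment, Z = trailer/chassis),
# a 6-digit serial number and 1 check digit. The check digit value is NOT verified here.
CONTAINER_RE = re.compile(r"[A-Z]{3}[UJZ]\d{7}")


def plausible(identifier_type: str, value: str) -> bool:
    """Hypothetical per-type format check applied after extraction."""
    value = value.strip().upper()
    if identifier_type == "container":
        return CONTAINER_RE.fullmatch(value) is not None
    # No documented format for the other types: require a non-empty token
    # without internal whitespace.
    return bool(value) and not any(ch.isspace() for ch in value)
```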

Identifiers are also related to one another. For example, an invoice may reference a booking, and a master bill may contain multiple house bills:

<figure><img src="/files/oZXbMdrCvTjT46uzL01O" alt=""><figcaption></figcaption></figure>
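
The two relationships above can be modeled as typed edges whose allowed (source type, relation, target type) combinations are fixed up front, so a house bill can never be recorded as containing a master bill. The `IdentifierGraph` class, the relation names `CONTAINS` and `REFERENCES`, and the in-memory adjacency map are illustrative assumptions, not the product's graph API.

```python
from collections import defaultdict

# Hypothetical relation rules: (source type, relation, target type) triples the
# graph accepts. Only the two relationships described above are included.
ALLOWED_EDGES = {
    ("invoice", "REFERENCES", "booking"),
    ("mbl", "CONTAINS", "hbl"),
}


class IdentifierGraph:
    def __init__(self) -> None:
        # (source_type, source_value, relation) -> set of (target_type, target_value)
        self._out = defaultdict(set)

    def link(self, src: tuple, relation: str, dst: tuple) -> None:
        """Add a typed edge, rejecting combinations the schema does not allow."""
        if (src[0], relation, dst[0]) not in ALLOWED_EDGES:
            raise ValueError(f"{src[0]} -{relation}-> {dst[0]} is not allowed")
        self._out[(src[0], src[1], relation)].add(dst)

    def targets(self, src: tuple, relation: str) -> set:
        """Return every identifier one edge of this relation away from src."""
        return set(self._out.get((src[0], src[1], relation), set()))
```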

**Parties & Contacts:** Organizations (shipper, consignee, carrier, forwarder, customs broker) and the people within them, with their specific roles on each shipment.
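
Because roles belong to a shipment rather than to an organization, the same company can be the consignee on one shipment and the shipper on another. A minimal sketch of that layout, using a hypothetical `PartyRole` record and `party_for` lookup:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartyRole:
    """One organization's role on one shipment (hypothetical record layout)."""
    shipment_ref: str              # e.g. an HBL number
    organization: str
    role: str                      # shipper, consignee, carrier, forwarder, customs_broker
    contact: Optional[str] = None  # the person acting for the organization, if known


def party_for(roles: list, shipment_ref: str, role: str) -> Optional[PartyRole]:
    """Return the first party holding `role` on the given shipment, or None."""
    for r in roles:
        if r.shipment_ref == shipment_ref and r.role == role:
            return r
    return None
```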

**Route & Cargo:** Multi-modal route legs (ocean, truck, rail, air) with origin/destination locations, vessel/flight info, and ETD/ETA timestamps. Containers with type and seal info. Commodities with HS codes, weight, and dimensions.
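
Structured legs make simple consistency checks possible, for example that each leg starts where the previous one ended and that no leg departs before its predecessor arrives. The sketch below is hypothetical: the `RouteLeg` fields, the use of UN/LOCODE-style location codes, and the specific rules in `route_problems` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RouteLeg:
    mode: str         # "ocean", "truck", "rail" or "air"
    origin: str       # assumed location format: a UN/LOCODE such as "CNSHA"
    destination: str
    etd: datetime
    eta: datetime


def route_problems(legs: list) -> list:
    """Return consistency issues in an ordered route (an empty list means consistent)."""
    problems = []
    for i, leg in enumerate(legs):
        if leg.eta < leg.etd:
            problems.append(f"leg {i}: ETA before ETD")
        if i > 0:
            prev = legs[i - 1]
            if prev.destination != leg.origin:
                problems.append(f"leg {i}: starts at {leg.origin}, previous leg ends at {prev.destination}")
            if leg.etd < prev.eta:
                problems.append(f"leg {i}: departs before previous leg arrives")
    return problems
```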

**Financial:** Charges with amount, currency, and billed-by/billed-to parties. Rate cards and quotations linking pricing to service providers.
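
With amount, currency, and billed-to captured on every charge, totals can be computed per party without mixing currencies. A minimal sketch, assuming a hypothetical `Charge` record with `Decimal` amounts and ISO 4217 currency codes; no exchange-rate conversion is attempted.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Charge:
    description: str
    amount: Decimal   # Decimal rather than float, to avoid binary rounding on money
    currency: str     # ISO 4217 code, e.g. "USD"
    billed_by: str
    billed_to: str


def totals_owed(charges: list, party: str) -> dict:
    """Sum what `party` is billed, kept separate per currency (no FX conversion)."""
    totals = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for c in charges:
        if c.billed_to == party:
            totals[c.currency] += c.amount
    return dict(totals)
```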

**Events:** Milestone events (booking confirmed, cargo received, vessel departed, customs cleared, delivered) with timestamps and status tracking (estimated, planned, actual).
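
When several sources report the same milestone, the status field lets the graph decide which report to show. The sketch below assumes one possible rule (an actual report always wins; otherwise the most recently reported estimate or plan wins) and a hypothetical `reported` field recording when the source stated the time; neither is documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

STATUSES = {"estimated", "planned", "actual"}


@dataclass(frozen=True)
class MilestoneEvent:
    milestone: str     # e.g. "vessel_departed"
    status: str        # "estimated", "planned" or "actual"
    at: datetime       # when the milestone happens or happened
    reported: datetime # hypothetical: when the source email/document stated it

    def __post_init__(self) -> None:
        # Reject statuses outside the three tracked values.
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def current_view(events: list, milestone: str) -> Optional[MilestoneEvent]:
    """Return the report to show: an actual if one exists, else the latest-reported one."""
    candidates = [e for e in events if e.milestone == milestone]
    if not candidates:
        return None
    # True sorts above False, so any "actual" wins; ties go to the most recent report.
    return max(candidates, key=lambda e: (e.status == "actual", e.reported))
```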



