Contents

How OpenSearch Parses Documents

Mapping, Parsing, and copy_to

I wanted to understand how OpenSearch’s document parsing and copy_to actually work under the hood. This post is my notes from reading the code.

1. What Happens When OpenSearch Indexes a Document

When OpenSearch receives this request:

POST /my-index/_doc/1
{"name": "hello", "source_vector": [1.0, 2.0, 3.0, 4.0]}

The request travels through several layers:

  1. HTTP layer receives the request and creates an IndexRequest
  2. Transport layer routes it to the correct shard
  3. DocumentMapper.parse() is called — this is the entry point into parsing
  4. DocumentParser.parseDocument() does the actual work

The DocumentMapper is created from the index mapping definition (the PUT /my-index with mappings). It holds the tree of field mappers, one per mapped field. The DocumentParser is its engine for turning raw JSON into Lucene fields.

Here’s the entry point:

// DocumentParser.java, line 81
ParsedDocument parseDocument(SourceToParse source, MetadataFieldMapper[] metadataFieldsMappers) {
    // Step 1: Create a streaming JSON parser from the raw bytes
    XContentParser parser = XContentHelper.createParser(
        docMapperParser.getXContentRegistry(),
        LoggingDeprecationHandler.INSTANCE,
        source.source(),      // the raw JSON bytes
        mediaType             // JSON, CBOR, SMILE, or YAML
    );

    // Step 2: Create the parse context — the shared state for the entire parse
    context = new ParseContext.InternalParseContext(
        indexSettings, docMapperParser, docMapper, source, parser
    );

    // Step 3: Parse the document
    validateStart(parser);                                        // expect START_OBJECT
    internalParseDocument(mapping, metadataFieldsMappers, context, parser);
    validateEnd(parser);                                          // expect END_OBJECT

    return parsedDocument(source, context, ...);
}

The two things to pay attention to here are the parser and the context.


2. The Streaming Parser

The XContentParser is a streaming JSON parser. Unlike a DOM parser that loads the entire document into memory as a tree, a streaming parser reads one token at a time, left to right, with no way to go backwards.

The token types are defined in an enum inside the XContentParser interface (line 67). Each constant overrides an abstract isValue() method to classify itself:

enum Token {
    // Structural tokens — isValue() returns false
    START_OBJECT    { @Override public boolean isValue() { return false; } },  // {
    END_OBJECT      { @Override public boolean isValue() { return false; } },  // }
    START_ARRAY     { @Override public boolean isValue() { return false; } },  // [
    END_ARRAY       { @Override public boolean isValue() { return false; } },  // ]
    FIELD_NAME      { @Override public boolean isValue() { return false; } },  // "fieldName"

    // Data tokens — isValue() returns true
    VALUE_STRING    { @Override public boolean isValue() { return true; } },   // "hello"
    VALUE_NUMBER    { @Override public boolean isValue() { return true; } },   // 42 or 3.14
    VALUE_BOOLEAN   { @Override public boolean isValue() { return true; } },   // true or false
    VALUE_EMBEDDED_OBJECT { @Override public boolean isValue() { return true; } },  // binary

    // Null — treated as absence, NOT a value
    VALUE_NULL      { @Override public boolean isValue() { return false; } };  // null

    public abstract boolean isValue();
}

isValue() separates structural tokens from data tokens. Note that VALUE_NULL returns false — null is treated as absence of a value, not a value itself. This distinction matters in innerParseObject’s main loop, where the default branch checks token.isValue() to decide whether to call parseValue().
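The null-is-absence rule is easy to see in isolation. Here is a toy mirror of the enum (simplified to a constructor flag rather than per-constant overrides; not the real class) and a stream of tokens for {"name": "hello", "note": null}:

```java
import java.util.List;

// Minimal mirror of the Token enum's isValue() split (a sketch, not the real class).
enum ToyToken {
    START_OBJECT(false), END_OBJECT(false), START_ARRAY(false), END_ARRAY(false),
    FIELD_NAME(false),
    VALUE_STRING(true), VALUE_NUMBER(true), VALUE_BOOLEAN(true),
    VALUE_NULL(false);   // null is "absence", not a value

    private final boolean isValue;
    ToyToken(boolean isValue) { this.isValue = isValue; }
    public boolean isValue() { return isValue; }
}

public class IsValueDemo {
    public static void main(String[] args) {
        // {"name": "hello", "note": null} as tokens -- only VALUE_STRING counts as data
        List<ToyToken> doc = List.of(
            ToyToken.START_OBJECT,
            ToyToken.FIELD_NAME, ToyToken.VALUE_STRING,
            ToyToken.FIELD_NAME, ToyToken.VALUE_NULL,
            ToyToken.END_OBJECT);
        long dataTokens = doc.stream().filter(ToyToken::isValue).count();
        System.out.println(dataTokens);   // 1 -- VALUE_NULL did not count
    }
}
```

The default branch in innerParseObject behaves the same way: a VALUE_NULL token never reaches parseValue(), because isValue() filters it out before the dispatch.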

XContentParser is an interface. The actual implementation used during document parsing is AbstractXContentParser, with format-specific subclasses like JsonXContentParser, CborXContentParser, and SmileXContentParser. The DocumentParser and all field mappers only interact through the XContentParser interface — they never see the concrete class. This matters because OpenSearch supports multiple serialization formats (JSON, CBOR, SMILE, YAML), and the media type determines which concrete parser implementation is created.

The interface defines three key methods:

Method           What it does
nextToken()      Advances the cursor to the next token and returns it
currentToken()   Returns the token the cursor is currently pointing at (does NOT advance)
currentName()    Returns the field name when the cursor is at FIELD_NAME

Here’s a document as a token stream:

Document: {"name": "hello"}

Position:  0              1              2                    3
Token:     START_OBJECT   FIELD_NAME     VALUE_STRING         END_OBJECT
Value:     {              "name"         "hello"              }

The parser starts at position 0. Each call to nextToken() moves one position right. Once the parser moves past a token, it’s gone. It cannot be read again. This forward-only constraint is key to understanding why copy_to behaves the way it does.
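The forward-only cursor can be reproduced in a few lines. This is a toy stand-in for the three methods above (TokenCursor and its String tokens are made up for illustration; the real XContentParser works on bytes and typed tokens):

```java
import java.util.List;

// Toy forward-only token cursor mirroring nextToken()/currentToken() above.
class TokenCursor {
    private final List<String> tokens;  // e.g. ["START_OBJECT", "FIELD_NAME:name", ...]
    private int pos = -1;               // before the first token

    TokenCursor(List<String> tokens) { this.tokens = tokens; }

    // Advances the cursor and returns the new token, or null at end of input.
    String nextToken() {
        pos++;
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Returns the current token WITHOUT advancing -- re-readable any number of times.
    String currentToken() {
        return pos >= 0 && pos < tokens.size() ? tokens.get(pos) : null;
    }
}

public class StreamingDemo {
    public static void main(String[] args) {
        // {"name": "hello"} as a token stream
        TokenCursor p = new TokenCursor(List.of(
            "START_OBJECT", "FIELD_NAME:name", "VALUE_STRING:hello", "END_OBJECT"));

        p.nextToken();                         // -> START_OBJECT
        p.nextToken();                         // -> FIELD_NAME:name
        p.nextToken();                         // -> VALUE_STRING:hello
        // currentToken() can be called repeatedly -- the cursor does not move:
        System.out.println(p.currentToken());  // VALUE_STRING:hello
        System.out.println(p.currentToken());  // VALUE_STRING:hello
        // But once nextToken() moves past it, the value is gone for good:
        System.out.println(p.nextToken());     // END_OBJECT
    }
}
```

Re-reading via currentToken() versus irreversibly advancing via nextToken() is exactly the distinction that section 11 turns on.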


3. DocumentParser

DocumentParser reads tokens from the parser and dispatches them to the right handler based on what kind of token it sees and what mapper exists for the current field.

The call hierarchy looks like this:

parseDocument()                          ← entry point (line 81)
  └─ internalParseDocument()             ← sets up metadata (line 125)
       └─ parseObjectOrNested()          ← enters the root object (line 484)
            └─ innerParseObject()        ← THE MAIN LOOP (line 529)
                 ├─ parseObject()        ← for { ... }
                 ├─ parseArray()         ← for [ ... ] (line 986)
                 ├─ parseValue()         ← for strings, numbers, etc. (line 1124)
                 └─ parseNullValue()     ← for null

internalParseDocument is a thin wrapper:

private static void internalParseDocument(Mapping mapping, ..., ParseContext.InternalParseContext context, XContentParser parser) {
    final boolean emptyDoc = isEmptyDoc(mapping, parser);

    if (mapping.root.isEnabled() == false) {
        parser.skipChildren();    // mapping disabled — skip everything
    } else if (emptyDoc == false) {
        parseObjectOrNested(context, mapping.root);   // parse the root object
    }
}

parseObjectOrNested handles the outer { } of the document:

static void parseObjectOrNested(ParseContext context, ObjectMapper mapper) {
    if (mapper.isEnabled() == false) {
        context.parser().skipChildren();
        return;
    }

    XContentParser parser = context.parser();
    XContentParser.Token token = parser.currentToken();
    String currentFieldName = parser.currentName();

    // Handle nested document creation
    if (mapper.isNested()) {
        context = nestedContext(context, mapper);
    }

    // Advance past START_OBJECT to the first FIELD_NAME
    if (token == XContentParser.Token.START_OBJECT) {
        token = parser.nextToken();    // → FIELD_NAME (or END_OBJECT if empty)
    }

    // Enter the main parsing loop
    innerParseObject(context, mapper, parser, currentFieldName, token);
}

4. ParseContext

While the parser reads tokens, ParseContext carries all the mutable state of the ongoing parse operation. Every component reads from and writes to it.

The real implementation is ParseContext.InternalParseContext:

public static class InternalParseContext extends ParseContext {
    private final ContentPath path;            // tracks "parent.child.field" nesting
    private final XContentParser parser;       // THE streaming parser
    private Document document;                 // current Lucene document being built
    private final Document rootDoc;            // root Lucene document
    private final List<Document> documents;    // all documents (for nested)
    private final SourceToParse sourceToParse; // raw source bytes
    private final long maxAllowedFieldDepth;   // depth limit
    // ... more state
}

Key methods:

Method             Returns          Purpose
parser()           XContentParser   The streaming parser — ONE instance shared by everything
doc()              Document         The Lucene document currently being built
path()             ContentPath      The current path like “properties.source_vector”
isWithinCopyTo()   boolean          Are we currently processing a copy_to target?

The fact that parser() returns the same parser instance to everyone is an important design detail. When field mapper A reads tokens from the parser, field mapper B sees the parser in whatever state A left it.

The Decorator Pattern: FilterParseContext

ParseContext uses a decorator pattern to create specialized versions of itself without copying everything. FilterParseContext wraps another context and delegates all methods to it:

private static class FilterParseContext extends ParseContext {
    private final ParseContext in;

    protected FilterParseContext(ParseContext in) { this.in = in; }

    @Override public XContentParser parser()       { return in.parser(); }
    @Override public Document doc()                { return in.doc(); }
    @Override public ContentPath path()            { return in.path(); }
    @Override public boolean isWithinCopyTo()       { return in.isWithinCopyTo(); }
    // ... dozens more methods, all delegating to 'in'
}

By itself, a FilterParseContext is identical to the original. Every method passes through to in. It’s a transparent wrapper that changes nothing.

Factory methods on ParseContext then create anonymous subclasses that override exactly ONE method:

// "Same context, but isWithinCopyTo() returns true"
public final ParseContext createCopyToContext() {       // line 669
    return new FilterParseContext(this) {
        @Override public boolean isWithinCopyTo() { return true; }
    };
}

// "Same context, but doc() returns a different document"
public final ParseContext switchDoc(final Document document) {   // line 706
    return new FilterParseContext(this) {
        @Override public Document doc() { return document; }
    };
}

// "Same context, but path() returns a different path"
public final ParseContext overridePath(final ContentPath path) { // line 718
    return new FilterParseContext(this) {
        @Override public ContentPath path() { return path; }
    };
}

This pattern appears 6 times in the file. It’s basically a transparent overlay: one value gets changed on the overlay, and the original shows through for everything else. The original context is never modified, which matters because parsing is recursive and mutating shared state in a recursive call chain is a recipe for bugs.
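The overlay behavior is easy to reproduce outside OpenSearch. A minimal sketch with made-up Ctx/FilterCtx classes standing in for ParseContext/FilterParseContext (only two methods instead of dozens):

```java
// Minimal stand-ins for ParseContext / FilterParseContext (names are hypothetical).
abstract class Ctx {
    abstract boolean isWithinCopyTo();
    abstract String path();

    // Factory: same context, but isWithinCopyTo() returns true -- one-method overlay.
    final Ctx createCopyToContext() {
        return new FilterCtx(this) {
            @Override boolean isWithinCopyTo() { return true; }
        };
    }
}

// Transparent wrapper: every method delegates to the wrapped context.
class FilterCtx extends Ctx {
    private final Ctx in;
    FilterCtx(Ctx in) { this.in = in; }
    @Override boolean isWithinCopyTo() { return in.isWithinCopyTo(); }
    @Override String path() { return in.path(); }
}

public class OverlayDemo {
    public static void main(String[] args) {
        Ctx root = new Ctx() {
            @Override boolean isWithinCopyTo() { return false; }
            @Override String path() { return "name"; }
        };
        Ctx copy = root.createCopyToContext();
        System.out.println(copy.isWithinCopyTo()); // true  -- the one overridden method
        System.out.println(copy.path());           // name  -- everything else shows through
        System.out.println(root.isWithinCopyTo()); // false -- the original is untouched
    }
}
```

Note that createCopyToContext() allocates a tiny wrapper object instead of copying any state, so it is cheap to call in the middle of a recursive parse.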


5. The Main Parsing Loop: innerParseObject

innerParseObject (line 529) is where the real work happens. It processes one JSON object, consuming tokens in a while loop until it hits END_OBJECT:

private static void innerParseObject(
    ParseContext context,
    ObjectMapper mapper,        // the object mapper for this level
    XContentParser parser,
    String currentFieldName,
    XContentParser.Token token
) throws IOException {
    try {
        context.incrementFieldCurrentDepth();
        context.checkFieldDepthLimit();

        String[] paths = null;

        while (token != XContentParser.Token.END_OBJECT) {

            if (token == XContentParser.Token.FIELD_NAME) {
                // We hit a field name like "name"
                currentFieldName = parser.currentName();
                paths = splitAndValidatePath(currentFieldName);

            } else {
                // We hit the VALUE of that field — dispatch based on token type
                switch (token) {
                    case START_OBJECT:
                        parseObject(context, mapper, currentFieldName, paths);
                        break;
                    case START_ARRAY:
                        parseArray(context, mapper, currentFieldName, paths);
                        break;
                    case VALUE_NULL:
                        parseNullValue(context, mapper, currentFieldName, paths);
                        break;
                    default:
                        if (token.isValue()) {
                            parseValue(context, mapper, currentFieldName, token, paths);
                        }
                }
            }

            token = parser.nextToken();   // ← CRITICAL: advance to next token
        }
    } finally {
        context.decrementFieldCurrentDepth();
    }
}

Tracing through {"name": "hello"}:

Entering innerParseObject. parser.currentToken() = FIELD_NAME("name")

═══ Iteration 1 ═══
  token = FIELD_NAME
  → currentFieldName = "name"
  → token = parser.nextToken() = VALUE_STRING        // line 579

═══ Iteration 2 ═══
  token = VALUE_STRING
  → default branch: token.isValue() is true
  → calls parseValue(context, mapper, "name", ...)

  ... parseValue returns ...

  → token = parser.nextToken() = END_OBJECT          // line 579

═══ Iteration 3 ═══
  token = END_OBJECT
  → while condition fails → loop exits

The contract: after parseValue (or parseArray, or parseObject) returns, the parser must be positioned at the last token of the field’s value. Then line 579 (token = parser.nextToken()) advances past it to the next field or END_OBJECT.

So when parseArray returns, the parser must be at END_ARRAY. For objects: at END_OBJECT. For simple values: still at the value token (since it’s a single token).
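The contract can be sketched as a toy dispatch loop. Everything here (the String tokens, the countFields helper) is invented for illustration; what it shares with the real code is the invariant that each handler leaves the cursor on the LAST token of its value, and the outer loop does the single advance:

```java
import java.util.List;

// Toy version of the innerParseObject positioning contract (hypothetical names).
public class ContractDemo {
    static List<String> tokens;
    static int pos;

    static String next() { return tokens.get(++pos); }

    static void parseValue() { }   // single token: cursor stays put on the value
    static void parseArray() {     // consumes through END_ARRAY: cursor ends ON it
        while (!next().equals("END_ARRAY")) { }
    }

    // Mirrors the main loop: counts FIELD_NAMEs while honoring the contract.
    static int countFields(List<String> toks) {
        tokens = toks;
        pos = 0;                                  // cursor at START_OBJECT
        int fields = 0;
        String token = next();                    // enter the object
        while (!token.equals("END_OBJECT")) {
            if (token.equals("FIELD_NAME")) fields++;
            else if (token.equals("START_ARRAY")) parseArray();  // returns ON END_ARRAY
            else parseValue();                                   // returns unchanged
            token = next();                       // the critical advance (line 579)
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(countFields(List.of("START_OBJECT",
            "FIELD_NAME", "VALUE_STRING",
            "FIELD_NAME", "START_ARRAY", "VALUE_NUMBER", "END_ARRAY",
            "END_OBJECT")));                      // 2
    }
}
```

If parseArray() stopped one token early (or ran one token long), the outer loop's advance would land mid-array or skip a field name, which is exactly the class of bug the contract rules out.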


6. How Different Token Types Are Dispatched

Arrays: parseArray

When innerParseObject encounters START_ARRAY, it calls parseArray:

private static void parseArray(ParseContext context, ObjectMapper parentMapper,
    String lastFieldName, String[] paths) throws IOException {

    Mapper mapper = getMapper(context, parentMapper, lastFieldName, paths);

    if (mapper != null && parsesArrayValue(mapper)) {
        // This mapper handles the entire array natively
        parseObjectOrField(context, mapper);    // pass parser at START_ARRAY
    } else {
        // Normal array: iterate through items one by one
        final String arrayFieldName = paths[paths.length - 1];
        parseNonDynamicArray(context, parentMapper, lastFieldName, arrayFieldName);
    }
}

The parsesArrayValue(mapper) check asks:

private static boolean parsesArrayValue(Mapper mapper) {
    return mapper instanceof FieldMapper fieldMapper && fieldMapper.parsesArrayValue();
}

This calls a virtual method on FieldMapper:

public boolean parsesArrayValue() {
    return false;   // default: field does NOT handle arrays natively
}

Most field types return false — text, keyword, integer, long, etc. These fields expect individual values. When indexing "tags": ["a", "b", "c"], OpenSearch iterates through the array and calls the text mapper once for each element.

But some field types return true:

  • knn_vector (KNNVectorFieldMapper.java, line 909): A vector IS an array. [1.0, 2.0, 3.0, 4.0] is one vector, not four separate values.
  • geo_point: A geo_point can be [-74.0, 40.7] — the array IS the value.

This determines who iterates through the array:

  • parsesArrayValue() = false (integer, text): DocumentParser iterates via parseNonDynamicArray. The mapper is called once per element and sees a single token each time.
  • parsesArrayValue() = true (knn_vector, geo_point): The mapper is called ONCE and receives the entire array. It reads from START_ARRAY through END_ARRAY internally.
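The who-iterates split can be sketched with toy mappers sharing one cursor. Names (ToyMapper, ToyTextMapper, ToyVectorMapper) are made up; only the parsesArrayValue() dispatch mirrors the real code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the parsesArrayValue() dispatch (not the real mapper API).
interface ToyMapper {
    default boolean parsesArrayValue() { return false; }
    void parse(List<String> tokens, int[] pos, List<String> out);
}

class ToyTextMapper implements ToyMapper {
    public void parse(List<String> tokens, int[] pos, List<String> out) {
        out.add("text:" + tokens.get(pos[0]));   // reads ONE token, does not advance
    }
}

class ToyVectorMapper implements ToyMapper {
    public boolean parsesArrayValue() { return true; }
    public void parse(List<String> tokens, int[] pos, List<String> out) {
        // Reads from START_ARRAY through END_ARRAY, advancing the shared cursor.
        List<String> vec = new ArrayList<>();
        while (!tokens.get(++pos[0]).equals("END_ARRAY")) vec.add(tokens.get(pos[0]));
        out.add("vector:" + vec);
    }
}

public class DispatchDemo {
    // Mirrors parseArray(): hand the whole array to the mapper, or iterate it here.
    static List<String> parseArray(ToyMapper mapper, List<String> tokens) {
        List<String> out = new ArrayList<>();
        int[] pos = {0};                              // cursor at START_ARRAY
        if (mapper.parsesArrayValue()) {
            mapper.parse(tokens, pos, out);           // mapper called ONCE, whole array
        } else {
            while (!tokens.get(++pos[0]).equals("END_ARRAY")) {
                mapper.parse(tokens, pos, out);       // mapper called once PER ELEMENT
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("START_ARRAY", "1.0", "2.0", "3.0", "END_ARRAY");
        System.out.println(parseArray(new ToyTextMapper(), tokens));
        // [text:1.0, text:2.0, text:3.0]  -- three single-token calls
        System.out.println(parseArray(new ToyVectorMapper(), tokens));
        // [vector:[1.0, 2.0, 3.0]]        -- one multi-token call
    }
}
```

Either way the cursor finishes on END_ARRAY, satisfying the positioning contract from section 5; the difference is purely who does the iterating.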

When parsesArrayValue() returns false, parseNonDynamicArray handles the element-by-element iteration:

private static void parseNonDynamicArray(ParseContext context, ObjectMapper mapper,
    final String lastFieldName, String arrayFieldName) throws IOException {

    XContentParser parser = context.parser();
    final String[] paths = splitAndValidatePath(lastFieldName);

    XContentParser.Token token;
    while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
        if (token == XContentParser.Token.START_OBJECT) {
            parseObject(context, mapper, lastFieldName, paths);
        } else if (token.isValue()) {
            parseValue(context, mapper, lastFieldName, token, paths);
        }
        // ... etc
    }
    // When this returns, parser is at END_ARRAY ✓
}

Simple Values: parseValue

For simple scalar tokens (VALUE_STRING, VALUE_NUMBER, etc.), parseValue is called:

private static void parseValue(ParseContext context, ObjectMapper parentMapper,
    String currentFieldName, XContentParser.Token token, String[] paths) {

    Mapper mapper = getMapper(context, parentMapper, currentFieldName, paths);
    if (mapper != null) {
        parseObjectOrField(context, mapper);     // dispatch to field mapper
    } else {
        parseDynamicValue(context, ...);         // create dynamic mapping
    }
}

Objects: parseObject

For START_OBJECT, parseObject navigates into the nested object:

private static void parseObject(ParseContext context, ObjectMapper mapper,
    String currentFieldName, String[] paths) {

    Mapper objectMapper = getMapper(context, mapper, currentFieldName, paths);
    if (objectMapper != null) {
        context.path().add(currentFieldName);
        parseObjectOrField(context, objectMapper);   // recurse
        context.path().remove();
    }
}

7. parseObjectOrField: Where Field Mappers Meet the Parser

parseObjectOrField is the dispatcher that connects the parser to the appropriate mapper:

private static void parseObjectOrField(ParseContext context, Mapper mapper) throws IOException {
    if (mapper instanceof ObjectMapper objectMapper) {
        parseObjectMapper(context, objectMapper);

    } else if (mapper instanceof FieldMapper fieldMapper) {
        fieldMapper.parse(context);                              // delegate to the field mapper
        parseCopyFields(context, fieldMapper.copyTo().copyToFields());  // then handle copy_to

    } else if (mapper instanceof FieldAliasMapper) {
        throw new IllegalArgumentException("Cannot write to a field alias");
    }
}

Two things happen for every field:

  1. fieldMapper.parse(context): the field-type-specific logic reads tokens and creates Lucene fields
  2. parseCopyFields(context, ...): if the field has copy_to targets, copy the value to them

8. FieldMapper

FieldMapper is the base class for all field-type-specific mappers. Its parse method at line 284:

public void parse(ParseContext context) throws IOException {
    try {
        parseCreateField(context);   // ABSTRACT — implemented by each field type
    } catch (Exception e) {
        if (!shouldIgnoreMalformed(context.indexSettings())) {
            throw new MapperParsingException("failed to parse field [" + name + "]", e);
        }
    }
    multiFields.parse(this, context);  // parse sub-fields (e.g., text.keyword)
}

Every field type implements parseCreateField() differently. Two examples worth looking at:

TextFieldMapper: Single-Token Consumer

TextFieldMapper.parseCreateField(), line 1037:

@Override
protected void parseCreateField(ParseContext context) throws IOException {
    final String value = getFieldValue(context);   // reads ONE token
    if (value == null) return;

    Field field = new Field(fieldType().name(), value, fieldType);
    context.doc().add(field);
}

// line 1059
protected String getFieldValue(ParseContext context) throws IOException {
    return context.parser().textOrNull();   // reads VALUE_STRING, does NOT advance
}

textOrNull() reads the current token’s text value. It does not call nextToken(). The parser stays exactly where it was.

Parser state: VALUE_STRING("hello") → after parse → VALUE_STRING("hello") (unchanged)

KNNVectorFieldMapper: Multi-Token Consumer

KNNVectorFieldMapper.getFloatsFromContext(), line 846:

Optional<float[]> getFloatsFromContext(ParseContext context, int dimension) throws IOException {
    context.path().add(simpleName());

    ArrayList<Float> vector = new ArrayList<>();
    XContentParser.Token token = context.parser().currentToken();

    if (token == XContentParser.Token.START_ARRAY) {
        token = context.parser().nextToken();                  // → VALUE_NUMBER(1.0)
        while (token != XContentParser.Token.END_ARRAY) {
            float value = context.parser().floatValue();       // read the number
            vector.add(value);
            token = context.parser().nextToken();              // → next VALUE_NUMBER or END_ARRAY
        }
        // Loop exits when token == END_ARRAY
    }

    validateVectorDimension(dimension, vector.size());
    float[] array = new float[vector.size()];
    for (int i = 0; i < array.length; i++) {
        array[i] = vector.get(i);          // unbox the list into a primitive array
    }
    return Optional.of(array);
}

This reads through the ENTIRE array: START_ARRAY, 1.0, 2.0, 3.0, 4.0, END_ARRAY. After the method returns, the parser is positioned at END_ARRAY.

Parser state: START_ARRAY → after parse → END_ARRAY (advanced five positions, consuming all six tokens!)

GeoPointFieldMapper: Format-Dependent

GeoPointFieldMapper extends AbstractPointGeometryFieldMapper, which parses geo_point values through GeoUtils.parseGeoPoint(). The geo_point type supports multiple JSON formats:

Object format {"lat": 40.7, "lon": -74.0}:

START_OBJECT → FIELD_NAME("lat") → VALUE_NUMBER(40.7) → FIELD_NAME("lon") → VALUE_NUMBER(-74.0) → END_OBJECT

Parser consumes START_OBJECT through END_OBJECT — multi-token.

Array format [-74.0, 40.7]:

START_ARRAY → VALUE_NUMBER(-74.0) → VALUE_NUMBER(40.7) → END_ARRAY

Parser consumes START_ARRAY through END_ARRAY — multi-token.

String format "40.7,-74.0":

VALUE_STRING("40.7,-74.0")

Single token. Parser does NOT advance — single-token.


9. How copy_to Works: parseCopyFields and parseCopy

copy_to copies one field’s indexed value into another field at index time. Given this mapping:

{
  "name": {"type": "text", "copy_to": "name_copy"},
  "name_copy": {"type": "text"}
}

Indexing {"name": "hello"} will result in both name and name_copy having the value “hello” in the Lucene index. Note: copy_to does not add the target to _source. _source always stores the original document exactly as it was sent.
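This is easy to check against a running cluster. A hedged sketch of the round trip (the index and field names are just the ones from this example):

```
PUT /my-index
{
  "mappings": {
    "properties": {
      "name":      {"type": "text", "copy_to": "name_copy"},
      "name_copy": {"type": "text"}
    }
  }
}

POST /my-index/_doc/1?refresh=true
{"name": "hello"}

GET /my-index/_search
{"query": {"match": {"name_copy": "hello"}}}

# The hit comes back with "_source": {"name": "hello"}; name_copy is
# indexed and searchable, but never stored in _source.
```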

After fieldMapper.parse(context) returns, parseCopyFields handles the copy_to targets:

private static void parseCopyFields(ParseContext context, List<String> copyToFields) throws IOException {
    if (!context.isWithinCopyTo() && copyToFields.isEmpty() == false) {
        context = context.createCopyToContext();    // flag: we're in a copy now

        for (String field : copyToFields) {
            // Find which document the target belongs to (handles nested docs)
            ParseContext.Document targetDoc = null;
            for (ParseContext.Document doc = context.doc(); doc != null; doc = doc.getParent()) {
                if (field.startsWith(doc.getPrefix())) {
                    targetDoc = doc;
                    break;
                }
            }

            final ParseContext copyToContext =
                (targetDoc == context.doc()) ? context : context.switchDoc(targetDoc);

            parseCopy(field, copyToContext);
        }
    }
}

And parseCopy simply calls parse on the target mapper:

private static void parseCopy(String field, ParseContext context) throws IOException {
    Mapper mapper = context.docMapper().mappers().getMapper(field);
    if (mapper != null) {
        if (mapper instanceof FieldMapper fieldMapper) {
            fieldMapper.parse(context);    // parse the target field
        }
    }
}

The important thing here: parseCopy calls fieldMapper.parse(context) using the SAME context, which has the SAME parser. Whatever state the parser was in after the source field’s parse(), that’s what the target mapper sees.

The !context.isWithinCopyTo() guard at the top of parseCopyFields prevents infinite recursion. If field A copies to field B, and field B also has copy_to pointing to field C, field B’s copy_to won’t trigger from within the copy context. createCopyToContext() uses the same FilterParseContext decorator pattern from section 4: it wraps the current context and overrides isWithinCopyTo() to return true.


10. Dry Run: Tracing a Text Field with copy_to

Putting it all together with a concrete example.

Mapping:

{
  "name": {"type": "text", "copy_to": "name_copy"},
  "name_copy": {"type": "text"}
}

Document: {"name": "hello"}

Token stream:

Position: 0              1              2                    3
Token:    START_OBJECT   FIELD_NAME     VALUE_STRING         END_OBJECT
Value:    {              "name"         "hello"              }

Execution trace:

innerParseObject loop:

Step   Line   Token                     Action
1      546    FIELD_NAME("name")        currentFieldName = "name"
2      579    parser.nextToken()        VALUE_STRING
3      574    VALUE_STRING is a value   parseValue(context, mapper, "name", ...)

Inside parseValue → parseObjectOrField:

mapper is TextFieldMapper (a FieldMapper)
copyToFields = ["name_copy"]

fieldMapper.parse(context)              // parse the source field
parseCopyFields(context, copyToFields)  // copy to targets

TextFieldMapper.parse(context):

context.parser().currentToken() → VALUE_STRING("hello")
context.parser().textOrNull()   → "hello"
// Creates Lucene field, adds to document
// Parser position: UNCHANGED — still at VALUE_STRING("hello")

parseCopyFields(context, ["name_copy"]):

context = context.createCopyToContext()     // flag isWithinCopyTo = true
parseCopy("name_copy", context)
  → mapper = TextFieldMapper for "name_copy"
  → fieldMapper.parse(context)
    → context.parser().currentToken() → VALUE_STRING("hello")  ← SAME TOKEN!
    → context.parser().textOrNull()   → "hello"
    → Creates Lucene field for name_copy
    → SUCCESS ✓

Back in innerParseObject, line 579:

token = parser.nextToken() → END_OBJECT
→ While loop exits

Why this works: textOrNull() reads the current token without advancing the parser. After the source field’s parse(), the parser is still at VALUE_STRING("hello"). The copy target reads the exact same token. copy_to works here because a single-token value can be read multiple times from the same parser position.


11. Single-Token vs Multi-Token Fields: A Critical Distinction

Everything above boils down to a distinction in how field mappers consume parser tokens:

Single-token fields (text, keyword, integer, boolean, date, geo_point as string):

  • Read ONE token from the parser (e.g., VALUE_STRING, VALUE_NUMBER)
  • The parser position does NOT advance
  • copy_to naturally works — the same token can be re-read

Multi-token fields (knn_vector, geo_point as object/array):

  • Read MULTIPLE tokens from the parser (e.g., START_ARRAY through END_ARRAY)
  • The parser position advances to the END token
  • copy_to receives the parser after it has been consumed

For integer arrays like [1, 2, 3], DocumentParser iterates the array and calls the mapper once per element — each call sees a single VALUE_NUMBER. But for knn_vector [1.0, 2.0, 3.0, 4.0], the mapper handles the entire array internally because parsesArrayValue() returns true.

This distinction doesn’t matter most of the time. But when copy_to enters the picture, the shared-parser design creates a tension: the source field consumes the tokens, and the copy target needs them too. For single-token fields this tension doesn’t exist since the token is simply re-read. For multi-token fields, it becomes a problem. This is actually what got me reading through all this code in the first place.
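The tension can be made concrete with two toy read styles against one shared cursor (everything here is invented for illustration; the token list stands in for [1.0, 2.0]):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of why copy_to breaks for multi-token fields: the source field and the
// copy target read from the SAME forward-only cursor. Names are made up.
public class CopyToDemo {
    static List<String> tokens = List.of("START_ARRAY", "1.0", "2.0", "END_ARRAY");

    // Single-token style: read the current token, do not advance.
    static String readScalar(int[] pos) { return tokens.get(pos[0]); }

    // Multi-token style: consume START_ARRAY through END_ARRAY, advancing the cursor.
    static List<String> readArray(int[] pos) {
        List<String> values = new ArrayList<>();
        while (!tokens.get(++pos[0]).equals("END_ARRAY")) values.add(tokens.get(pos[0]));
        return values;                         // cursor is now parked on END_ARRAY
    }

    public static void main(String[] args) {
        // Single-token: the copy target re-reads the same position -- copy_to works.
        int[] pos = {1};                       // cursor at "1.0"
        System.out.println(readScalar(pos));   // 1.0  (source field)
        System.out.println(readScalar(pos));   // 1.0  (copy target sees the same token)

        // Multi-token: the source consumes the array; the target finds nothing left.
        pos = new int[]{0};                    // cursor at START_ARRAY
        System.out.println(readArray(pos));    // [1.0, 2.0]  (source field)
        try {
            readArray(pos);                    // copy target: cursor already past the data
        } catch (IndexOutOfBoundsException e) {
            System.out.println("copy target found no array to read");
        }
    }
}
```

The real failure mode inside OpenSearch would surface differently (the target mapper would see whatever token the cursor happens to rest on), but the root cause is the same: the tokens were consumed once and cannot be replayed.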


Appendix: The Mapper Folder

The org.opensearch.index.mapper package has ~100 files. Most are field-type mappers (one per field type). These are the infrastructure files that orchestrate parsing regardless of field type:

  • DocumentParser.java: The central orchestrator. Reads tokens from the parser and dispatches them to the correct mapper. Contains the main parsing loop and copy_to logic.
  • ParseContext.java: The mutable state carried through the entire parse. Holds the parser, the current Lucene document being built, the field path, and depth counters. Uses a decorator pattern (FilterParseContext) for creating specialized contexts.
  • DocumentMapper.java: The entry point. Created from the index mapping definition. Owns the DocumentParser and the root ObjectMapper. Its parse() method is what the indexing engine calls.
  • FieldMapper.java: Base class for all concrete field types. Defines parse(context), which calls the abstract parseCreateField(context). Also defines parsesArrayValue() (default false) and copyTo().
  • ObjectMapper.java: Represents a JSON object in the mapping (a container of fields). Holds child mappers and settings like dynamic and enabled.
  • SourceToParse.java: The input to parsing. Contains the raw JSON bytes, the index name, the document ID, and the media type.
  • ParsedDocument.java: The output of parsing. Contains the Lucene Document(s) ready to be written to the index, plus the stored _source bytes.

How they fit together:

DocumentMapper                               ← created from the PUT mapping
  ├─ ObjectMapper (root)                     ← the root { } of the document
  │    ├─ ObjectMapper children              ← nested objects
  │    └─ FieldMapper children               ← leaf fields (text, knn_vector, etc.)
  └─ DocumentParser                          ← the parsing engine
       uses ParseContext                     ← carries XContentParser + mutable state
       reads SourceToParse                   ← raw JSON bytes in
       produces ParsedDocument               ← Lucene docs out

All links point to release tags: OpenSearch 3.5.0 and k-NN 3.5.0.0.