Bulk Import
Batch Operations, Write Buffers, Sequences, Array Operations
What Morphium Offers
storeList(), cursor-based iteration, and the @WriteBuffer annotation handle
high-volume data ingestion efficiently. Morphium batches writes automatically, manages memory
through configurable buffer sizes, and provides sequences for generating unique IDs across
distributed instances.
The Challenge
Bulk operations in MongoDB require careful batching to avoid memory exhaustion and connection timeouts. Without write buffering, each insert is a separate round-trip. Without cursor-based iteration, reading large collections loads everything into memory.
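Without a client-side write buffer, batching has to be hand-rolled in application code. The sketch below (plain Java, no Morphium; the class and names are illustrative, not part of any library) shows the chunking logic you would otherwise write yourself to avoid one round-trip per insert:

```java
import java.util.ArrayList;
import java.util.List;

public class ManualBatcher {

    /** Splits records into chunks of at most batchSize, simulating hand-rolled bulk writes. */
    public static <T> List<List<T>> chunk(List<T> records, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += batchSize) {
            batches.add(records.subList(i, Math.min(i + batchSize, records.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 1200; i++) records.add(i);

        List<List<Integer>> batches = chunk(records, 500);
        // 1200 records -> 3 round-trips (500 + 500 + 200) instead of 1200
        System.out.println(batches.size());        // 3
        System.out.println(batches.get(2).size()); // 200
    }
}
```

With @WriteBuffer, this bookkeeping (plus flush-on-timeout and overflow handling) is taken off the application's hands.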
Prerequisites & Key Concepts
- @WriteBuffer buffers writes client-side. In this showcase, the entity uses size=500 and strategy=WRITE_NEW: when the buffer is full and a new write arrives, the newest write is sent immediately while older buffered writes wait. Writes are also flushed after timeout milliseconds regardless of buffer fill level.
- @AutoSequence uses a separate MongoDB collection to store atomic counters. Each sequence is identified by name (e.g. "import_number"). For storeList(), Morphium calls getNextBatch(count): a single atomic lock+increment+unlock regardless of how many records are in the list.
- push/pull modify array fields directly on the MongoDB server using the $push and $pull operators. The document is never loaded into Java memory, making these operations efficient and concurrency-safe.
- set/unset are server-side atomic operations. set changes a field's value; unset removes the field entirely from the document (not the same as setting it to null).
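The getNextBatch(count) idea can be illustrated without MongoDB: a single atomic increment reserves a whole contiguous block of IDs at once. A minimal sketch using AtomicLong (the real Morphium sequence locks and increments a counter document in MongoDB; SequenceSketch here is purely illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequenceSketch {
    private final AtomicLong counter = new AtomicLong(0);

    /** Reserves `count` consecutive values in one atomic step; returns the first reserved value. */
    public long getNextBatch(int count) {
        return counter.getAndAdd(count) + 1;
    }

    public static void main(String[] args) {
        SequenceSketch seq = new SequenceSketch();
        long first = seq.getNextBatch(500); // reserves 1..500 for a 500-record storeList()
        long next  = seq.getNextBatch(3);   // reserves 501..503
        System.out.println(first); // 1
        System.out.println(next);  // 501
    }
}
```

Because the reservation is a single atomic step, concurrent importers can never receive overlapping ID ranges, no matter how large their batches are.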
Entity Source Code
ImportRecord.java
Java
import de.caluga.morphium.annotations.AutoSequence;
import de.caluga.morphium.annotations.CreationTime;
import de.caluga.morphium.annotations.Entity;
import de.caluga.morphium.annotations.Id;
import de.caluga.morphium.annotations.caching.WriteBuffer;
import de.caluga.morphium.driver.MorphiumId;
import lombok.Data;
import lombok.Builder;
import lombok.NoArgsConstructor;
import lombok.AllArgsConstructor;
import lombok.experimental.FieldNameConstants;
import java.time.LocalDateTime;
import java.util.List;

@Entity(collectionName = "import_records")                                          // 1
@WriteBuffer(size = 500, strategy = WriteBuffer.STRATEGY.WRITE_NEW, timeout = 5000) // 2
@Data
@NoArgsConstructor
@AllArgsConstructor
@Builder
@FieldNameConstants
public class ImportRecord {
    @Id
    private MorphiumId id;

    @AutoSequence(name = "import_number") // 3
    private Long importNumber;

    private String source;
    private String data;
    private String status;

    @CreationTime // 4
    private LocalDateTime importedAt;

    private List<String> tags; // 5
}
1. Maps to the import_records collection.
2. Buffers up to 500 writes in memory, flushing as bulk operations. WRITE_NEW sends new writes immediately when the buffer is full. timeout = 5000 ms ensures a flush even at low throughput.
3. Morphium auto-assigns the next value from the import_number sequence. Uses Long (boxed) so null signals "not yet assigned". For bulk inserts, getNextBatch(n) reserves n values in a single atomic operation.
4. Automatically timestamped on the first store(); never overwritten on updates.
5. Modified via morphium.push() / morphium.pull() for atomic array operations without loading the full document.
WriteBuffer + Sequence Code
ImportRecord.java (annotations)
Java
import de.caluga.morphium.annotations.AutoSequence;
import de.caluga.morphium.annotations.CreationTime;
import de.caluga.morphium.annotations.Entity;
import de.caluga.morphium.annotations.Id;
import de.caluga.morphium.annotations.caching.WriteBuffer;
import de.caluga.morphium.driver.MorphiumId;
import java.time.LocalDateTime;
import java.util.List;

@Entity(collectionName = "import_records")                                          // 1
@WriteBuffer(size = 500, strategy = WriteBuffer.STRATEGY.WRITE_NEW, timeout = 5000) // 2
public class ImportRecord {
    @Id
    private MorphiumId id;

    @AutoSequence(name = "import_number") // 3
    private Long importNumber;            // Long (not long) so null = "not yet assigned" // 4

    private String source;
    private String data;
    private String status;
    private List<String> tags;

    @CreationTime // 5
    private LocalDateTime importedAt;
}
1. @Entity maps this class to a MongoDB collection with an explicit collection name.
2. @WriteBuffer batches individual store() calls; size=500 caps the buffer, WRITE_NEW flushes the newest entry when the buffer is full.
3. @AutoSequence(name=...) assigns the next value from a named MongoDB-backed counter when the field is null at store time.
4. Using Long (boxed) instead of long is required so the field can be null, which signals that no sequence number has been assigned yet.
5. @CreationTime is automatically populated by Morphium on the first store() call.
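The WRITE_NEW semantics can be sketched in plain Java: buffered writes wait, but once the buffer is full, a newly arriving write bypasses the buffer and is sent out immediately. This is an illustrative toy model only (WriteNewBuffer is not a Morphium class, and the real buffer also flushes on the configured timeout):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class WriteNewBuffer<T> {
    private final int size;
    private final Deque<T> buffer = new ArrayDeque<>();
    final List<T> sentImmediately = new ArrayList<>(); // writes that bypassed the buffer

    public WriteNewBuffer(int size) { this.size = size; }

    /** Buffer the write if there is room; otherwise send the NEW write immediately (WRITE_NEW). */
    public void store(T entity) {
        if (buffer.size() < size) {
            buffer.add(entity);               // older writes keep waiting for the bulk flush
        } else {
            sentImmediately.add(entity);      // buffer full: newest write goes out right away
        }
    }

    public int buffered() { return buffer.size(); }

    public static void main(String[] args) {
        WriteNewBuffer<String> buf = new WriteNewBuffer<>(3);
        buf.store("a"); buf.store("b"); buf.store("c"); // fills the buffer
        buf.store("d");                                 // buffer full -> sent immediately
        System.out.println(buf.buffered());             // 3
        System.out.println(buf.sentImmediately);        // [d]
    }
}
```

Other strategies (such as discarding old entries) would differ only in the else-branch; WRITE_NEW trades a little extra I/O under overflow for never losing a write.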
Bulk Import with storeList()
Java
// Build a list of records; importNumber left null for @AutoSequence
List<ImportRecord> records = new ArrayList<>();
for (int i = 0; i < count; i++) {
    records.add(ImportRecord.builder()
        .source(sources[i % sources.length])
        .data("Record #" + (i + 1))
        .status("PENDING")
        .tags(List.of("bulk", "auto-generated"))
        .build());
}

// storeList(): single bulk write + single getNextBatch() for sequences
morphium.storeList(records); // 1
1. storeList() sends all records in a single bulk write operation and calls getNextBatch(n) once to assign sequence numbers atomically, regardless of batch size.
Array & Field Operations
Morphium API — push / pull / set / unset
Java
Query<ImportRecord> query = morphium.createQueryFor(ImportRecord.class)
    .f(ImportRecord.Fields.id).eq(new MorphiumId(id));

// push: append a tag to the tags array ($push)
morphium.push(query, ImportRecord.Fields.tags, "priority");             // 1

// pull: remove a tag from the tags array ($pull)
morphium.pull(query, ImportRecord.Fields.tags, "auto-generated");       // 2

// set: update status without loading the document ($set)
query.set(ImportRecord.Fields.status, "PROCESSED", false, false, null); // 3

// unset: remove the "source" field entirely from the document ($unset)
query.unset(ImportRecord.Fields.source);                                // 4
1. morphium.push() appends a value to an array field using MongoDB's $push operator; the document is never loaded into Java memory.
2. morphium.pull() removes all matching values from an array field using MongoDB's $pull operator, entirely on the server.
3. query.set() updates a single field with $set; parameters are field, value, upsert, multiple, and async callback (AsyncOperationCallback).
4. query.unset() removes the field key entirely from the BSON document, which is different from setting it to null.
Related Documentation
- API Reference — storeList(), push(), pull(), set(), unset()
- Developer Guide — Batch Operations, @WriteBuffer
- Performance Guide — Bulk Writes, SequenceGenerator