💡 The Big Question in MongoDB Schema Design: Arrays vs. Separate Documents

Apurv upadhyay
5 min read · Oct 16, 2024

Have you ever found yourself debating between using arrays and separate documents in MongoDB?

Today, I ran into exactly this question while working with appointment data that could exceed 50,000 records. Choosing the right approach matters for your application's performance and scalability, so let's explore it together!

The Scenario

Imagine this: you're tasked with storing appointment information in MongoDB, and your dataset is growing rapidly. Here are two example documents:

Document 1:

{
  "_id": "09fb5c63-3ff3-47a4-8638-e06cf42e3f3d",
  "orgId": "63c5cb11f5cf9a7d2c143967",
  "centerId": "3005",
  "appointmentDate": { "$date": "2024-10-15T04:00:00.000Z" },
  "appointmentTimeInMins": 540,
  "slotQueueNumber": 1,
  "clientIpFromServer": "106.222.222.176:26760",
  "clientIpFromBrowser": "106.222.222.176",
  "browserName": "Mozilla/5.0...",
  "visitCreationCompleted": false,
  "createdDate": { "$date": "2024-10-15T12:35:28.275Z" },
  "updatedDate": { "$date": "2024-10-15T12:35:28.275Z" }
}

Document 2:

{
  "_id": "ecb8fc9b-f593-4c5e-8700-f6eac9882d3f",
  "orgId": "63c5cb11f5cf9a7d2c143967",
  "centerId": "3005",
  "appointmentDate": { "$date": "2024-10-15T04:00:00.000Z" },
  "appointmentTimeInMins": 600,
  "slotQueueNumber": 1,
  "clientIpFromServer": "::1",
  "clientIpFromBrowser": "106.222.222.176",
  "browserName": "Mozilla/5.0...",
  "visitCreationCompleted": false,
  "createdDate": { "$date": "2024-10-15T12:54:18.653Z" },
  "updatedDate": { "$date": "2024-10-15T12:54:18.653Z" }
}

Here's sample code for both approaches: the array structure and separate documents. Note that the sample code refers to the records as click events.

1. Array Structure Example

In this approach, we store multiple click events for a single day under the same document. We group these by centerId and appointmentDate.

Schema

{
  "_id": "centerId_appointmentDate",
  "centerId": "3005",
  "appointmentDate": { "$date": "2024-10-15T04:00:00.000Z" },
  "clickEvents": [
    {
      "eventId": "click-001",
      "appointmentTimeInMins": 540,
      "slotQueueNumber": 1,
      "clientIpFromBrowser": "106.222.222.176",
      "browserName": "Mozilla/5.0...",
      "userAgent": "Mozilla/5.0...",
      "createdDate": { "$date": "2024-10-15T12:35:28.275Z" }
    },
    {
      "eventId": "click-002",
      "appointmentTimeInMins": 600,
      "slotQueueNumber": 2,
      "clientIpFromBrowser": "106.222.222.177",
      "browserName": "Mozilla/5.0...",
      "userAgent": "Mozilla/5.0...",
      "createdDate": { "$date": "2024-10-15T13:00:00.000Z" }
    }
  ]
}

Code to Insert a New Click Event

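// Append one event to the array for this center and day.
// updateOne replaces the deprecated update(); with upsert: true the
// parent document is created if it does not exist yet.
db.clickEvents.updateOne(
  { centerId: "3005", appointmentDate: new Date("2024-10-15T04:00:00.000Z") },
  {
    $push: {
      clickEvents: {
        eventId: "click-003",
        appointmentTimeInMins: 720,
        slotQueueNumber: 3,
        clientIpFromBrowser: "106.222.222.178",
        browserName: "Mozilla/5.0...",
        userAgent: "Mozilla/5.0...",
        createdDate: new Date()
      }
    }
  },
  { upsert: true }
);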

• Pros: This structure allows quick grouping and retrieval of all click events for a single day and center.

• Cons: If the array grows until the document reaches MongoDB's document size limit (16 MB), further writes to that document fail.

2. Separate Documents Example

In this approach, each click event is stored as a separate document. We still associate them by centerId and appointmentDate, but they are independent records.

Schema

{
  "_id": "click-001",
  "centerId": "3005",
  "appointmentDate": { "$date": "2024-10-15T04:00:00.000Z" },
  "appointmentTimeInMins": 540,
  "slotQueueNumber": 1,
  "clientIpFromBrowser": "106.222.222.176",
  "browserName": "Mozilla/5.0...",
  "userAgent": "Mozilla/5.0...",
  "createdDate": { "$date": "2024-10-15T12:35:28.275Z" }
}

Code to Insert a New Click Event

// Each event is its own document, so an insert never touches existing data.
db.clickEvents.insertOne({
  _id: "click-003",
  centerId: "3005",
  appointmentDate: new Date("2024-10-15T04:00:00.000Z"),
  appointmentTimeInMins: 720,
  slotQueueNumber: 3,
  clientIpFromBrowser: "106.222.222.178",
  browserName: "Mozilla/5.0...",
  userAgent: "Mozilla/5.0...",
  createdDate: new Date()
});

• Pros: This structure scales well because each document is independent, supports efficient indexing, and avoids hitting the document size limit.

• Cons: It produces many more documents, but MongoDB handles large collections efficiently with indexing and sharding.

The Decision Point: Arrays vs. Separate Documents

At first, you might think, "Why not group these appointments in an array keyed by centerId and appointmentDate?" However, when the data volume can go beyond 50K entries, several factors come into play:

1. Document Size Limit 🚧

• MongoDB limits a single document to 16 MB. If you store all appointments in an array inside one document, the array can grow until it hits that limit, after which further writes to the document fail. For datasets of 50K+ records, keeping everything in a single document isn't scalable.
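As a rough check on that limit, here is a minimal sketch that estimates how many array entries fit in one document. The 400-byte average entry size is an assumption for illustration (the real figure depends mostly on how long the user-agent strings are), and the function names are hypothetical.

```javascript
// MongoDB's maximum BSON document size: 16 MB.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Assumption for illustration: one embedded event averages ~400 bytes.
// Measure your own documents before relying on this number.
const ASSUMED_BYTES_PER_EVENT = 400;

// How many events of a given average size fit under the document limit.
// This ignores the small overhead of the parent document's other fields.
function maxEventsPerDocument(bytesPerEvent, limitBytes = MAX_BSON_DOCUMENT_BYTES) {
  if (!(bytesPerEvent > 0)) {
    throw new RangeError("bytesPerEvent must be a positive number");
  }
  return Math.floor(limitBytes / bytesPerEvent);
}

// Whether an array of `eventCount` events would still fit in one document.
function fitsInOneDocument(eventCount, bytesPerEvent) {
  return eventCount <= maxEventsPerDocument(bytesPerEvent);
}

console.log(maxEventsPerDocument(ASSUMED_BYTES_PER_EVENT));
console.log(fitsInOneDocument(50000, ASSUMED_BYTES_PER_EVENT));
```

Under that assumed size, the limit is reached at roughly 41,900 entries, so a single 50,000-entry array would not fit.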

2. Query Performance ⚡

• Large arrays can slow queries down. To find one appointment, MongoDB has to load the whole parent document and, unless a multikey index covers the field, scan through the array, and that cost grows with the array. With separate documents, MongoDB can index fields like centerId and appointmentDate and read only the documents that match.
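To make the difference concrete, here is a sketch of the two query shapes for "find the 540-minute slot for one center and day". The helper names are hypothetical, and the array version assumes the clickEvents layout shown earlier. Both helpers only build plain query objects, so nothing here talks to a database.

```javascript
// Array layout: match the parent document, then filter its array on the server.
// $filter keeps only the array elements that satisfy `cond`.
function buildArrayLookup(centerId, appointmentDate, timeInMins) {
  return [
    { $match: { centerId, appointmentDate } },
    {
      $project: {
        clickEvents: {
          $filter: {
            input: "$clickEvents",
            as: "event",
            cond: { $eq: ["$$event.appointmentTimeInMins", timeInMins] },
          },
        },
      },
    },
  ];
}

// Separate documents: a plain equality filter on top-level fields.
function buildDocumentLookup(centerId, appointmentDate, timeInMins) {
  return { centerId, appointmentDate, appointmentTimeInMins: timeInMins };
}

const day = new Date("2024-10-15T04:00:00.000Z");
const pipeline = buildArrayLookup("3005", day, 540);
const filter = buildDocumentLookup("3005", day, 540);

// In mongosh (not run here):
//   db.clickEvents.aggregate(pipeline)
//   db.clickEvents.find(filter)
console.log(JSON.stringify(pipeline, null, 2));
console.log(JSON.stringify(filter));
```

The array version needs a two-stage aggregation to return a single element, while the separate-document version is an ordinary find() filter.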

3. Ease of Updates and Modifications 🔄

• Modifying or deleting one element inside an array means reaching into the parent document with positional operators or $pull, and every change touches a document that keeps getting larger. When each appointment is stored as a separate document, you update or delete exactly that document without affecting the others.
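Here is a sketch of what those updates and deletions look like in each layout. The helper names and the choice of slotQueueNumber as the field being changed are assumptions for illustration; the helpers only build the filter and update objects you would pass to updateOne() or deleteOne().

```javascript
// Array layout: address one element inside the parent document.
// The positional `$` refers to the array element matched by the filter.
function arrayUpdateOps(centerId, appointmentDate, eventId, newSlot) {
  const parent = { centerId, appointmentDate };
  return {
    update: {
      filter: { ...parent, "clickEvents.eventId": eventId },
      update: { $set: { "clickEvents.$.slotQueueNumber": newSlot } },
    },
    // Removing an element is itself an update ($pull) on the parent.
    remove: {
      filter: parent,
      update: { $pull: { clickEvents: { eventId } } },
    },
  };
}

// Separate documents: the event's own _id is all that is needed.
function documentUpdateOps(eventId, newSlot) {
  return {
    update: {
      filter: { _id: eventId },
      update: { $set: { slotQueueNumber: newSlot } },
    },
    remove: { filter: { _id: eventId } },
  };
}

const day = new Date("2024-10-15T04:00:00.000Z");
const arrayOps = arrayUpdateOps("3005", day, "click-001", 2);
const docOps = documentUpdateOps("click-001", 2);

// In mongosh (not run here):
//   db.clickEvents.updateOne(arrayOps.update.filter, arrayOps.update.update)
//   db.clickEvents.updateOne(arrayOps.remove.filter, arrayOps.remove.update)
//   db.clickEvents.updateOne(docOps.update.filter, docOps.update.update)
//   db.clickEvents.deleteOne(docOps.remove.filter)
console.log(JSON.stringify(arrayOps, null, 2));
console.log(JSON.stringify(docOps, null, 2));
```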

The Recommended Approach: Separate Documents

After considering these factors, the best approach for data at this scale is to store each appointment as a separate document. Here's why:

1. Scalability and Sharding 🚀

• When each appointment is its own document, you can take advantage of MongoDB's sharding. This matters once the dataset outgrows a single server: MongoDB distributes the documents across shards according to the shard key, so both data and load are spread out and the system stays performant.

2. Indexing and Query Optimization 📈

• With separate documents, you can create indexes on fields like centerId and appointmentDate, so queries that filter on those fields stay fast even as the number of appointments grows into the millions.
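As a sketch of what that index could look like: the compound key below follows the two fields named above, while the index name and helper names are hypothetical. The `collection` argument is assumed to be anything that exposes createIndex(keys, options), such as db.clickEvents in mongosh.

```javascript
// Compound index key: centerId first, then appointmentDate.
const APPOINTMENT_INDEX_KEYS = { centerId: 1, appointmentDate: 1 };

// Creates the index on the given collection.
// The index name is a hypothetical choice, not something from the article.
function ensureAppointmentIndex(collection) {
  return collection.createIndex(APPOINTMENT_INDEX_KEYS, {
    name: "centerId_appointmentDate",
  });
}

// Filter for "all appointments for one center on one day".
// It uses exactly the indexed fields.
function appointmentsForDay(centerId, appointmentDate) {
  return { centerId, appointmentDate };
}

const dayFilter = appointmentsForDay("3005", new Date("2024-10-15T04:00:00.000Z"));

// In mongosh (not run here):
//   ensureAppointmentIndex(db.clickEvents)
//   db.clickEvents.find(dayFilter)
console.log(JSON.stringify(APPOINTMENT_INDEX_KEYS));
console.log(JSON.stringify(dayFilter));
```

Running the find with .explain("executionStats") is one way to confirm that the query actually uses the index rather than scanning the collection.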

3. Maintaining Data Integrity and Flexibility 🛠️

• Managing data as separate documents helps maintain data integrity. Each document is self-contained, so updates, modifications, and validation rules apply to one appointment at a time. A single appointment is nowhere near MongoDB's document size limit, so growth in the number of appointments never threatens it and your system stays flexible.
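One way to make the validation part concrete is a $jsonSchema validator on the collection. The sketch below is an assumption about which fields you would want to enforce, based on the sample documents; the required list and the minimum values are illustrative, not rules from the original data.

```javascript
// Numeric fields accept any of these BSON number types, since the stored
// type depends on the client that wrote the document.
const NUMERIC_TYPES = ["int", "long", "double"];

// Hypothetical validator for one appointment per document.
const appointmentValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["centerId", "appointmentDate", "appointmentTimeInMins", "slotQueueNumber"],
    properties: {
      centerId: { bsonType: "string" },
      appointmentDate: { bsonType: "date" },
      appointmentTimeInMins: { bsonType: NUMERIC_TYPES, minimum: 0 },
      slotQueueNumber: { bsonType: NUMERIC_TYPES, minimum: 1 },
      visitCreationCompleted: { bsonType: "bool" },
    },
  },
};

// In mongosh (not run here), for a new or an existing collection:
//   db.createCollection("clickEvents", { validator: appointmentValidator })
//   db.runCommand({ collMod: "clickEvents", validator: appointmentValidator })
console.log(JSON.stringify(appointmentValidator, null, 2));
```

Because every appointment is a top-level document, the rules stay flat; with the array layout the same checks would have to be nested under the array's item schema.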

How It Looks: The Ideal Schema

Here's what a schema for storing separate documents might look like:

{
  "_id": "unique-appointment-id",
  "centerId": "3005",
  "appointmentDate": { "$date": "2024-10-15T04:00:00.000Z" },
  "appointmentTimeInMins": 540,
  "slotQueueNumber": 1,
  "clientIpFromServer": "106.222.222.176:26760",
  "clientIpFromBrowser": "106.222.222.176",
  "browserName": "Mozilla/5.0...",
  "visitCreationCompleted": false,
  "createdDate": { "$date": "2024-10-15T12:35:28.275Z" },
  "updatedDate": { "$date": "2024-10-15T12:35:28.275Z" }
}

Each appointment has its own document, making queries and updates simple, efficient, and fast.

Conclusion: Designing for Scalability

Choosing between arrays and separate documents isn't just about what feels right; it's about anticipating future growth and designing for scalability. If you're working with data that's expected to grow large, separate documents offer the best combination of flexibility, performance, and efficiency.

Next time you face this decision in MongoDB, consider the size and growth of your data, and remember: design for the future, not just the present.

Have you faced similar scenarios? How did you decide on your approach? Let's discuss in the comments below!
