AI and MySQL Databases

One notable mistake is that many AI-generated database schemas currently favour using VARCHAR(36) for the id column.

16 May 2025

5

min read

AI Innovation

Adrian Sweeney

One notable mistake is that many AI-generated database schemas currently favour using VARCHAR(36) for the id column.

This is typically motivated by a desire to prevent unauthorised access by malicious actors. To achieve this, UUIDs (Universally Unique Identifiers) are often employed as a security measure.

Take WordPress as an example — it typically exposes post IDs in the URL like ?p=45. Compared to an SEO-friendly format such as /2025/05/sample-post, this numeric ID allows users to potentially guess and access adjacent content by altering the number (e.g., ?p=44). UUIDs, on the other hand, are not sequential and are far harder to guess.

UUIDs are 36-character strings that follow a standard format with five sections separated by hyphens: 8-4-4-4-12. Each section is made up of hexadecimal values:

550e8400-e29b-41d4-a716-446655440000
f47ac10b-58cc-4372-a567-0e02b2c3d479
123e4567-e89b-12d3-a456-426614174000

A simple regex to validate UUIDs looks like this:

^[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}$

While UUIDs offer security and portability, they come with trade-offs. They're significantly larger than integers, which can slow down database performance — especially when used as primary keys in relational databases. Indexing, memory usage, and query speed all suffer due to the size and non-sequential nature of UUIDs.

An AI Conversation on ID Strategy

Curious about this design choice, I asked GitHub Copilot the following question:

Me:
Why have you chosen the id column to be defined as VARCHAR(36) PRIMARY KEY in the database structure?

GitHub Copilot:

I chose VARCHAR(36) for the ID columns for a few specific reasons:

  1. UUID Storage: The 36-character length fits UUID values, which follow a standardized format.
  2. Multi-Tenant Systems: UUIDs help isolate tenants by eliminating sequential ID leakage, simplifying migrations, and improving security.
  3. Data Portability: UUIDs ensure that exported data doesn't suffer from ID collisions during import into another system.

Trade-offs:

  • Larger storage size (36 bytes vs. 4 bytes for INT)
  • Slower indexing and query performance
  • Increased memory usage

Alternatives to consider:

  1. AUTO_INCREMENT INT/BIGINT
  2. BINARY(16) to store raw UUID bytes more efficiently
  3. A custom ID generation strategy

Let me know if you'd prefer a different approach.

Me:
Use BIGINT UNSIGNED for performance, but keep a guid field in each record for external references.

GitHub Copilot:

I'll update the schema to use BIGINT UNSIGNED as the primary key while adding a guid column for external references. This hybrid approach combines the best of both strategies:

  • Performance: Efficient indexing and joins via numeric primary keys
  • Security & Portability: GUIDs exposed in APIs, making internal IDs safe from enumeration

Summary:

  • BIGINT UNSIGNED: Fast, compact, sequential
  • GUID (VARCHAR): Unique, portable, externally safe

NoSQL Databases and MongoDB ObjectIds

While relational databases offer flexibility in choosing between UUIDs, integers, or hybrids, NoSQL systems like MongoDB take a different path.

MongoDB uses a built-in type called ObjectId for its _id field. This is a 12-byte binary value that encodes:

  • 4 bytes – Timestamp (seconds since epoch)
  • 5 bytes – Random value (often based on machine + process ID)
  • 3 bytes – Incrementing counter

Example:

507f1f77bcf86cd799439011

Why MongoDB Chooses ObjectId

  1. Globally Unique Without Coordination
    Similar to UUIDs, ObjectIds don't require a central authority to ensure uniqueness, making them ideal for distributed environments.
  2. Time-Ordered
    The embedded timestamp means documents are roughly sorted chronologically by default, which can help with performance in time-series applications.
  3. Compact and Efficient
    Stored in binary, ObjectIds take only 12 bytes — much smaller than a VARCHAR(36) UUID. They're also more index-friendly.

Comparing ID Strategies

FeatureUUID (VARCHAR)BIGINT + GUID HybridMongoDB ObjectId
Storage Size36 bytes8 + 36 bytes12 bytes
Indexing PerformancePoorExcellentExcellent
Sequential / Time OrderNoYes (BIGINT only)Yes (via timestamp)
Readable FormatYesYes (GUID)No (hex-based)
GuessableNoNo (GUID only)No
Schema EnforcementStrictStrictFlexible

AI Preferences in NoSQL Contexts

Many AI-generated systems default to MongoDB-like NoSQL setups when:

  • High write throughput or sharding is expected
  • Schemas need to be flexible or semi-structured
  • Data portability and distributed scaling are priorities

In these cases, using ObjectIds or similar unique, time-sortable identifiers is often a sensible default.

These architecture decisions are not theoretical; they directly influence how we design and scale platforms like PrimeCRM in real-world environments.

Back to Knowledge Hub