Uploading files for a web application: server, filesystem or both?
Published
Let’s say you’re working on a web application and you want to give your users the ability to upload files.
Great! Nothing wrong with that. Very common with apps 😆.
How you do that however can be a bit of a minefield, especially when you ask “where do I store the files?”.
Some will say that the files should be stored on the server’s filesystem and served from there because serving files is what it’s meant for. You use a database to store the file path and go from there.
Others might suggest keeping the files on the database itself which has the benefit of keeping things neat: no risk of losing the connection between the filesystem and the database, but it also means an added cost to the database when serving the files.
There are strong opinions for both, and many third-party services offer to solve this dilemma for you (for a price).
But what if you could get the best of both?
Serving files from the filesystem is great (because, yes, that’s what it was designed and optimized to do), and the ability to handle the file’s data without keeping it synchronized with the filesystem would make your app more reliable.
So here’s the plan:
- bite the bullet and upload the file into the database,
- when the file is requested, create the file in the filesystem and serve it from there, and
- if the file is requested again, just serve it as normal.
The example uses Node.js and Prisma.js for the server and database management, but the same approach could be used for other servers and libraries.
Let’s start the function:
export const getFileInFileSystem = async (id: bigint) => {
// ...
};
In your database you have a files
table:
model files {
id BigInt @id @default(autoincrement()) @db.UnsignedBigInt
uuid String? @unique(map: "idx_uuid") @db.Char(36)
name String @db.VarChar(255)
content Bytes @db.MediumBlob
size BigInt @db.UnsignedBigInt
users users[]
}
The id
— your primary key for the table — you want to keep that hidden from users: that’s an internal value and not for public display. Internally you can safely use that to query the file’s record from the database.
import { writeFile } from "fs/promises";
import { existsSync, mkdirSync } from "fs";
export const getFileInFileSystem = async (id: bigint) => {
const file = await prisma.files.findUnique({
where: {
id,
},
select: {
uuid: true,
mimetype: true,
}
});
if (!file) {
return;
}
};
So now you know if the file exists, but if some reason it doesn’t then you have an escape hatch with the early return
.
This query requests the absolute minimum: the uuid
and the mimetype
of the file. Even if you stored the files on the server’s filesystem you’d still need to make a query to find the path, so right now there’s no additional work that the database needs to do.
Keep in mind that this approach means you’ll have two copies of the file, one of the database and another on the filesystem. But the filesystem version is expendable. If we don’t need it anymore then it’s safe to delete, and what better way for that to be maintained that to have a directory with a date on it.
export const getFileInFileSystem = async (id: bigint) => {
// ...
// Get today's date as a string to use in the file path
const date = new Date().toISOString().split("T")[0];
// Define the directory where it *could* be stored
const directory = `./public/assets/${date}`;
Now you have a directory that can be removed once it’s more than 24 hours old.
In the DB request there was the mimetype
. That will be needed too:
export const getFileInFileSystem = async (id: bigint) => {
// ...
// Write the file to the file system using the correct path and mime type
const extension = file.mimetype.split("/")[1];
If the id
is private then the uuid
is fair game. That can be public as there’s little chance of people guessing it. You don’t want to use the name
because it’s possible for multiple people to upload duplicate files with the same name. The uuid
is unique:
export const getFileInFileSystem = async (id: bigint) => {
// ...
const path = `${directory}/${file.uuid}.${extension}`;
If the path
exists then all you need to do it return it: the file is ready to be served. But if it doesn’t exist in the temporary folder then it’s time to make it:
export const getFileInFileSystem = async (id: bigint) => {
// ...
const path = `${directory}/${file.uuid}.${extension}`;
// Check if the file exists
if (!existsSync(path)) {
// Create the directory if it doesn't exist
if (!existsSync(directory)) {
mkdirSync(directory, { recursive: true });
}
// Get the file content
const fullFile = await prisma.files.findUnique({
where: {
id,
},
select: {
id: true,
content: true,
},
});
await writeFile(path, fullFile.content, {
encoding: "binary",
});
}
// Return the path
return path;
};
Using a second DB request might be a little controversial, but this one is needed to get the file’s content
. You could have done that in the first request, but it would have been a waste if the file already existed. Getting the content
in the second request, combined with the 24 hour folder, means that the file’s content
will only be requested once every 24 hours.
Think of the number of times an avatar image might be shown in a list of comments, or photos in a gallery: it doesn’t matter how many times they’re requested, the biggest DB request only happens once a day.
If you need to replace or delete the file then you have to do a little extra work to clear it from the filesystem once it’s gone from the database. Or you could just be lazy and wait for a Cron job to delete the directory.
Now you have a way to leverage both the database and the filesystem that does the best of both: maintain data integrity in the database and quickly serve files from the filesystem.
You could take this further (like I probably will) and allow for different versions of the file to exist rather than just replace them in the database. But that’s a problem for another day.