Data Access Models with Node

Instead of writing CRUD methods for every table we need to access, we can create a module that provides the data and can be extended as…

Photo by Maarten van den Heuvel on Unsplash

Instead of writing CRUD methods for every table we need to access, we can create a module that provides the data and can be extended as needed. This article goes through that process and provides a sample application summarising the steps.

The related repository can be found at https://github.com/grovercoder/article-data-access-model.

I’ve written many applications and just about every one of them needs a common entity — a data access model specific to a database table. In the past I’ve created these on the fly but the issue keeps coming up so it is time to make this easier.

We may be able to accomplish this by using an Object Relational Mapping system like Squelize or Objection.JS. These are robust generic systems which may do way more than we will ever need. In some cases adding yet another framework into your application is not the best solution. In my case, the database needs are either very basic, or very specific to the application and database. So an ORM is normally not the right answer.

What we want is a simple generic module that provides the common CRUD methods, yet can be extended with additional functionality.

Note: We are talking about abstract ideas here. But when it comes time to USE those ideas, we need a concrete example. We will assume a simple Tasks database as we work through the code. That DB will have two tables — groups and tasks. We will use these as we build up the instances.

Requirements Link to heading

We should probably define what we are trying to build in a little more detail so that we know when we are done.

The core module should not have any knowledge of the database or table.
The database connection and the table schema (aka the list of fields and their data types) should be passed into the module.
The module will then use the schema information to build the necessary functions.
The module will export an object that can then be immediately used to work with the table specified.
The solution should allow some fields to be “guarded”. That is, when a record or record set is returned, the guarded fields should be omitted from the output. (i.e. a user’s password should not be included in the default usage patterns.) Guarded fields are still available though through custom methods if desired.
The CRUD methods should all be defined on the exported object.
The CRUD methods can be overridden if needed.
The exported object may include read-only informational properties (i.e. reflect back the tablename, the schema, and the guarded values as they are currently set).

First Draft Link to heading

Based on those requirements we are after something that looks a little like this:

// first draft of _model.js
module.exports = {
  create: null,
  find: null,
  findById: null,
  update: null,
  remove: null
}

// possible usage:
// get a reference to the model
const model = require('./_model.js')

// create a record
model.create({name: 'Task #1', description: 'Make sure the first task gets done.', completed: false})

// find a record using an ID
const record = model.findById(1)

// find a record using an AND match of the specified properties
const record2 = model.find({name: 'Task #1'})

// update a record
model.update({id: 1, completed: true})

// remove a record using its ID
model.remove(1)

We can see that this immediately fails because we have no way to specify the tablename, schema, or guarded values. We can address this by exporting a function instead of an object. The function would take in the desired data via a config parameter, and return the desired object. This is an implementation of a factory method then.

// version 2 of _model.js
module.exports = (config) => {
  return {
    create: null,
    find: null,
    findById: null,
    update: null,
    remove: null
  }
}

// usage:
const TaskModel = require('./_model.js')({
  connection: null,
  tablename: 'tasks',
  schema: {},
  guarded: []
})

TaskModel.create({name: 'Task 2', description: 'Order Pizza', completed: false})

Here we are utilising Dependency Injection to pass in the database connection and other model configurations. Our _model.js file then does not have a dependency on anything specific (yet).

To continue we need to make two choices that will tie down how the module can be used in the future.

First, we need to decide how we will specify the table schema.
Secondly we need to decide what our database connection object will look like.

Both lock this version of the module down to those specific decisions. For instance, if we decide that the connection will be a configured Knex.JS object, then we can never use this definition of the model for a MongoDB data store. (Not unless our MongoDB connection has the same methods as the Knex.JS object.) With some effort we can mitigate that, perhaps with an Adapter pattern. For our purposes though we will keep things (relatively) simple and assume that we would just create a different instance of the module to handle different connection types. So we might end up with _model.knex.js and _model.mongo.js . Then our application models could just require the appropriate base module. This is just one option though.

Defining the Schema Link to heading

We could just define our schema with an array, or even an array of objects. Or a comma delimited string. However JSON Schema is already a great tool that addresses this needs. Not only does it allow us to define our fields and field types (in a way that works for JS), it allows us to use a standard system and not need to define our own.

We will require that all instances of our model object must pass in a JSON Schema object to define the list of fields for our table. We’ll see an example of this in a moment.

Defining the Database Connection Link to heading

As described above, we could pick any database connection object. For our example here we will use Knex.JS. Simply because I need to connect to RDBMS type databases on a regular basis, and I believe this is a very common task.

Creating a better instance object Link to heading

We have so far just been using our core model in the usage sections. This is not how we would do it for real though. Instead, lets create an actual instance for both of our sample tables — Groups and Tasks.

A quick note about the naming standards. My models are named using the SINGULAR tense of the the table name. The table name is the PLURAL tense of the objects the table contains.

// guard.js

// assume the _connection.js file sets up, configures, and returns a Knex.JS object
const knex = require('./_connection.js')
const Model = require('./_model.js')({
  knex,
  tablename: 'groups',
  schema: {
    title: 'groups',
    type: 'object',
    required: ['name'],
    properties: {
      id: { type: 'integer' },
      name: { type: 'string' },
      created_at: {
        type: 'string',
        format: 'date-time',
      },
    },
  },
  guarded: [],
})

module.exports = Model

// task.js

// assume the _connection.js file sets up, configures, and returns a Knex.JS object
const knex = require('./_connection.js')
const Model = require('./_model.js')({
  knex,
  tablename: 'tasks',
  schema: {
    title: 'tasks',
    type: 'object',
    required: ['name'],
    properties: {
      id: { type: 'integer' },
      group_id: {type: 'integer' },
      public_code: { type: 'string' },
      name: { type: 'string' },
      description: { type: 'string' },
      completed: { type: 'boolean' },
      created_at: {
        type: 'string',
        format: 'date-time',
      },
    },
  },
  guarded: ['public_code'],
})

module.exports = Model

Here we are passing the “knex” object into the models, along with the tablename, schema and guarded values.

We’ve included the “guarded” value for the Task model so that we do not need to revisit this later. Imagine that the public_code field is generated by the database and used in a third party system, but our code doesn’t really need it. By marking it guarded, it will be omitted from the output of our model functions. (we need to build that yet). So, the guarded value is just an array of field names that match the “properties” of the JSON Schema object. An empty array is valid too, indicating we do not need to guard any fields.

The Schema looks complex, but it is mostly just cut and paste an existing definition and tweak it as needed. There is more we can do/specify with JSON Schemas, but we are most interested in the list of properties at this time. The required property can be very useful as well, but I’ll leave that to you to extend if/when you need it.

Our instance model definitions are complete then. From here we have access to all the functionality our _model.js module exposes for us. That is very little right now though.

Adding functionality to the models Link to heading

There are two needed functions we have not yet talked about:

The guard() method. This method examines a data object and removes any properties named in the guarded array. This is mostly an internal function that will get called by the CRUD methods where needed.
A populate() method. This accepts a data object and then extracts only the properties that match with the schema’s defined properties. This provides us with a rudimentary validation process and ensures the resulting data object can be used in our create and update methods. No additional checks are needed to ensure we didn’t inadvertently pass in a non-existent field. Again, this is mostly an internal method. (Good validation is still strongly recommended though, but is beyond the scope of this article…)

We will create these two methods first as the rest depend on them AND these both nicely define the secret for the remaining CRUD functions.

// version 3 of _model.js
const moment = require('moment')

module.exports = (config) => {
  function _guard(config) {
    return (record) => {
      if (Array.isArray(config.guarded)) {
        for (const field of config.guarded) {
          delete record[field]
        }
      }
      return record
    }
  }

  function _populate(config) {
    return (data) => {
      let output = {}
      if (config.schema && config.schema.properties) {
        for (const field of Object.keys(config.schema.properties)) {
          let prop = config.schema.properties[field]
          let val = data[field]
          // only include properties from the data variable that
          // a) exist in the schema properties list, and
          // b) are assigned a value (skip undefined values)
          if (typeof val != 'undefined') {
            if (prop.type == 'integer') {
              val = parseInt(val, 10) || 0
            }
            if (prop.type == 'number') {
              val = parseFloat(val, 10) || 0
            }
            if (prop.type == 'string' && !prop.format) {
              val = String(val)
            }
            if (prop.format == 'date-time') {
              val = moment(val) || null
            }
            output[field] = val
          }
        }
      } else {
        output = data
      }

      return output
    }
  }

  return {
    guard: _guard(config),
    populate: _populate(config),
    create: null,
    find: null,
    findById: null,
    update: null,
    remove: null,
  }
}

First we define two functions within the factory method — _guard and _populate . Both are functions that return a function. The config parameter (defined as part of the module’s factory method) is passed into both methods. This allows us access to the configuration data. Without this approach we would need to call the functions in the form of MyModel.guard(config, record) which just is not natural. Instead we want MyModel.guard(record) and the “MyModel” object should already know what config info is needed.

The guard method is straight forward. The function it returns accepts a “record” parameter (which is just expected to be any normal key/value object). It checks to see if our model configuration has defined a guarded array. If so it loops over each value in the guarded array and removes that field from the record object. Then the record object is returned, without the undesired fields.

The populate method is only a little more complex. The function it returns takes in a “data” object. It checks to see if the model has defined a schema.properties object. If so, it extracts the keys from that object and loops over them. The property object is extracted from the schema — this will be used later to make sure we return the correct data type. And the current value of the incoming data object is extracted, as we will be working with this a bit. If the extracted value is defined, then we compare it with the schema property’s type to ensure it is set to an appropriate JS data type.

This section can be expanded to include other items if/when needed. For instance if the property format is “email”, you might want to validate that the current value is a proper email address, or wrap it in a mailto anchor tag. Feel free to expand this section if needed. (Or ANY of the code, of course).

The method of passing the config object to a function that will then return the final version of the function is repeated for each of the CRUD operations. These then do the steps necessary to perform the appropriate CRUD action, but using the config object to set that up. For instance the findById method can be accomplished with the command config.knex(config.tablename).where({id: passedInID}).first(). This is a simple Knex.JS command. In the code below I call a few Knex methods and the logic should be pretty clear. I recommend looking at the Knex.JS documentation to learn more about what can/should be done here.

Private vs Public Properties/Methods Link to heading

In the current version of our model, we have set guard and populate methods on the returned object. These are now public properties to the object and can be overwritten. Our CRUD methods will be the public as well. This meets our requirement that they can be overwritten.

BUT, we have a requirement that says we should have some read only properties that expose the current configuration settings for tablename, schema, and guarded. We can accomplish this by making use of the native JS Object.defineProperty method. Instead of returning the desired object directly from the factory method, we can place it into a temporary variable, add our read only properties, and then return the temporary variable.

const output = {...}    
Object.defineProperty(output, '_tablename', {value: config.tablename, writeable: false})

return output

We will clean up the code a little and assign our methods AFTER we assign our read only properties. This will result in the read only items being at the top of the object when you do a console.log() on it.

The final _model.js file Link to heading

So our final _model.js file should look something like this:

// version 4 of _model.js
module.exports = (config) => {
  /*
   *  GUARD
   */
  function _guard(config) {
    return (record) => {
      if (Array.isArray(config.guarded)) {
        for (const field of config.guarded) {
          delete record[field]
        }
      }
      return record
    }
  }

  /*
   *  POPULATE
   */
  function _populate(config) {
    return (data) => {
      let output = {}
      if (config.schema && config.schema.properties) {
        for (const field of Object.keys(config.schema.properties)) {
          let prop = config.schema.properties[field]
          let val = data[field]
          // only include properties from the data variable that
          // a) exist in the schema properties list, and
          // b) are assigned a value (skip undefined values)
          if (typeof val != 'undefined') {
            if (prop.type == 'integer') {
              val = parseInt(val, 10) || 0
            }
            if (prop.type == 'number') {
              val = parseFloat(val, 10) || 0
            }
            if (prop.type == 'string' && !prop.format) {
              val = String(val)
            }
            if (prop.format == 'date-time') {
              val = moment(val) || null
            }
            output[field] = val
          }
        }
      } else {
        output = data
      }

      return output
    }
  }

  /*
   *  CREATE
   */
  function _create(config) {
    const populate = _populate(config)
    const guard = _guard(config)

    return async (data) => {
      const record = populate(data)
      delete record.id
      record.created_at = new Date()

      // write the new record and return the guarded value of it
      const newId = await config
        .knex(config.tablename)
        .insert(record)
        .returning('id')
      const newRecord = await config
        .knex(config.tablename)
        .where({ id: newId[0] })
        .first()
      return guard(newRecord)
    }
  }

  /*
   *  FIND
   */
  function _find(config) {
    const guard = _guard(config)

    return async (conditions) => {
      const criteria = {}
      // discard conditions that do not match the schema
      for (const field of Object.keys(config.schema.properties)) {
        if (typeof conditions[field] !== 'undefined') {
          criteria[field] = conditions[field]
        }
      }
      // get the data
      let records = await config.knex(config.tablename).where(criteria)
      return records.map((row) => guard(row))
    }
  }

  /*
   *  FINDBYID
   */
  function _findById(config) {
    const guard = _guard(config)

    return async (target) => {
      const criteria = {}

      // get the data
      let record = await config
        .knex(config.tablename)
        .where({ id: target })
        .first()
      return guard(record)
    }
  }

  /*
   *  UPDATE
   */
  function _update(config) {
    const populate = _populate(config)
    const guard = _guard(config)

    return async (newData, transaction) => {
      if (typeof newData.id == 'undefined') {
        throw new Error(`Record ID not specified`)
      }

      let record = populate(newData)
      delete record.id
      delete record.created_at
      record.updated_at = new Date()
      await config
        .knex(config.tablename)
        .where({ id: newData.id })
        .update(record)
      const updatedRecord = await config
        .knex(config.tablename)
        .where({ id: newData.id })
        .first()
      if (!updatedRecord || !updatedRecord.id) {
        return {}
      }

      return guard(updatedRecord)
    }
  }

  /*
   *  REMOVE
   */
  function _remove(config) {
    const guard = _guard(config)

    return async (target) => {
      let record = await config
        .knex(config.tablename)
        .where({ id: target })
        .first()
      await config.knex(config.tablename).where({ id: target }).del()
      return guard(record)
    }
  }

  // create the output object
  const output = {}

  // add the read only properties
  Object.defineProperty(output, '_tablename', {
    value: config.tablename,
    writeable: false,
  })
  Object.defineProperty(output, '_schema', {
    value: config.schema,
    writeable: false,
  })
  Object.defineProperty(output, '_guarded', {
    value: config.guarded,
    writeable: false,
  })

  // add the public methods that can be over written
  output.guard = _guard(config)
  output.populate = _populate(config)
  output.create = _create(config)
  output.find = _find(config)
  output.findById = _findById(config)
  output.update = _update(config)
  output.remove = _remove(config)

  return output
}

Assuming we have no errors, and the database connection module works properly, we would have full CRUD operations to every table we define an instance model for.

Overwriting methods Link to heading

One of our requirements is that the CRUD methods can be overwritten. Let us imagine that our Task model needs some additional functionality to the create method — we do not want to ever create a task where the name is ‘TASK’. And if the name is ‘RUN’ we want to change the name to ‘automated task’. Well, we still need a create method, we just need a specialised version of it.

// task.js

const knex = require('./_connection.js')
const Model = require('./_model.js')({
  knex,
  tablename: 'tasks',
  schema: {
    title: 'tasks',
    type: 'object',
    required: ['name'],
    properties: {
      id: { type: 'integer' },
      group_id: { type: 'integer' },
      public_code: { type: 'string' },
      name: { type: 'string' },
      description: { type: 'string' },
      completed: { type: 'boolean' },
      created_at: {
        type: 'string',
        format: 'date-time',
      },
    },
  },
  guarded: ['public_code'],
})

const core_create = Model.create

Model.create = (data) => {
  if (data.name) {
    if (data.name.toLowerCase() == 'task') {
      throw new Error('Invalid Task Name')
    }
    if (data.name.toLowerCase() == 'run') {
      data.name = 'automated task'
    }
  }
  
  return core_create(data)
}

module.exports = Model

We start off with an identical definition of our model instance. Then we store a reference to the original create method. Next we assign a new function to the Model.create property. Inside that function we may choose to call the original method, or we could place the appropriate Knex code here to accomplish our task. The point here is that we have replaced the “core” create method.

If you were to try Model._tablename = ‘new value' though you would receive an error as this is a read only propery.

Sample Repository Link to heading

I’ve created a sample repository containing the ideas defined in this article. This sample code demonstrates why SQLite should not be used in production though. Our create and update methods on the _model.js file call the .returning() method from Knex. SQLite does not support this and prevents us from just getting the new or updated data directly. I’ve solved this in some applications by immediately doing a .find() with the populated record data, minus anything that would be generated automatically (like the created_at field). This might look something like the following:

async function create(data) {  
  const record = _populate(data)  
  await config.knex(config.tablename).insert(record)  
  delete record.created_at  
  const newRecord = await config.knex(config.tablename)  
      .where(record)  
      .first()  
  return _guard(newRecord)  
}

This might work for you if you are using SQLite, or cannot rely on the .returns() method.

Conclusion Link to heading

There is plenty we could do to improve on this. Add in better validation and error handling, handle database transactions, create a Test for the model, or just tweak it to our needs. This should be considered a starting point for your own application requirements.

I’m open to comments / suggestions / questions to improve this. I’m just one man and I’ve built this with little peer review. The solution works will for my applications though. I hope you find it useful for your own apps.