Problem Statement
Rippling develops products for the HR domain. Design a system to support the following:
Upload documents.
Update the verification status of the document.
Fetch the verification status of the document.
Volume
500K documents to be processed in 24 hours.
SLA
Each document needs to be processed within an SLA of 1 to 4 hours, depending on its priority.
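A quick back-of-the-envelope calculation helps size the system. The sketch below turns the stated volume into an average arrival rate. The review time of 3 minutes per document is an assumption made only for illustration; it is not part of the problem statement.

```python
import math

DOCS_PER_DAY = 500_000            # from the problem statement
SECONDS_PER_DAY = 24 * 60 * 60

# Average arrival rate if uploads were spread evenly over the day.
docs_per_second = DOCS_PER_DAY / SECONDS_PER_DAY
docs_per_hour = DOCS_PER_DAY / 24

# ASSUMPTION: a reviewer needs about 3 minutes per document.
REVIEW_MINUTES_PER_DOC = 3
docs_per_reviewer_hour = 60 / REVIEW_MINUTES_PER_DOC

# Reviewers that must be working at any moment to keep up with the average rate.
concurrent_reviewers = math.ceil(docs_per_hour / docs_per_reviewer_hour)

print(f"{docs_per_second:.2f} docs/sec, {docs_per_hour:.0f} docs/hour")
print(f"about {concurrent_reviewers} reviewers working concurrently")
```

The average works out to roughly 5.8 documents per second, so write throughput is modest. Under the assumed review time, the size of the operations team is the real bottleneck, not the API layer.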
Assumptions
Assume the documents are verified manually by an operations team.
Each document is associated with an employee in a 1:1 relationship.
Duration
60 minutes on CodePair
Solution
APIs
Define APIs with request and response body structures.
1. /createDocument
API used to upload a new document. It must validate whether a document of the same type already exists for the given employee and organisation combination. The default verification status set by this API is unverified.
HTTP Method : POST
Payload :
Request:
{ employeeId : xxx,
organisationId : yyy,
documentInfo : { // Extensible JSON payload associated with the document, such as binary payload, documentId, documentType etc. }
}
Response :
{
employeeId : xxx,
organisationId : yyy,
documentId : // unique Identifier for the document
}
2. /updateDocument
Used to update the verification status of a document.
HTTP Method : PUT
Payload :
Request:
{ employeeId : xxx,
organisationId : yyy,
documentInfo : { // Extensible JSON payload associated with the document, such as binary payload, documentId, documentType, verificationStatus etc. }
}
Response :
{
employeeId : xxx,
organisationId : yyy,
documentId : // unique Identifier for the document
}
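A status update usually benefits from guarding which transitions are allowed. The sketch below assumes a hypothetical set of statuses (only "unverified" is named in this write-up; "verified" and "rejected" are illustrative) and a plain dict keyed by documentId as the store.

```python
# ASSUMPTION: every status other than "unverified" is hypothetical.
ALLOWED_TRANSITIONS = {
    "unverified": {"verified", "rejected"},
    "verified": set(),   # terminal in this sketch
    "rejected": set(),   # terminal in this sketch
}


def update_document_status(store, document_id, new_status):
    """Apply a status change and return the response body for /updateDocument."""
    record = store.get(document_id)
    if record is None:
        raise KeyError(f"unknown document: {document_id}")

    current = record["verificationStatus"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {new_status!r}")

    record["verificationStatus"] = new_status
    return {
        "employeeId": record["employeeId"],
        "organisationId": record["organisationId"],
        "documentId": document_id,
    }
```

Rejecting invalid transitions keeps a document that was already verified from being flipped back by a stale or repeated request.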
3. /getDocumentStatus
Used to fetch the verification status of the document.
HTTP Method : GET
Payload :
Request:
{ employeeId : xxx,
organisationId : yyy,
documentId : // unique Identifier for the document
}
Response :
{
verificationStatus : // valid values
}
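Because content sent with a GET request has no generally defined semantics in HTTP, one option is to pass the three identifiers as query parameters instead of a request body. The sketch below shows that alternative using only the standard library; the base URL and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_status_url(base_url, employee_id, organisation_id, document_id):
    """Client side: encode the identifiers into the query string."""
    query = urlencode({
        "employeeId": employee_id,
        "organisationId": organisation_id,
        "documentId": document_id,
    })
    return f"{base_url}/getDocumentStatus?{query}"


def parse_status_request(url):
    """Server side: recover the identifiers from the query string."""
    params = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in params.items()}


# Hypothetical host, used only to show the round trip.
url = build_status_url("https://api.example.com", "e1", "o1", "d1")
request_fields = parse_status_request(url)
```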
Database
There’s no one correct solution; either a SQL or a NoSQL database works, with the combination of employeeId and organisationId as the primary key. I recommend saving the file in cloud storage such as AWS S3 and persisting only the URL to the file in the database.
I recommend a document DB like MongoDB to persist the document info, since it lets us query on any of the fields with ease.
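To make the storage split concrete, here is a rough sketch of the metadata record that would live in the database while the file itself sits in object storage. The field names, the `DocumentRecord` class and the example bucket path are assumptions for illustration.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DocumentRecord:
    organisation_id: str
    employee_id: str
    document_id: str
    document_type: str
    file_url: str  # location of the file in object storage; the bytes are not stored here
    verification_status: str = "unverified"
    uploaded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def primary_key(self):
        """Composite key of organisation and employee, as suggested above."""
        return (self.organisation_id, self.employee_id)


# Hypothetical bucket and path, shown only as an example value.
record = DocumentRecord(
    organisation_id="o1",
    employee_id="e1",
    document_id="d1",
    document_type="passport",
    file_url="s3://example-bucket/o1/e1/d1.pdf",
)
as_document = asdict(record)  # plain dict, ready to be written to a document DB
```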
Design
Document upload and verification should be asynchronous, as the verification process is manual with an SLA of 1 to 4 hours. Latency is not a major concern, and the status of a given document isn’t expected to be fetched frequently, so there is no need to build a caching layer.
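One way to picture the asynchronous flow: the upload path only enqueues a verification task and returns, and the operations team pulls whichever document is closest to its SLA deadline. The sketch below uses an in-process heap as a stand-in for a real queue. The priority names and their SLA hours are assumptions; the problem only states a range of 1 to 4 hours.

```python
import heapq
import itertools
from datetime import datetime, timedelta, timezone

# ASSUMPTION: hypothetical priority levels mapped onto the 1 to 4 hour SLA range.
SLA_HOURS = {"high": 1, "medium": 2, "low": 4}


class VerificationQueue:
    """Hands out documents in order of earliest SLA deadline."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal deadlines

    def enqueue(self, document_id, priority, uploaded_at):
        deadline = uploaded_at + timedelta(hours=SLA_HOURS[priority])
        heapq.heappush(self._heap, (deadline, next(self._counter), document_id))
        return deadline

    def next_document(self):
        """Return (document_id, deadline) for the most urgent document, or None."""
        if not self._heap:
            return None
        deadline, _, document_id = heapq.heappop(self._heap)
        return document_id, deadline


now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
pending = VerificationQueue()
pending.enqueue("doc-low", "low", now)
pending.enqueue("doc-high", "high", now)
first = pending.next_document()  # the high-priority document, due one hour after upload
```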
Follow up questions
The interviewer could optionally tweak the question and ask the following:
/createDocument uploads and initiates the verification process for a single document. What would you do if you were to do a bulk upload of documents or, alternatively, process a pre-existing set of many documents? The solution is batch processing. You can address it with one-producer, many-consumer processing via a message queue, or with background jobs that can be triggered both periodically and manually on demand.
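The one-producer, many-consumer idea can be sketched with threads and an in-process queue standing in for a real message broker. `process_batch` and the `handle` callback are hypothetical names; in production the workers would be separate consumers of the messaging queue.

```python
import queue
import threading


def process_batch(document_ids, handle, worker_count=4):
    """Fan a batch of document ids out to worker threads and collect the outcomes."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            doc_id = tasks.get()
            if doc_id is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            try:
                outcome = handle(doc_id)
            except Exception as exc:  # record the failure and keep the worker alive
                outcome = f"failed: {exc}"
            with lock:
                results.append((doc_id, outcome))
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()

    # Producer: enqueue every document, then one sentinel per worker.
    for doc_id in document_ids:
        tasks.put(doc_id)
    for _ in threads:
        tasks.put(None)

    tasks.join()
    for thread in threads:
        thread.join()
    return results


batch_results = process_batch(["d1", "d2", "d3"], lambda doc_id: "queued")
```

Results arrive in whatever order the workers finish, so callers should not rely on the input order being preserved.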
Dive deeper into the ticketing system, if you were to build it on your own rather than use an existing solution such as Jira.
How would you determine the ticket priority to ensure the SLA is met? The SLA could vary based on document type. Depending on the time remaining before the SLA deadline, the ticket priority could be updated automatically and the team alerted.
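Priority escalation based on remaining time could look roughly like this. The thresholds and priority labels are assumptions chosen for illustration, and `ticket_priority` and `tickets_to_alert` are hypothetical helpers that a periodic job might call.

```python
from datetime import datetime, timedelta, timezone


def ticket_priority(deadline, now):
    """Derive a priority label from the time left before the SLA deadline."""
    remaining = deadline - now
    if remaining <= timedelta(0):
        return "breached"
    if remaining <= timedelta(minutes=30):  # ASSUMPTION: illustrative threshold
        return "critical"
    if remaining <= timedelta(hours=1):     # ASSUMPTION: illustrative threshold
        return "high"
    return "normal"


def tickets_to_alert(tickets, now):
    """Return (ticketId, priority) for tickets needing attention, most urgent first."""
    flagged = []
    for ticket in sorted(tickets, key=lambda t: t["deadline"]):
        priority = ticket_priority(ticket["deadline"], now)
        if priority != "normal":
            flagged.append((ticket["ticketId"], priority))
    return flagged


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
open_tickets = [
    {"ticketId": "t1", "deadline": now + timedelta(hours=3)},
    {"ticketId": "t2", "deadline": now + timedelta(minutes=20)},
    {"ticketId": "t3", "deadline": now - timedelta(minutes=5)},
]
alerts = tickets_to_alert(open_tickets, now)
```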