Feed Content is the first screen of the app. When a logged-in user opens the app, the home feed is the first thing they see. Our home feed comprises koos from the accounts a user follows, along with activity on those koos (likes, comments, rekoos).
why → To show a user the activity of the people they follow
how → Aerospike and Graph DB
- The home feed is a re-ranked version of the user's chronological timeline, so that we surface the most relevant content since they were last active on the app.
- We generate it at runtime when a user becomes eligible for the rank feed.
- The rank feed is essentially a computation over the user's timeline feed based on signals and their weightage.
- To avoid latency when fetching signal values, we use high-read-throughput, low-latency data stores.
- We mainly use — Aerospike and ArangoDB (Graph-DB)
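The re-ranking step above can be sketched as a weighted scoring pass over the chronological timeline. The signal names, weights, and recency boost below are illustrative assumptions, not our production weightage:

```python
from datetime import datetime, timezone

# Hypothetical signal weights; the real weightage is tuned internally.
WEIGHTS = {"likes": 1.0, "comments": 2.0, "rekoos": 3.0}

def rank_feed(timeline, last_active):
    """Re-rank a chronological timeline so koos are ordered by a
    weighted sum of engagement signals (fetched in production from a
    low-latency store such as Aerospike)."""
    def score(koo):
        s = sum(WEIGHTS[k] * koo.get(k, 0) for k in WEIGHTS)
        # Small boost for koos posted after the user's last visit.
        if koo["created_at"] > last_active:
            s *= 1.5
        return s
    return sorted(timeline, key=score, reverse=True)

timeline = [
    {"id": "k1", "likes": 5, "comments": 1, "rekoos": 0,
     "created_at": datetime(2022, 6, 1, tzinfo=timezone.utc)},
    {"id": "k2", "likes": 2, "comments": 0, "rekoos": 4,
     "created_at": datetime(2022, 6, 3, tzinfo=timezone.utc)},
]
ranked = rank_feed(timeline, datetime(2022, 6, 2, tzinfo=timezone.utc))
print([k["id"] for k in ranked])  # ['k2', 'k1'] — k2 scores higher and gets the recency boost
```

The same shape carries over to a server-side UDF: only the scoring function moves closer to the data.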
Top Metric We measure → Reactions and TSOA
- Ideally, all the content a user is likely to like should appear on the first page of the feed.
Technical Metrics →
- 5 node Aerospike Cluster
- 3 node ArangoDB Graph Cluster
- Peak write throughput on Aerospike -> 75K TPS (75,000 write operations per second)
- Peak read throughput on Aerospike -> 10K TPS
- Latency -> 150 ms (average)
We have just started on the rank feed. Some plans the team is looking to execute next:
- Introduce an ML model to adjust the weightage based on user inputs/interests.
- Use UDFs in Aerospike to compute the rank feed at the Aerospike server layer rather than the application layer.
- Integrate Aerospike with big-data platforms (Snowflake, Pinot, etc.) to create data pipelines for analytical and real-time event-driven applications.
This system houses our social graph, i.e. the connections among people and groups. Whether you like watching someone's content, follow someone, view content in other languages, or show interest in a topic — all this information is kept in our social graph.
why —> To understand a user's affinity towards other users, languages and topics, and use it to improve their experience on the platform.
how —> ArangoDB
- We store our social graph as a property graph using ArangoDB.
- All the entities are stored as vertex documents.
- The relations between the entities are stored as edge documents.
- Updates to the social graph are asynchronous; we do periodic cleanup/reconciliation to maintain data consistency across the stores.
- To speed up the AQL queries we use ArangoSearch Views and primary sort.
- We use satellite graph to do optimized joins in a clustered setup.
- We maintain multiple replicas of social graph for different use cases and SLA e.g. point lookups, graph traversals, iterative graph processing/Pregel.
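A minimal in-memory sketch of the property-graph layout described above: entities live as vertex documents and relations as edge documents with `_from`/`_to` references, as in ArangoDB. The collection names and relation labels are illustrative, not our actual schema:

```python
# Vertex documents: one per entity (users, topics, ...).
vertices = {
    "users/alice":    {"type": "user", "lang": "en"},
    "users/bob":      {"type": "user", "lang": "hi"},
    "topics/cricket": {"type": "topic"},
}

# Edge documents: one per relation between two entities.
edges = [
    {"_from": "users/alice", "_to": "users/bob",      "rel": "follows"},
    {"_from": "users/alice", "_to": "topics/cricket", "rel": "interested_in"},
    {"_from": "users/bob",   "_to": "topics/cricket", "rel": "interested_in"},
]

def neighbours(vertex, rel=None):
    """One-level outbound traversal — the building block that deeper
    graph traversals and point lookups are composed from."""
    return [e["_to"] for e in edges
            if e["_from"] == vertex and (rel is None or e["rel"] == rel)]

print(neighbours("users/alice", rel="follows"))        # ['users/bob']
print(neighbours("users/alice", rel="interested_in"))  # ['topics/cricket']
```

In ArangoDB itself this lookup is an AQL traversal over indexed edge collections, which is what keeps multi-level queries fast.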
Top Metrics —>
- Incoming/outgoing connections.
- Influencer nodes — nodes which are connected with a lot of other nodes.
- Query latency for intersections and ranking.
- Traversal time per level
- A 3 node enterprise cluster hosted using high I/O compute instance nodes.
- Several community clusters for analytical workloads like iterative graph processing.
- Data is persisted on EBS GP3 volumes with provisioned IOPS for predictable performance.
- A Spark connector is used to run graph analytics on ArangoDB data.
- Peak ingestion RPM 72K
- Read RPM 150K
- General lookups -> ~20ms
- Traversal queries -> 100ms-500ms
The social graph has given us rich query/traversal capabilities at a linear scale and reduced operational complexity. It’s especially useful for understanding our highly connected data.
There is a lot of interesting work that we plan to do in the future:
- Build a knowledge graph which can answer generic & complex questions about the network.
- Community Detection — Applying various clustering techniques to understand the interconnectedness in the graph and find communities and hidden patterns.
- Run graph analytics to surface trends and sharpen our recommendations with near real time data.
Discovering people with shared interests or professions is quite common on a social network like Koo. Our people recommendation feed kicks in to help users easily find creators to follow. It manifests as:
- Profession carousel (sports, media, politicians etc ..)
- RFY (recommended for you) carousel: a sharper, personalised list
- Location based carousel (nearby)
why → Discovery of creators, so that users can subscribe to relevant (personalised) content.
how → ArangoDB and Postgres
- In ArangoDB we store creator documents with all the attributes that can be applied as filters at runtime.
- Before serving recommendations, one challenge is negating people who are already followed, blocked, or otherwise unwanted; this is powered by ArangoDB as well.
- Our ML team runs models that compute the user-specific carousel (RFY); data pipelines in the flow consume the model output and eventually show the recommendations to the consumer.
- When a user opens the app, depending on the app flow, we fetch all carousel types from their respective sources.
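The negation step above — removing already-followed and blocked accounts before a carousel is served — can be sketched as a simple set difference. In production this filtering is answered by ArangoDB; the function and names here are hypothetical:

```python
def filter_candidates(candidates, following, blocked):
    """Drop creators the user already follows or has blocked, so the
    carousel only surfaces genuinely new people to follow."""
    excluded = following | blocked  # union of both exclusion sets
    return [c for c in candidates if c not in excluded]

# Illustrative data only.
candidates = ["creator_a", "creator_b", "creator_c", "creator_d"]
following = {"creator_b"}
blocked = {"creator_d"}
print(filter_candidates(candidates, following, blocked))  # ['creator_a', 'creator_c']
```

Keeping the candidate list ordered while excluding via sets keeps the pass linear in the number of candidates.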
The next set of optimisations/improvements the team is working on for the people feed:
- Improve the RFY model data flow to make recommendations fresher.
- Migrate all the people-feed data from the shared DB to a true microservice-based system.
- Use ArangoDB's graph functionality to go deeper into the user's network; enrich the graph data with more signals and weights.
- Improve the friends/contacts-based flow for inviting/suggesting people, and power it via ArangoDB.
Do more with less
In this section, we will talk a bit about cost. All the engineering magic and user delight come at an infra cost. Engineering teams, along with DevOps, can label their infra, then track and optimise it; these are well-established BAUs that many teams follow. However, there should also be one driving metric that makes sense from a unit-cost perspective.
We have developed a simple math formula to understand the same.
CMUS — Cost (per) million user seconds
CMUS = ICU / (TSOA * DAU/1,000,000)
ICU → Infra Cost
TSOA → Average time spent on app per user (in seconds)
DAU → Daily active users
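A quick worked example of the formula, with purely illustrative numbers (not Koo's actual figures):

```python
def cmus(infra_cost, tsoa_seconds, dau):
    """CMUS = ICU / (TSOA * DAU / 1,000,000):
    infra cost per million user-seconds spent on the app."""
    return infra_cost / (tsoa_seconds * dau / 1_000_000)

# E.g. $10,000 of infra, 600 s average time on app, 1M daily actives
# -> 600M user-seconds, so cost per million user-seconds is 10,000/600.
print(round(cmus(infra_cost=10_000, tsoa_seconds=600, dau=1_000_000), 2))  # 16.67
```

If DAU doubles while infra cost grows by less than 2x, CMUS falls — which is exactly the "building efficiently" signal described below.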
As we optimise infra and build new things simultaneously, we keep tabs on this number. It gives us an idea of how efficiently we are utilising our infra month over month, and at the same time how the business metrics are playing out. It is still a bit early to say what this number should ideally be, but we know we are building efficiently as long as it doesn't increase linearly and CMUS remains relatively within bounds.
As a closing thought: engineering should be fun. Period!
- Aerospike summit → Building a near time personalised Feed https://docs.google.com/presentation/d/1D0JvKsJxMbXSZrPF8UmGKGsBZ_zHFxMr/edit#slide=id.g12d61f9c83b_0_6