Today, we're diving into the creation of a minified version of HackerDigest. Here's how HackerDigest operates: it begins by fetching the top 10 stories from Hacker News. Next, it leverages OpenAI's ChatGPT to process and analyze these stories.
Finally, it stores the results in Redis. This entire process is made super easy thanks to QStash. So, buckle up as we set out on this journey.
First things first, let's set up a Redis database. Head over to the Upstash Redis Console to get your keys, which will look something like this:
Good. Now head over to QStash and get the secrets that the Next.js app needs.
One more thing: we also need an OpenAI API key. Create a key for yourself and you are good to go.
The final env file will look like this:
Env Key Setup
With the grunt work behind us, give yourself a well-deserved pat on the back. Now, let's dive into the juicy part.
Initial Setup
We need a Next.js project for this, so run the command below:
And, do these:
Then, check that your project runs without issues by simply running:
Okay, before we move on, we have to put our env keys into a .env.local file. Create .env.local in the root of your project, where package.json lives, then copy the keys into it.
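Since a missing key tends to surface later as a confusing runtime error, a small startup check can help. The sketch below is not part of the original project; the key names are assumptions based on what the Upstash and OpenAI consoles typically hand out, so adjust them to match your own env file.

```typescript
// Assumed key names; rename them to whatever your env file actually uses.
const REQUIRED_ENV_KEYS = [
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "QSTASH_CURRENT_SIGNING_KEY",
  "QSTASH_NEXT_SIGNING_KEY",
  "OPENAI_API_KEY",
] as const;

type EnvSource = Record<string, string | undefined>;

// Returns the names of required keys that are missing or blank,
// in the same order as the `required` list.
function findMissingEnvKeys(
  env: EnvSource,
  required: readonly string[] = REQUIRED_ENV_KEYS,
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In the app you would pass process.env to findMissingEnvKeys and throw if the returned list is not empty.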
So far so good. Let's move on.
Folder Structure
Looks cool, right? We will start off with the libs folder. This is where we initialize our clients like Redis and fetcher - Ky in our case.
Fun Begins
📂libs/📜redis-client
📂libs/📜requester.ts
Now, let's proceed to services/hackernews.ts. In this section, our goal is to:
Fetch all the top stories from Hacker News.
Keep only the stories posted within the last 12 hours and rank them by score.
Retrieve the details of the selected stories and return them.
📂services/📜hackernews.ts
Now, we need to filter them.
We'll also need some types to ensure type-safety. Here's an overview of what this function does:
It begins by fetching topStoryIds.
For each story, it calls fetchStoryDetails (which we'll implement shortly).
The stories are then filtered down to those within the specified time period and sorted by their scores.
Since we only need 10 stories (our limit), the list is sliced accordingly.
Before returning the data, a final mapping step is performed to enhance the API's user-friendliness, appending additional data.
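The filter, sort, slice, and map steps above can be sketched as a pure function that is easy to test without touching the network. This is a minimal sketch, not the original implementation: the name selectTopStories, the commentUrl field, and the exact shape of HackerNewsStory are assumptions made for illustration.

```typescript
// Assumed story shape; `time` is in Unix seconds, as the Hacker News API returns it.
interface HackerNewsStory {
  id: number;
  title: string;
  score: number;
  time: number;
  url?: string;
}

const TWELVE_HOURS_IN_SECONDS = 12 * 60 * 60;

// Keeps stories from the last 12 hours, ranks them by score (highest first),
// cuts the list to `limit`, and appends a link to the discussion page.
function selectTopStories(
  stories: HackerNewsStory[],
  nowInSeconds: number,
  limit = 10,
) {
  return stories
    .filter((story) => nowInSeconds - story.time <= TWELVE_HOURS_IN_SECONDS)
    .sort((a, b) => b.score - a.score) // filter() returned a copy, so the input is not mutated
    .slice(0, limit)
    .map((story) => ({
      ...story,
      // Hypothetical extra field, standing in for the "additional data" step.
      commentUrl: `https://news.ycombinator.com/item?id=${story.id}`,
    }));
}
```

Keeping the selection logic separate from the fetching means the time window and limit can be changed or tested in isolation.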
Before moving on, let's add fetchStoryDetails.
For the fetchStoryDetails function, it's a straightforward call to the endpoint with the story ID. However, there's a caveat: not all items retrieved are stories; sometimes, there are different data types. To address this, we need to ensure that the fetched items are indeed stories before proceeding.
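One way to express that check is a type guard that narrows a fetched item to a story. The sketch below is an assumption about how this could look, not the original code; the isStory name and the StoryItem type are hypothetical, while the item types listed are the ones the Hacker News API documents.

```typescript
// Item kinds documented by the Hacker News API.
interface HackerNewsItem {
  id: number;
  type: "story" | "comment" | "job" | "poll" | "pollopt";
  title?: string;
  score?: number;
  time?: number;
  url?: string;
}

// A story with the fields the rest of the pipeline relies on.
type StoryItem = HackerNewsItem & {
  type: "story";
  title: string;
  score: number;
  time: number;
};

// Type guard: true only for non-null items that are stories and carry
// the title, score, and time fields. The API can answer with null for some ids.
function isStory(item: HackerNewsItem | null): item is StoryItem {
  return (
    item !== null &&
    item.type === "story" &&
    typeof item.title === "string" &&
    typeof item.score === "number" &&
    typeof item.time === "number"
  );
}
```

Because it is a type guard, passing it to Array.prototype.filter leaves you with an array typed as StoryItem[].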
Now that we've covered this part, let's proceed to parsing links within Hacker News stories.
Since many stories redirect users to external links, it's critical to tell our summarizer to navigate to these links, extract their content, and provide it back.
This way, we can effectively feed ChatGPT with the relevant information.
📂services/📜link-parser.ts
Our approach for parsing links within Hacker News stories involves navigating to the provided URL, using node-html-parser for extraction.
We'll check if the page contains p tags; if present, we'll extract the entire content. In case p tags are missing, we'll fall back to extracting content within div tags.
The extracted content will then be passed to chunkString, since some stories are long and must be broken into smaller chunks to stay within ChatGPT's token limit (which restricts input to less than 4K tokens at once).
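To make the chunking step concrete, here is a minimal sketch of what a chunkString helper could look like. It is an assumption, not the original utility: it measures chunks in characters rather than tokens (a rough stand-in, since real token counts depend on the tokenizer), and the default of 8000 characters is an arbitrary illustrative value.

```typescript
// Returns the text unchanged (whitespace-normalized) when it fits in one chunk,
// otherwise an array of chunks split on word boundaries.
function chunkString(text: string, maxChars = 8000): string | string[] {
  const normalized = text.replace(/\s+/g, " ").trim();
  if (normalized.length <= maxChars) return normalized;

  const chunks: string[] = [];
  let current = "";
  for (const word of normalized.split(" ")) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      if (current !== "") chunks.push(current);
      // A single word longer than maxChars becomes its own oversized chunk.
      current = word;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Returning either a string or an array lets the caller decide later whether a story needs chunked summarization at all.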
We'll also have some utility functions to make things easier and cleaner.
It's time to feed our parser with some Hacker News data!
To prepare for the next steps, we begin by extending our HackerNewsStory type with additional fields: rawContent and parsedContent for future use.
Following this, we call fetchTopStoriesFromLast12Hours to retrieve the top 10 stories. These stories are then mapped over, and the URLs are parsed into rawContent.
This parsed content will be utilized in the subsequent stage, where we feed it to ChatGPT.
Time to move on to the most critical part.
📂services/📜summarizer.ts
This is where the magic happens. Fortunately, most of the code is quite straightforward, mainly consisting of OpenAI configurations. Let's dive into it from the beginning.
We need two functions: one for summarizing short texts without chunking and another for summarizing chunks without losing much context. Feel free to make adjustments as needed.
Now, we need a function to determine whether to call the summarization function for an array or a string, depending on the type of rawContent.
It's a straightforward process: we check if rawContent is an array. If it is, we call summarizedChunks and then feed those results into summarizedText. If not, we directly call summarizedText.
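The branching described above can be sketched as follows. The two summarizers are passed in so the example stays independent of any OpenAI setup; their signatures, and the idea that summarizedChunks returns one partial summary per chunk, are assumptions made for illustration, as is the summarizeRawContent name.

```typescript
type RawContent = string | string[];

// Stand-ins for the two OpenAI-backed functions; signatures are assumed.
interface Summarizers {
  summarizedText: (text: string) => Promise<string>;
  summarizedChunks: (chunks: string[]) => Promise<string[]>;
}

// Chunked content is summarized piece by piece, then the partial summaries
// are combined and summarized once more. Plain strings go straight through.
async function summarizeRawContent(
  rawContent: RawContent,
  summarizers: Summarizers,
): Promise<string> {
  if (Array.isArray(rawContent)) {
    const partialSummaries = await summarizers.summarizedChunks(rawContent);
    return summarizers.summarizedText(partialSummaries.join("\n"));
  }
  return summarizers.summarizedText(rawContent);
}
```

Injecting the summarizers also makes it possible to exercise the branching with fake functions before spending any API credits.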
Next, we'll call the function getContentsOfArticles to retrieve parsed raw contents.
These contents will then be fed into the summarizer function. Finally, we'll omit the rawContent from the object since it's no longer needed.
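As a rough illustration of that last step, the sketch below summarizes every article and drops rawContent in the same pass using rest destructuring. The article shape and the summarizeArticles name are assumptions; only the rawContent and parsedContent field names come from the text above.

```typescript
type RawContent = string | string[];

// Assumed minimal article shape for the sketch.
interface ArticleWithRawContent {
  id: number;
  title: string;
  rawContent: RawContent;
}

type SummarizedArticle = Omit<ArticleWithRawContent, "rawContent"> & {
  parsedContent: string;
};

// Summarizes all articles concurrently; `rest` holds every field except
// rawContent, so the raw text never reaches the returned objects.
async function summarizeArticles(
  articles: ArticleWithRawContent[],
  summarize: (rawContent: RawContent) => Promise<string>,
): Promise<SummarizedArticle[]> {
  return Promise.all(
    articles.map(async ({ rawContent, ...rest }) => ({
      ...rest,
      parsedContent: await summarize(rawContent),
    })),
  );
}
```

Promise.all runs the summaries concurrently, which may hit API rate limits for large batches; processing articles one at a time is a reasonable alternative.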
Now, all we have to do is create an endpoint to call this function. But before that, we need to figure out a way to store those processed stories, and Upstash Redis comes to the rescue.
We can easily store them in Redis once we are done processing them, allowing us to use them anytime in our API, pages, or even in another app, if needed.
📂commands/📜constants.ts
We need to store our key somewhere, and constants.ts is a good place for that. In the future we might end up with more keys.
We also need a way to get the stories back from Redis when we need them, which is where get.ts comes in.
📂commands/📜get.ts
Another one for storing.
📂commands/📜set.ts
These utilities are quite straightforward to use. Once you begin using redis.get and redis.set directly, you find yourself passing the key - in our case, 'hackerdigest' - and return types throughout the app, which can quickly become ugly and difficult to maintain.
So, we opted for this approach to streamline the process.
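The idea can be sketched as a pair of wrappers that fix the key and the return type in one place. This is a sketch under assumptions: KeyValueClient is a hypothetical interface loosely shaped after a Redis client's get and set, the in-memory client exists only so the example runs without a database, and setArticles is an illustrative name.

```typescript
const HACKER_DIGEST_KEY = "hackerdigest";

// Hypothetical minimal client interface, so the wrappers do not depend on a real Redis.
interface KeyValueClient {
  get<T>(key: string): Promise<T | null>;
  set<T>(key: string, value: T): Promise<unknown>;
}

interface Article {
  id: number;
  title: string;
  parsedContent: string;
}

// The key and the Article[] type live here instead of at every call site.
function createArticleCommands(client: KeyValueClient) {
  return {
    getArticles: () => client.get<Article[]>(HACKER_DIGEST_KEY),
    setArticles: (articles: Article[]) => client.set(HACKER_DIGEST_KEY, articles),
  };
}

// In-memory stand-in used only to try the wrappers out locally.
function createInMemoryClient(): KeyValueClient {
  const store = new Map<string, unknown>();
  return {
    async get<T>(key: string): Promise<T | null> {
      return (store.get(key) as T | undefined) ?? null;
    },
    async set<T>(key: string, value: T): Promise<unknown> {
      store.set(key, value);
      return "OK";
    },
  };
}
```

Callers then write getArticles() and setArticles(stories) and never see the key string or the generic parameter.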
Now it's time to build our APIs. We will start with the summarize endpoint.
📂api/📂summarize/📜route.ts
First, we need to set maxDuration to a value higher than the default of 15 seconds, because the OpenAI calls can take a while to complete.
The rest is relatively simple – we just invoke getSummarizedArticles with the desired limit.
If the operation is successful, we save the results to the cache and return.
To enable periodic API calls with QStash, we wrap the handler with verifySignatureAppRouter to ensure that only QStash can access this endpoint. Additionally, it should be a POST request.
Now, we need a way to get those processed stories.
📂api/📂stories/📜route.ts
This is also quite straightforward; we simply invoke getArticles. However, to prevent abuse of our app, we need to implement rate limiting using @upstash/ratelimit and use the user's IP address as an identifier.
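Picking the identifier is the part that is easy to get subtly wrong, so here is a small sketch of one way to derive it from request headers. It assumes the app sits behind a proxy that sets x-forwarded-for, which is common but not guaranteed; the getClientIdentifier name, the HeaderReader interface, and the "anonymous" fallback are all hypothetical. The returned string is what you would hand to the rate limiter.

```typescript
// Minimal shape of a headers object; a standard Headers instance satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

// Uses the first address in x-forwarded-for (the original client when a
// trusted proxy sets the header) and falls back to a shared identifier.
function getClientIdentifier(
  headers: HeaderReader,
  fallback = "anonymous",
): string {
  const forwardedFor = headers.get("x-forwarded-for");
  const firstHop = forwardedFor?.split(",")[0]?.trim();
  return firstHop ? firstHop : fallback;
}
```

Note that with the fallback, all clients without the header share one rate-limit bucket, which is a deliberate trade-off in this sketch.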
Let's add some simple HTML and Tailwind styling to see how it looks.
📜page.tsx
Also, make sure to copy those CSS styles to your global.css file. This will give your cards and buttons a cool neon look.
Conclusion
So, there you have it – a HackerNews summarizer that not only works like a charm but also looks darn good doing it.
Take this recipe, add your own secret sauce, and who knows, maybe you'll create the next internet sensation. Happy hacking, friends! 🚀