Coffee Recommender System Using Content-based Filtering

Coffee is a beverage brewed from the roasted seeds, or beans, of coffee plants. Although it is widely consumed, many coffee drinkers and non-coffee drinkers know little about coffee. In addition, many cafes were affected by the pandemic, which caused business to drop, and many new cafe owners do not know which coffee beans are suitable to serve their customers. There is a need for a digital method to assist coffee drinkers and café owners, whether new or established, in receiving recommendations about coffee beans that suit the preferences of avid coffee enthusiasts or specialty coffee service providers. Such a recommender system is important because of the overwhelming variety of coffee available. The goal of this project is to create a recommender system that informs coffee drinkers about coffee bean varieties based on their personal preferences and provides a list of cafes that serve speciality coffee in Selangor, focusing on Bandar Baru Bangi, Kajang and Serdang. The coffee and beverages that meet the users' taste requirements are categorized and presented using this approach. A content-based filtering method that compares items against the user's preferences is used to generate the suggestions. A modified waterfall model of the System Development Life Cycle was used to develop the prototype. The system was evaluated with functionality testing and accuracy testing; the results were positive and the content-based filtering algorithm was applied successfully. For future work, the system could be extended to propose cafés by narrowing down the availability of preferred coffee beans at specific cafes. The recommendation method could also be fine-tuned by exploring an expert system approach to recommendation.


Vol 12, Issue 3, (2023) E-ISSN: 2226-6348
To Link this Article: http://dx.doi.org/10.6007/IJARPED/v12-i3/19317 DOI: 10.6007/IJARPED/v12-i3/19317
Published Online: 28 September, 2023

Introduction
This project aims to produce a coffee recommender system that provides information about coffee beans based on the drinker's taste preferences, together with a list of cafes that serve speciality coffee in selected areas. The system classifies and presents the coffee beans that fulfil the drinker's taste criteria. Many coffee drinkers love to drink coffee but do not really know what kind of coffee they drink. This can make it difficult to find the same taste of coffee in different places. To solve this problem, the system is designed to help drinkers and beginner café owners find coffee beans that match their taste preferences, as well as cafes that sell speciality coffee drinks, so that the user can have the best coffee for the day. As a result, the system makes it easier and more convenient for coffee drinkers to learn the names of beans that match their preferred flavour and to get recommendations for new coffee beans that are similar to their favourite beans. Besides that, they can also easily look up cafés in the selected areas.

Objectives
• To design a web-based system that recommends coffee beans that match users' taste preferences and provides a list of cafes that serve speciality coffee drinks in Bandar Baru Bangi and nearby.
• To develop a web-based coffee recommender system for coffee beans using a content-based filtering algorithm.
• To test the functionality and accuracy of the system.

Problem Statement
There are many coffee drinkers and non-coffee drinkers who do not know much about coffee. In a survey conducted with 75 respondents, on average 56% of the respondents did not have basic knowledge of coffee and coffee beans. One question asked whether they knew about the four main types of coffee beans: only 21.3% of respondents knew them, while the remaining 42.7% and 36% barely knew and did not know about them, respectively. One of the famous coffee beans comes from Brazil and is served in many specialty coffee places. Despite this, many drinkers did not know what kind of coffee beans were used in their coffee or the taste profile of their drink. Based on the questionnaire, 32% and 56% of respondents did not know about Brazil beans and their taste profile; these beans give a classic, mellow, and smooth texture, taste slightly sweet and have a hint of chocolate. The survey therefore shows that many coffee drinkers did not know what was used in their coffee. Furthermore, they had a difficult time finding coffee shops that served coffee matching their preferred taste.

In addition, the coffee industry is currently a trend and a leading sector among food and beverage markets in Malaysia. A prominent example of an established food and beverage brand is Starbucks. At the beginning of 2020, the Covid-19 crisis had a financial and economic impact on Starbucks. According to Arnold (2020), even the Malaysian Starbucks operator, Berjaya Food Berhad, reported a loss of as much as $1.82 million for the latest financial year due to the Movement Control Orders (MCO) imposed during the Covid-19 crisis. This could also have affected the company's strategies and operations, as it was not able to operate in its traditional way. The same could affect Small and Medium Enterprises (SMEs) that sell coffee. If the pandemic had a big impact on large companies such as Starbucks, small coffee businesses might also have been affected and lost business. Consequently, business dropped, and these companies had to set new strategies to win back their customers and find new ways of marketing to attract more people.

Finally, in the coffee sector, there are challenges in finding the right coffee bean provider. The best drinks must be served to customers, and speciality coffee is distinguished from industrial coffee through its high quality, restricted supply, freshness, special flavours, packaging, or consumption atmosphere (Bacon, 2005; Daviron and Ponte, 2005; Kusmulyono et al., 2023). The café owner must know which coffee bean supplier offers the best quality at a good price in order to maintain the quality of their drinks. Edelmann et al. (2020) found that direct trading started to improve coffee quality and producers' income and to establish regular communication. Furthermore, the benefits of an increased number of consumers and their awareness, resulting in a willingness to pay a higher price, go to the roasters or coffee shop owners rather than to the producers.

Methodology
The waterfall methodology employs a linear, sequential approach to software development. The project is divided into a series of tasks, with phases designating the highest-level grouping (Sherman, 2015). The waterfall technique requires sequential phases: planning the project requirements, designing the system, coding the implementation, testing the project and maintaining the system. There are two types of waterfall model: the traditional waterfall model and the modified waterfall model. The waterfall approach was established in 1970 by Winston W. Royce (Prashant, 2022). However, the waterfall model has drawbacks: it is not suitable for unclear or changing requirements, and once an application is in the testing phase, it is very difficult to go back and change something that was not correct (Salve et al., 2018). The modified waterfall model was created to overcome these problems (Prashant, 2022) and is therefore suitable for this project. It allows a return to the previous step, with phases overlapping as necessary. It also offers a systematic progression of development procedures with some flexible iterative phases, permitting enough documentation and design reviews to assure the quality, dependability, and maintainability of the custom software. Figure 1 shows the phases of the modified waterfall model of the System Development Life Cycle (SDLC) that are involved in this project.

Figure 2 shows the flow chart of the system. First, the page asks for the user's coffee preference based on taste. The information from the user's query is then filtered using the keyword extraction method, and the coffee beans recommended from keyword matches in the review column are displayed on the page. The result page also offers the option of a recommender based on similarities. That recommender page asks for a query in the form of a coffee name, which is then filtered using a content-based filtering technique. The query must not be empty for the process to proceed.

To comprehend the data communicated between the system and the user, a use case diagram was produced next. The use case represents the boundary separating the system from the user. Figure 3 displays the use case diagram for this system, since interaction between the user and the system is needed to complete the task. As Figure 3 shows, the user gives a query based on their coffee preference and receives the recommended coffee beans and cafés.

The recommendation system uses a content-based method. Content-based filtering generates recommendations from keywords and attributes attached to the coffee flavour profile database. The method filters on the user's preferred coffee flavour based on the description that the user entered, and it depends wholly on matching user preferences to product attributes. The recommended coffee beans are those that share the most characteristics with the user's interests, such as the smoothness or the bitterness level of the coffee. Figure 4 shows the content-based filtering architecture. The collected data is analysed by the recommender, and the content-based filtering method is applied. The data is then evaluated on various factors, including similarity, uniqueness, closeness, and relevance. The top-ranked pages are shown as hyperlinks on the current web page, where interaction with the user occurs. The user's feedback is used to create a user profile. Lastly, the recommender system compares the user profile with the collection of documents.

To evaluate the content-based filtering method, the cosine similarity metric is used to calculate and identify similarities between products and the user's interests. Cosine similarity measures the cosine of the angle between two vectors in a multidimensional space and ranges from -1 to 1. A higher cosine similarity value indicates more shared traits and a higher degree of similarity between the items.
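For reference, the standard definition of cosine similarity for two n-dimensional vectors can be written as follows. The symbols a and b are introduced here only for illustration and do not come from the project itself.

```latex
\[
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]
% Worked example: a = (1, 2, 0), b = (2, 1, 1)
% a . b = 4, ||a|| = sqrt(5), ||b|| = sqrt(6), so cos(theta) = 4 / sqrt(30) ~ 0.730
```

When every component is non-negative, as is the case for TF-IDF weights, the dot product cannot be negative, so the score falls between 0 and 1 rather than spanning the full range from -1 to 1.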

d) Data Collection and Pre-processing
The data used in this project was collected from GitHub. The dataset consists of 5124 coffee review records, with the fields name, rating, roaster, regions, type, location, origin, roast, aroma, acid, body, flavour, aftertaste, and review. Figure 5 shows the imported data file and displays some of the data.

Figure 8 Data Split

e) Recommendation Algorithm Application Development
Recommender systems, a form of machine learning algorithm, are widely used to provide relevant recommendations to users. Whether in apps or search engines, these systems employ a class of algorithms to determine appropriate suggestions for users based on their preferences and interests. Content-based filtering is a popular method within recommender systems, which focuses on the characteristics or content of items that users enjoy. By utilizing user-provided data and available information, the system creates tailored recommendations for individuals. The underlying concept of content-based filtering involves categorizing products using specific keywords (keyword extraction), identifying user preferences, finding products with high similarities (via TF-IDF and cosine similarity), and presenting them as recommendations to the user. Keyword extraction is a natural language processing (NLP) technique used to identify and extract important words or phrases from a text. These important words, known as keywords, are typically the most relevant and informative terms that represent the main themes or topics in the given text.
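The pipeline outlined above (describe each item by its words, weight the words with TF-IDF, then rank items by cosine similarity) can be sketched in a few lines. This is a minimal, dependency-free illustration and not the project's code: the project uses scikit-learn's TfidfVectorizer, whereas this sketch uses a plain tf × log(N/df) weighting without smoothing or normalization. The function names and the three sample reviews are assumptions made up for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using plain tf * idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(docs, query_index):
    """Indices of the other documents, most similar to docs[query_index] first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Hypothetical mini-corpus standing in for the review column.
reviews = [
    "sweet chocolate honey smooth",
    "bright citrus floral tea",
    "dark chocolate honey nutty",
]
ranking = rank_by_similarity(reviews, 0)  # review 2 shares words with review 0
```

In this toy corpus the third review ranks ahead of the second because it shares "chocolate" and "honey" with the query review, while the second shares no terms at all.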

Figure 9 Import Libraries on Keyword Extraction
Figure 9 shows the libraries imported before the algorithm code. Here, the tokenizer is used to divide the text into individual units called tokens. The tokens can be words, punctuation marks, or other meaningful elements. The PorterStemmer is also used for the stemming process, which reduces words to their base or root form.
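The tokenize, stem and count steps can be illustrated with a small standard-library sketch. It is not the project's implementation: the project relies on an NLTK tokenizer and NLTK's PorterStemmer, while this example swaps in a regular-expression tokenizer and a deliberately crude suffix stripper (naive_stem, a hypothetical helper) so that it runs without extra packages. A real Porter stemmer handles many more cases.

```python
import re
from collections import Counter

# Crude stand-in for a real stemmer: strips a few common English suffixes.
_SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase word tokens, discarding punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Remove the first matching suffix, keeping at least three letters."""
    for suffix in _SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_keywords(review):
    """Count the stemmed tokens that occur in one review."""
    return Counter(naive_stem(tok) for tok in tokenize(review))


# "roasted" and "roasting" collapse to the same stem and are counted together.
keywords = extract_keywords("Roasted nuts, roasting aromas.")
```

Grouping inflected forms under one stem is what lets a query for one word form match reviews that use another.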

Figure 10 Keyword Extraction Function
Figure 10 shows a function called extract_keywords that performs keyword extraction and counting for the coffee reviews stored in the DataFrame. The function takes a review as input, tokenizes the text into individual words (tokens), and converts them to lowercase to ensure consistency. Next, it uses the PorterStemmer from the NLTK library to stem each token. Stemming reduces words to their base form, which helps to group related words together and simplifies analysis. The extracted keywords are then stored in a new column called keywords. This process enables the identification of important words in the coffee review column.

The two metrics, Term Frequency (TF) and Inverse Document Frequency (IDF), are closely linked and are valuable in assessing the relevance of a word to a document within a larger corpus of text. TF represents how often a specific word appears in a document, indicating its prevalence and importance in that context. It is the ratio of the word's occurrences to the total number of terms in the same document. The higher the TF value, the more significant the word is in understanding the document's content. IDF, on the other hand, evaluates the rarity of a word across the entire corpus. Together, TF-IDF provides a way to identify words that are crucial for comprehending the document's content within the broader context of the entire dataset. Here, raw documents are converted to a matrix of TF-IDF features using TfidfVectorizer. The pandas, TruncatedSVD, PCA, StandardScaler, pairwise distance and TfidfVectorizer components were imported. After the imports, the CSV file was loaded as the data set. To check whether the data was read successfully, the data.head() function was applied to display the content of the data set.

f) Web Development
Website development encompasses all aspects of creating a website, including markup, coding, scripting, and network settings. This project uses the Flask framework, a compact and lightweight Python web framework. Flask offers helpful tools and capabilities, making it easier for developers to build online applications. It provides flexibility and simplicity by enabling the creation of web applications with just one Python file, without the need for a specific directory structure or lengthy boilerplate code.

User interface design is the process of creating an interface with the intention of enhancing usability and user experience. It tries to make user interaction as efficient and simple as feasible to help users accomplish their goals. A user interface is necessary for user interaction in any program. Because of the application's user-friendly interface and good design, the user can use it with ease. Figure 13 displays the interface layout for the Main Menu Page. At the top of the screen, there are four navigation buttons. The user can click the About button to read some general information on Arabica coffee beans, such as their origin and a description of their taste. In addition, clicking the "Let's Get Started" hyperlink on the home page takes the user to the recommender page.

Figure 13 Main Menu Page
The About page is shown in Figure 14. On this page, the user can read information about the website and Arabica coffee beans. Clicking the Home button at the top of the page returns the user to the Main Menu page, and clicking the Recommender button takes the user directly to the Recommender Page. Figure 15 shows the coffee recommender input page. The interface lets the user fill in the flavours that they prefer. For example, the user may like honey, pear, tangerine zest, dark chocolate, and pistachio notes in their coffee beans. Clicking the Search button takes the user to the Results Page.

Figure 16 Search Results Page
Figure 16 shows the results for the keywords entered by the user. The page lists thirty recommended coffees, chosen by the highest keyword counts in the review column. The information on each coffee includes the coffee name, roast level, roaster, rating, and review. The keywords entered by the user appear at the top left, and the user can enter different flavours and click the Search button without going back to the recommender page. Clicking a coffee name copies it. Furthermore, clicking the "Get recommendation for your selected coffee based on similarity" hyperlink on the result page takes the user to a recommender based on similarities for a specified coffee bean.
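Ranking by keyword count, as on the results page, might look like the sketch below. It is an assumption-laden illustration rather than the project's code: the helper names keyword_count and top_matches, the record layout and the sample reviews are all hypothetical, and multi-word flavours such as "dark chocolate" are simply split into separate words.

```python
import re


def keyword_count(review, keywords):
    """Number of tokens in the review that match any query keyword."""
    tokens = re.findall(r"[a-z]+", review.lower())
    wanted = set(keywords)
    return sum(1 for tok in tokens if tok in wanted)


def top_matches(coffees, query, limit=30):
    """Return up to `limit` coffees with at least one hit, best first."""
    keywords = re.findall(r"[a-z]+", query.lower())
    scored = [(keyword_count(c["review"], keywords), c) for c in coffees]
    scored = [pair for pair in scored if pair[0] > 0]
    # Stable sort: coffees with equal counts keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:limit]]


# Hypothetical records; only the "review" field is used for scoring.
coffees = [
    {"name": "A", "review": "Honey and pear with a dark chocolate finish."},
    {"name": "B", "review": "Bright lemon and black tea."},
    {"name": "C", "review": "Honey, honey, and more honey."},
]
coffee_list = top_matches(coffees, "honey, chocolate")
```

One consequence of raw counting is visible here: coffee C outranks coffee A because it repeats a single keyword three times, even though A matches both requested flavours. Counting distinct matched keywords instead would be a possible refinement.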

Figure 17
Figure 17 shows the recommendation page, where the user enters the coffee name together with other inputs: the number of nearest coffees, which sets how many top recommendations are returned, and the recommendation method. There are three recommendation methods: one based on features (rating, roast and type), one based on text (description and review), and a combination of both. The combination method is based on the overall features and the text. Furthermore, clicking the "Get Recommendation" hyperlink on the page takes the user to the recommendation result page.

Figure 18 Recommendation Results Page
Figure 18 shows the recommendation result page, which presents the coffee beans that are most similar to the entered coffee, up to the number of nearest coffees selected earlier. Clicking one of the links provided takes the user to that coffee's review page so the coffees can be compared. The page also shows the similarity index between the two coffee beans. Furthermore, clicking the "Back to Recommendations" hyperlink returns the user to the recommendation page.
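Selecting the nearest coffees from a table of similarity scores can be sketched as follows. This is an illustrative example under stated assumptions, not the project's function: nearest_coffees is a hypothetical name, and the coffee names and the similarity matrix are invented values.

```python
def nearest_coffees(names, similarity, query_name, k=5):
    """Return the k coffees most similar to query_name as (name, score) pairs."""
    if query_name not in names:
        raise ValueError(f"unknown coffee: {query_name!r}")
    q = names.index(query_name)
    candidates = [
        (names[i], similarity[q][i])
        for i in range(len(names))
        if i != q  # never recommend the coffee the user entered
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


# Hypothetical names and a made-up symmetric similarity matrix.
names = ["Coffee A", "Coffee B", "Coffee C"]
similarity = [
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.4],
    [0.7, 0.4, 1.0],
]
result = nearest_coffees(names, similarity, "Coffee A", k=2)
```

Returning the score alongside each name is what allows a results page to display a similarity index next to every recommendation.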

Figure 19 Café List Page
Figure 19 shows the café list page. This page provides a list of cafés that serve speciality coffee in the selected areas, which are Bandar Baru Bangi, Kajang and Seri Serdang. The user can go to the website, the social media pages, and the map location of each café by clicking the buttons at the bottom of each café's details.

The system can connect the model to external data sources such as text files, databases, spreadsheets, and external programs using interface functions. Figure 20 shows the function for recommendation based on keyword extraction. This function retrieves the user's keyword input, which is the flavour of coffee. The system then processes the keywords and counts them. The top thirty coffees are stored in coffee_list and displayed on the search result page, search.html.

Figure 21 shows the function for recommendation based on the coffee name. This function retrieves the user's preferences from the submitted form, such as the coffee name, recommendation method, number of recommendations, and whether to pick the best ones. The most similar coffees are then displayed on the recommendation result page, results.html.

Results and Discussions
The coffee recommender system was tested with three types of testing:

a) System Testing
System testing is a crucial step in software development that verifies the fully integrated and finished software system, ensuring it meets all requirements. The testing process involves exercising the entire computer-based system, as the software is just a part of a larger system that interfaces with other software and hardware components. Functional testing, a type of software testing, evaluates the accuracy of the system's functional requirements or use cases by testing user input, configuration settings, and system data. Accuracy is measured as the proportion of correctly identified instances, that is, the true positives and true negatives among all analysed cases. The test cases are aligned with the use-case requirements, and the testing environment and necessary tools are prepared for evaluating each unit of the application or prototype. All entries are ready for testing.

b) Functionality Testing
Functionality testing is a method of testing a web application system. This phase is crucial to accomplishing the goals of this project. The system's functionality was checked to make sure it works correctly. The results show that the expected result of each functionality test case was fulfilled.

c) Accuracy Testing
In this project, the accuracy of the recommendation system is evaluated using similarity scores, with cosine similarity being the chosen metric. Cosine similarity is a mathematical concept used to measure the similarity between two vectors in an inner product space. It determines the cosine of the angle between the two vectors, providing a measure of how closely they align or point in the same direction. By employing cosine similarity, the recommendation system can assess the likeness between different items, such as coffee products in this case, based on their respective feature representations. The similarity scores generated by the cosine similarity metric enable the system to identify items that share common characteristics or attributes, helping to recommend products that align with the user's preferences or interests.

Figure 22 Cosine Similarity Index for Colombia Tolima Jairo Beans
Figure 22 shows the variable similarity, which holds the cosine similarity scores between the coffee name entered by the user and the coffee beans recommended by the system. The similarity is computed from each coffee's roast, rating, type, and review. Higher similarity values indicate coffees whose features are more similar to those of the entered coffee, while lower values indicate coffees with different features.

Conclusion
In conclusion, the primary objective of this project is to create a coffee bean recommendation system. The prototype can suggest a list of coffee beans based on the user's flavour preferences and recommend new coffee beans that are similar to the coffee name provided by the user. The filtering method involves keyword extraction, calculating cosine similarity and applying a content-based algorithm to filter the dataset effectively. The model accurately recommends coffee beans using the attributes entered by the user. This project is particularly valuable for users seeking brand-new coffee beans with descriptions similar to their favourite ones, or for comparing multiple coffee beans with the same flavour profile. It simplifies the process of finding the most suitable coffee beans for individual preferences. Additionally, this project can serve as a valuable research and development tool within the coffee community.

One potential improvement for this project is to enhance the recommendation system by suggesting multiple coffee beans instead of just one that shares the same description as the user-entered coffee name. By expanding the range of recommended coffee beans, the overall performance and usefulness of the model can be enhanced. Integrating a more diverse set of coffee varieties into the datasets allows the algorithm to learn about a broader range of coffee attributes and features. Consequently, the model may become more comprehensive and accurate in recommending different types of coffee beans with similar traits, providing users with a richer and more diverse selection of coffee options. Further development could also propose cafés by narrowing down the availability of preferred coffee beans at specific cafes. The recommendation method can also be fine-tuned by exploring the expert system approach, where the expertise of one or more coffee experts is captured and translated into an algorithm that can be integrated with the recommendation algorithm.

Figure 1 System Development Life Cycle (SDLC) of phases in Modified Waterfall Model

Figure 3 Use Case Diagram of the System

Figure 4 Content-Based Filtering Design

Figure 5 Data of Coffee Review

Figure 6 below shows the process of data cleaning for the keyword extraction system. The dataset was cleaned by dropping some of the unused data such as regions, slug, aroma, acid, body, and flavour. Furthermore, all the description columns were combined into one column labelled review.
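The drop-and-combine step can be sketched with the standard library alone. The project itself works with pandas; this example uses the csv module only to stay dependency-free. The dropped column names come from the text above, while the description column names (desc_1 to desc_3), the function name clean_rows and the sample row are hypothetical, since the source does not list them.

```python
import csv
import io

# Columns the text says were dropped for keyword extraction.
DROP_COLUMNS = {"regions", "slug", "aroma", "acid", "body", "flavour"}
# Hypothetical names for the description columns that get merged.
DESCRIPTION_COLUMNS = ("desc_1", "desc_2", "desc_3")


def clean_rows(rows):
    """Drop unused columns and merge the description columns into 'review'."""
    cleaned = []
    for row in rows:
        parts = [row.get(col, "").strip() for col in DESCRIPTION_COLUMNS]
        kept = {
            key: value
            for key, value in row.items()
            if key not in DROP_COLUMNS and key not in DESCRIPTION_COLUMNS
        }
        # Join the non-empty description texts into a single review string.
        kept["review"] = " ".join(part for part in parts if part)
        cleaned.append(kept)
    return cleaned


# Tiny in-memory CSV standing in for the real file.
sample = io.StringIO(
    "name,roast,aroma,slug,desc_1,desc_2,desc_3\n"
    "Coffee A,Medium,9,coffee-a,Sweet and bright.,Notes of honey.,\n"
)
rows = clean_rows(list(csv.DictReader(sample)))
```

Merging the description texts first means that later steps such as keyword counting or TF-IDF only need to read one text column per coffee.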

Figure 6 Drop and Combine Data Process

Figure 7 Cleaned Text Using Stop Words Function

Figure 11 Import Libraries on TF-IDF and Cosine Similarities

Figure 12 Applying TF-IDF and Cosine Similarity

Figure 14 About Page

Figure 20 Pass Data from Coffee Recommender Based on Flavor Page

Figure 21 Pass Data from Coffee Recommender based on Coffee Name Page