Apache Mahout

Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily in the areas of collaborative filtering, clustering and classification. Many of the implementations use the Apache Hadoop platform. Mahout also provides Java libraries for common maths operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; the number of implemented algorithms has grown quickly, but various algorithms are still missing.

Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm. Machine learning is a discipline of artificial intelligence focused on enabling machines to learn without being explicitly programmed, and it is commonly used to improve future performance based on previous outcomes. The Apache Mahout™ project's goal is to build an environment for quickly creating scalable performant machine learning applications.

Apache Mahout software provides three major features

• A simple and extensible programming environment and framework for building scalable algorithms

• A wide variety of premade algorithms for Scala + Apache Spark, H2O, Apache Flink

• Samsara, a vector math experimentation environment with R-like syntax which works at scale

While Mahout's core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm, it does not restrict contributions to Hadoop-based implementations. Contributions that run on a single node or on a non-Hadoop cluster are also welcomed. For example, the 'Taste' collaborative-filtering recommender component of Mahout was originally a separate project and can run stand-alone without Hadoop.

What Mahout Does

Mahout supports four main data science use cases:

• Collaborative filtering: mines user behavior and makes product recommendations (e.g. Amazon recommendations)

• Clustering: takes items in a particular class (such as web pages or newspaper articles) and organizes them into naturally occurring groups, such that items belonging to the same group are similar to each other

• Classification: learns from existing categorizations and then assigns unclassified items to the best category

• Frequent itemset mining: analyzes items in a group (e.g. items in a shopping cart or terms in a query session) and then identifies which items typically appear together

added 9 years 8 months ago

Contents related to 'Apache Mahout'

Apache Hadoop: Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.