Building a Book Recommendation System with Mahout 0.8 (Dataguru Mahout course, week 2 assignment)


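Written assignment
1. Set up a Mahout development environment with Maven and complete the simplest example, on slide 26 of the course PPT. Include a description of the process and screenshots.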

1.1 Development environment

– Win7 64bit
– Java 1.7.0_51
– Maven 3.2.1
– MyEclipse 2013 SR
– Mahout 0.8
– Hadoop 2.2.0

1.2 Building the Mahout development environment with Maven

1.2.1 Create a standard Java project with Maven

D:\MyEclipse Professional\java>cd D:\MyEclipse Professional\myMahout

 

D:\MyEclipse Professional\myMahout>mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.conan.mymahout -DartifactId=myMahout -DpackageName=org.conan.mymahout -Dversion=1.0-SNAPSHOT -DinteractiveMode=false

[INFO] Scanning for projects...
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom >>
[INFO]
[INFO] <<< maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom <<
[INFO]
[INFO] --- maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Batch mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Old (1.x) Archetype: maven-archetype-quickstart:1.0
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: org.conan.mymahout
[INFO] Parameter: packageName, Value: org.conan.mymahout
[INFO] Parameter: package, Value: org.conan.mymahout
[INFO] Parameter: artifactId, Value: myMahout
[INFO] Parameter: basedir, Value: D:\MyEclipse Professional\myMahout
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] project created from Old (1.x) Archetype in dir: D:\MyEclipse Professional\myMahout\myMahout
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:29 min
[INFO] Finished at: 2014-03-10T21:12:36+08:00
[INFO] Final Memory: 16M/108M
[INFO] ------------------------------------------------------------------------

1.2.3 Import the project into Eclipse


1.2.4 Add the Mahout dependencies by editing pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>org.conan.mymahout</groupId>

    <artifactId>myMahout</artifactId>

    <packaging>jar</packaging>

    <version>1.0-SNAPSHOT</version>

    <name>myMahout</name>

    <url>http://maven.apache.org</url>

    <properties>

        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

        <mahout.version>0.8</mahout.version>

    </properties>

 

    <dependencies>

        <dependency>

            <groupId>org.apache.mahout</groupId>

            <artifactId>mahout-core</artifactId>

            <version>${mahout.version}</version>

        </dependency>

        <dependency>

            <groupId>org.apache.mahout</groupId>

            <artifactId>mahout-integration</artifactId>

            <version>${mahout.version}</version>

            <exclusions>

                <exclusion>

                    <groupId>org.mortbay.jetty</groupId>

                    <artifactId>jetty</artifactId>

                </exclusion>

                <exclusion>

                    <groupId>org.apache.cassandra</groupId>

                    <artifactId>cassandra-all</artifactId>

                </exclusion>

                <exclusion>

                    <groupId>me.prettyprint</groupId>

                    <artifactId>hector-core</artifactId>

                </exclusion>

            </exclusions>

        </dependency>

    </dependencies>

</project>

 

1.2.5 Download the dependencies

D:\MyEclipse Professional\myMahout\myMahout>mvn clean install
[INFO] Scanning for projects...
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building myMahout 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ myMahout ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ myMahout ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory D:\MyEclipse Professional\myMahout\myMahout\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ myMahout ---
[INFO] Compiling 1 source file to D:\MyEclipse Professional\myMahout\myMahout\target\classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ myMahout ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory D:\MyEclipse Professional\myMahout\myMahout\src\test\resources
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ myMahout ---
[INFO] Compiling 1 source file to D:\MyEclipse Professional\myMahout\myMahout\target\test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ myMahout ---
[INFO] Surefire report directory: D:\MyEclipse Professional\myMahout\myMahout\target\surefire-reports
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.pom (3 KB at 0.5 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-providers/2.12.4/surefire-providers-2.12.4.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-providers/2.12.4/surefire-providers-2.12.4.pom (3 KB at 3.1 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.jar (37 KB at 16.2 KB/sec)

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.conan.mymahout.AppTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ myMahout ---
[INFO] Building jar: D:\MyEclipse Professional\myMahout\myMahout\target\myMahout-1.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ myMahout ---
[INFO] Installing D:\MyEclipse Professional\myMahout\myMahout\target\myMahout-1.0-SNAPSHOT.jar to C:\Users\Administrator\.m2\repository\org\conan\mymahout\myMahout\1.0-SNAPSHOT\myMahout-1.0-SNAPSHOT.jar
[INFO] Installing D:\MyEclipse Professional\myMahout\myMahout\pom.xml to C:\Users\Administrator\.m2\repository\org\conan\mymahout\myMahout\1.0-SNAPSHOT\myMahout-1.0-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13.173 s
[INFO] Finished at: 2014-03-10T21:28:56+08:00
[INFO] Final Memory: 24M/178M
[INFO] ------------------------------------------------------------------------

D:\MyEclipse Professional\myMahout\myMahout>

Refresh the project in Eclipse:


1.3 Implementing user-based collaborative filtering (UserCF) with Mahout
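
As a rough illustration of what user-based collaborative filtering computes, here is a minimal pure-Java sketch. It is not the Mahout implementation: the class name `UserCfSketch`, the toy ratings, and the similarity formula `1 / (1 + distance)` are all assumptions made for this example, and Mahout's own recommender classes handle neighborhoods and weighting differently in detail.

```java
import java.util.HashMap;
import java.util.Map;

class UserCfSketch {

    /** Similarity from the Euclidean distance over co-rated items; 0 when nothing overlaps. */
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSq = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSq += diff * diff;
                common++;
            }
        }
        // Assumed formula for illustration only
        return common == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    /** Predicts a score for every item the target user has not rated yet. */
    static Map<Long, Double> predict(long uid, Map<Long, Map<Long, Double>> ratings) {
        Map<Long, Double> target = ratings.get(uid);
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> simSum = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> other : ratings.entrySet()) {
            if (other.getKey() == uid) {
                continue; // skip the target user
            }
            double sim = similarity(target, other.getValue());
            if (sim == 0.0) {
                continue; // no overlap with the target user
            }
            for (Map.Entry<Long, Double> r : other.getValue().entrySet()) {
                Long item = r.getKey();
                if (target.containsKey(item)) {
                    continue; // already rated by the target user
                }
                Double w = weightedSum.get(item);
                Double s = simSum.get(item);
                weightedSum.put(item, (w == null ? 0.0 : w) + sim * r.getValue());
                simSum.put(item, (s == null ? 0.0 : s) + sim);
            }
        }
        // Similarity-weighted average of the other users' scores
        Map<Long, Double> predicted = new HashMap<>();
        for (Map.Entry<Long, Double> e : weightedSum.entrySet()) {
            predicted.put(e.getKey(), e.getValue() / simSum.get(e.getKey()));
        }
        return predicted;
    }

    /** Builds one user's ratings from alternating item ID / score arguments. */
    static Map<Long, Double> row(int... itemThenScore) {
        Map<Long, Double> m = new HashMap<>();
        for (int i = 0; i < itemThenScore.length; i += 2) {
            m.put((long) itemThenScore[i], (double) itemThenScore[i + 1]);
        }
        return m;
    }

    /** Made-up ratings: userID -> (itemID -> score). */
    static Map<Long, Map<Long, Double>> toyRatings() {
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        ratings.put(1L, row(101, 5, 102, 3));
        ratings.put(2L, row(101, 5, 102, 3, 103, 4));
        ratings.put(3L, row(101, 1, 103, 2));
        return ratings;
    }

    public static void main(String[] args) {
        System.out.println(predict(1L, toyRatings()));
    }
}
```

In this toy data, user 2 agrees with user 1 on every shared item and therefore dominates the predicted score for item 103, while user 3 contributes only a small weight.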


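2. Using the case-study dataset and Mahout, pick any one algorithm, produce collaborative-filtering recommendations for any one female user, and explain whether the recommendations are reasonable. The explanation may be written up as a separate document.

Console output (excerpt only):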

userEuclidean       =>uid:163,(279,5.500000)
itemEuclidean       =>uid:163,(374,9.454545)(264,9.000000)(852,8.927536)
userEuclideanNoPref =>uid:163,(279,2.000000)(2,1.000000)(415,1.000000)
itemEuclideanNoPref =>uid:163,(138,5.150000)(246,4.092857)(288,3.833333)

Looking at the recommendations for the user with uid=163, the itemEuclideanNoPref algorithm recommends book 138. Next, let's see which users gave book 138 a relatively high score:

userid  bookid  score  sex  age
152     138     8      F    26
172     138     4      F    56

 

Of these, user 152 also rated book 973 highly.

userid  bookid  score  sex  age
152     973     8      F    26
163     973     9      F    32

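User 163 rated book 973 highly as well, and user 152, who shares that preference, also rated book 138 highly. Recommending book 138 to user 163 is therefore reasonable.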
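
The manual check above can be sketched in code: find the users who liked the recommended book, then find the books those users and the target user both liked. This is a hypothetical helper, not part of the assignment code: `ExplainSketch`, its method names, and the "liked" threshold of 8 are assumptions, and it uses only the four rating rows quoted in the tables above rather than the full rating.csv.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ExplainSketch {

    /** One rating row: user ID, book ID, score. */
    static final class Rating {
        final long userId;
        final long bookId;
        final int score;

        Rating(long userId, long bookId, int score) {
            this.userId = userId;
            this.bookId = bookId;
            this.score = score;
        }
    }

    /** Only the four rating rows quoted in the tables above. */
    static List<Rating> sampleRatings() {
        List<Rating> rows = new ArrayList<>();
        rows.add(new Rating(152, 138, 8));
        rows.add(new Rating(172, 138, 4));
        rows.add(new Rating(152, 973, 8));
        rows.add(new Rating(163, 973, 9));
        return rows;
    }

    /** Users who gave the book a score of at least minScore. */
    static Set<Long> usersWhoLiked(List<Rating> rows, long bookId, int minScore) {
        Set<Long> users = new TreeSet<>();
        for (Rating r : rows) {
            if (r.bookId == bookId && r.score >= minScore) {
                users.add(r.userId);
            }
        }
        return users;
    }

    /** Books that the given user scored at least minScore. */
    static Set<Long> booksLikedBy(List<Rating> rows, long userId, int minScore) {
        Set<Long> books = new TreeSet<>();
        for (Rating r : rows) {
            if (r.userId == userId && r.score >= minScore) {
                books.add(r.bookId);
            }
        }
        return books;
    }

    /** Books that both users scored at least minScore. */
    static Set<Long> sharedLikedBooks(List<Rating> rows, long userA, long userB, int minScore) {
        Set<Long> shared = booksLikedBy(rows, userA, minScore);
        shared.retainAll(booksLikedBy(rows, userB, minScore));
        return shared;
    }

    public static void main(String[] args) {
        List<Rating> rows = sampleRatings();
        long target = 163;      // user who received the recommendation
        long recommended = 138; // recommended book
        int liked = 8;          // assumed threshold for "rated highly"
        for (long fan : usersWhoLiked(rows, recommended, liked)) {
            System.out.println("user " + fan + " liked book " + recommended
                    + "; books also liked by user " + target + ": "
                    + sharedLikedBooks(rows, fan, target, liked));
        }
    }
}
```

With the threshold set to 8, user 172 (score 4) drops out, which leaves user 152 as the link between book 138 and user 163.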

 


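3. Continuing from question 2, add a filter condition that excludes male users and keeps only the recommendation scores relevant to female users, then run the recommendation and explain whether the results are reasonable. The answer must include the code, screenshots of the run, documentation of the code, and a written explanation of the results.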

package org.conan.mymahout.recommendation.book;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

// RecommendFactory and BookEvaluator are helper classes in the same package (not shown here).
public class BookFilterGenderResult {

    final static int NEIGHBORHOOD_NUM = 2;
    final static int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        String file = "datafile/book/rating.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        RecommenderBuilder rb1 = BookEvaluator.userEuclidean(dataModel);
        RecommenderBuilder rb2 = BookEvaluator.itemEuclidean(dataModel);
        RecommenderBuilder rb3 = BookEvaluator.userEuclideanNoPref(dataModel);
        RecommenderBuilder rb4 = BookEvaluator.itemEuclideanNoPref(dataModel);

        // ID of the user to recommend for (the output shown after this listing is for uid 99)
        long uid = 152;
        System.out.print("userEuclidean       =>");
        filterGender(uid, rb1, dataModel);
        System.out.print("itemEuclidean       =>");
        filterGender(uid, rb2, dataModel);
        System.out.print("userEuclideanNoPref =>");
        filterGender(uid, rb3, dataModel);
        System.out.print("itemEuclideanNoPref =>");
        filterGender(uid, rb4, dataModel);
    }

    /**
     * Filter recommendations by user gender: only books rated by female users are kept.
     */
    public static void filterGender(long uid, RecommenderBuilder recommenderBuilder, DataModel dataModel) throws TasteException, IOException {
        // Set<Long> userids = getMale("datafile/book/user.csv");
        Set<Long> userids = getFeMale("datafile/book/user.csv");

        // Collect the books that female users have rated
        Set<Long> bookids = new HashSet<Long>();
        for (long uids : userids) {
            LongPrimitiveIterator iter = dataModel.getItemIDsFromUser(uids).iterator();
            while (iter.hasNext()) {
                long bookid = iter.next();
                bookids.add(bookid);
            }
        }

        // Recommend, letting the rescorer drop every book outside that set
        IDRescorer rescorer = new FilterRescorer(bookids);
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM, rescorer);
        RecommendFactory.showItems(uid, list, false);
    }

    /**
     * Get the IDs of male users.
     */
    public static Set<Long> getMale(String file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(new File(file)));
        Set<Long> userids = new HashSet<Long>();
        String s = null;
        while ((s = br.readLine()) != null) {
            String[] cols = s.split(",");
            if (cols[1].equals("M")) { // male user
                userids.add(Long.parseLong(cols[0]));
            }
        }
        br.close();
        return userids;
    }

    /**
     * Get the IDs of female users.
     */
    public static Set<Long> getFeMale(String file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(new File(file)));
        Set<Long> userids = new HashSet<Long>();
        String s = null;
        while ((s = br.readLine()) != null) {
            String[] cols = s.split(",");
            if (cols[1].equals("F")) { // female user
                userids.add(Long.parseLong(cols[0]));
            }
        }
        br.close();
        return userids;
    }
}

/**
 * Rescores the results: any book that is not in the allowed set is filtered out.
 */
class FilterRescorer implements IDRescorer {
    // Allowed book IDs (the set passed in by filterGender)
    final private Set<Long> bookids;

    public FilterRescorer(Set<Long> bookids) {
        this.bookids = bookids;
    }

    @Override
    public double rescore(long id, double originalScore) {
        return isFiltered(id) ? Double.NaN : originalScore;
    }

    @Override
    public boolean isFiltered(long id) {
        return !bookids.contains(id);
    }
}
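
To see what such a rescorer does to a candidate list, here is a small stand-alone sketch that does not depend on Mahout. `Rescorer` is a hypothetical stand-in with the same two methods as the `IDRescorer` interface implemented by FilterRescorer above; `topN`, the candidate scores, and the allowed set are made up for illustration, and the way `topN` applies the rescorer is an assumption about the general idea, not a copy of Mahout's internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RescorerSketch {

    /** Hypothetical stand-in with the same two methods as IDRescorer. */
    interface Rescorer {
        double rescore(long id, double originalScore);
        boolean isFiltered(long id);
    }

    /** Keeps only the items contained in the allowed set, like FilterRescorer. */
    static Rescorer allowOnly(final Set<Long> allowed) {
        return new Rescorer() {
            @Override
            public double rescore(long id, double originalScore) {
                return isFiltered(id) ? Double.NaN : originalScore;
            }

            @Override
            public boolean isFiltered(long id) {
                return !allowed.contains(id);
            }
        };
    }

    /** Returns the IDs of the n best-scoring candidates that survive the rescorer. */
    static List<Long> topN(Map<Long, Double> candidates, Rescorer rescorer, int n) {
        final Map<Long, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<Long, Double> c : candidates.entrySet()) {
            if (rescorer.isFiltered(c.getKey())) {
                continue; // excluded outright
            }
            double score = rescorer.rescore(c.getKey(), c.getValue());
            if (!Double.isNaN(score)) {
                kept.put(c.getKey(), score);
            }
        }
        List<Long> ids = new ArrayList<>(kept.keySet());
        Collections.sort(ids, new Comparator<Long>() {
            @Override
            public int compare(Long a, Long b) {
                return Double.compare(kept.get(b), kept.get(a)); // highest score first
            }
        });
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }

    /** Made-up candidate items: itemID -> estimated score. */
    static Map<Long, Double> sampleCandidates() {
        Map<Long, Double> candidates = new LinkedHashMap<>();
        candidates.put(1L, 9.0);
        candidates.put(2L, 8.0);
        candidates.put(3L, 7.0);
        candidates.put(4L, 6.0);
        return candidates;
    }

    public static void main(String[] args) {
        Set<Long> allowed = new HashSet<>(Arrays.asList(2L, 4L));
        System.out.println(topN(sampleCandidates(), allowOnly(allowed), 3));
    }
}
```

Because filtering happens before the top-N cut, a request for three items can return fewer than three when the allowed set is small.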

Output:

userEuclidean
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.11111108462015788
Recommender IR Evaluator: [Precision:0.3010752688172043,Recall:0.08542713567839195]
itemEuclidean
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.3536954060693203
Recommender IR Evaluator: [Precision:0.0,Recall:0.0]
userEuclideanNoPref
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:4.61812258478421
Recommender IR Evaluator: [Precision:0.09045226130653267,Recall:0.09296482412060306]
itemEuclideanNoPref
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:2.625455679766278
Recommender IR Evaluator: [Precision:0.6005025125628134,Recall:0.6055276381909548]
userEuclidean       =>uid:99,
itemEuclidean       =>uid:99,(586,10.000000)(378,10.000000)(202,9.666667)
userEuclideanNoPref =>uid:99,(616,1.000000)(307,1.000000)(552,1.000000)
itemEuclideanNoPref =>uid:99,(96,3.392724)(860,3.250000)(375,3.200000)
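
The evaluator lines in the output above report an average absolute difference (lower is better) and precision/recall for the recommended items (higher is better). As a reminder of what these numbers measure, here is a minimal sketch under simplified assumptions. `MetricsSketch` and its inputs are hypothetical, and Mahout's evaluators additionally split the data into training and test sets and decide which items count as relevant, which is omitted here.

```java
import java.util.List;
import java.util.Set;

class MetricsSketch {

    /** Mean of |predicted - actual| over paired ratings. */
    static double averageAbsoluteDifference(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    /** Number of recommended items that are also relevant. */
    static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long item : recommended) {
            if (relevant.contains(item)) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of the recommended items that are relevant. */
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / recommended.size();
    }

    /** Fraction of the relevant items that were recommended. */
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Made-up numbers, unrelated to the book dataset
        double[] predicted = {8.0, 6.0};
        double[] actual = {7.0, 9.0};
        System.out.println(averageAbsoluteDifference(predicted, actual));

        List<Long> recommended = java.util.Arrays.asList(1L, 2L, 3L);
        Set<Long> relevant = new java.util.HashSet<>(java.util.Arrays.asList(2L, 3L, 4L, 5L));
        System.out.println(precision(recommended, relevant));
        System.out.println(recall(recommended, relevant));
    }
}
```

Read this way, itemEuclideanNoPref has the best precision and recall of the four algorithms in the output, even though its average absolute difference is not the lowest.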

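We analyze the result of the itemEuclideanNoPref algorithm.

The top-ranked item is the book with ID 96. Tracing one step further, I looked up which users gave book 96 a relatively high score.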

userid  bookid  score  sex  age
73      96      8      F    28
79      96      7      F    32
117     96      10     F    34
163     96      8      F    32

All of these users are female. Among them, user 117 also rated book 106 highly.

userid  bookid  score  sex  age
99      106     10     F    37
117     106     7      F    34

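User 99 rated book 106 highly as well, and user 117, who shares that preference, gave book 96 a score of 10. Recommending book 96 to user 99 is therefore reasonable.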

 
