Blogs

Lecture [video]: Mismatched Models, Wrong Results, and Dreadful Decisions

Great lecture; a must watch:

http://videolectures.net/kdd09_hand_mmwrdd/

Mismatched Models, Wrong Results, and Dreadful Decisions

author: David J. Hand, Department of Mathematics, Imperial College London

Description

Data mining techniques use score functions to quantify how well a model fits a given data set. Parameters are estimated by optimising the fit, as measured by the chosen score function, and model choice is guided by the size of the scores for the different models. Since different score functions summarise the fit in different ways, it is important to choose a function which matches the objectives of the data mining exercise. For predictive classification problems, a wide variety of score functions exist, including measures such as precision and recall, the F measure, misclassification rate, the area under the ROC curve (the AUC), and others. The first four of these require a classification threshold to be chosen, a choice which may not be easy, or may even be impossible, especially when the classification rule is to be applied in the future. In contrast, the AUC does not require the specification of a classification threshold, but summarises performance over the range of possible threshold choices. However, unfortunately, and despite the widespread use of the AUC, it has a previously unrecognised fundamental incoherence lying at the core of its definition. This means that using the AUC can lead to poor model choice and unnecessary misclassifications. The AUC is set in context, its deficiency explained and the implications illustrated - with the bottom line being that the AUC should not be used. A family of coherent alternative scores is described. The ideas are illustrated with examples from bank loans, fraud, face recognition, and health screening.
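
To make the difference between thresholded scores and the AUC concrete, here is a minimal Python sketch. It is not from the lecture: the function names (`threshold_scores`, `auc`) and the toy data are assumptions made for illustration, and it does not implement the coherent alternative scores the lecture describes. It only shows that precision, recall, the F measure and the misclassification rate need a threshold, while the AUC is computed from the ranking alone.

```python
def threshold_scores(scores, labels, threshold):
    """Precision, recall, F measure and misclassification rate at one threshold.

    A case is predicted as class 1 when its score is >= threshold.
    """
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2.0 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    error = (fp + fn) / float(len(labels))
    return {"precision": precision, "recall": recall,
            "f": f_measure, "error": error}


def auc(scores, labels):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. No threshold is involved.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): six cases with classifier scores and true classes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

for t in (0.5, 0.75):
    print(t, threshold_scores(scores, labels, t))
print("AUC", auc(scores, labels))
```

On this toy data the two thresholds give different precision and recall values, while the AUC is a single number that does not depend on either choice.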

 

iPhone iPod delete all music videos applications etc.

To do this, open iTunes, go to the tab (e.g. "Music") whose content you want to delete, and un-check "Sync ..."; this should remove everything in that category from the device.

 

 

Bibtex Texshop

I always run into this trouble when I compile LaTeX files, so here is the solution as a reminder.
P.S. Don't forget to specify \bibliographystyle.

http://forums.macnn.com/82/applications/88947/help-with-bibtex-and-texshop/

1. Put the citations in your .tex file in the form
\cite{<key>}. Papers you have not cited will not appear
in the bibliography.



2. Put \bibliography{<bibfilename>} in your .tex file
where you want the bibliography to appear. Make sure the .bib file is
somewhere that latex can find it, such as the same folder as the .tex
file.



3. Run latex then bibtex then latex then latex again, all on your .tex
file (actually you run bibtex on the .aux file, but texshop does this
for you).
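
For working outside TeXShop, the sequence in step 3 can be sketched as a small Python helper. This is only an illustration under assumptions: the function name `build_commands` and the file name `paper.tex` are hypothetical, and the script assumes the `latex` and `bibtex` executables are on your PATH.

```python
import os
import subprocess
import sys


def build_commands(tex_file):
    """Return the latex, bibtex, latex, latex sequence for one .tex file."""
    base, _ = os.path.splitext(tex_file)
    return [
        ["latex", tex_file],   # first pass writes the citations to base.aux
        ["bibtex", base],      # bibtex reads base.aux and writes base.bbl
        ["latex", tex_file],   # second pass pulls in the bibliography
        ["latex", tex_file],   # third pass resolves the citation labels
    ]


def run_all(tex_file):
    """Run each command in order, stopping at the first failure."""
    for command in build_commands(tex_file):
        subprocess.check_call(command)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical file name): python build_bib.py paper.tex
    run_all(sys.argv[1])
```

Keeping the command list separate from the code that runs it makes it easy to print the sequence first and check it before anything is executed.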



Check the results.



For more details, see Appendix B of the LaTeX book, or Chapter 13 of The LaTeX Companion.

 

Data Visualization

 

Books:

The Grammar of Graphics, Leland Wilkinson
Visualizing Data, William S. Cleveland
The Visual Display of Quantitative Information, Edward Tufte
Information Visualization: Perception for Design, Colin Ware
Show Me the Numbers: Designing Tables and Graphs to Enlighten, Stephen Few

 

Tools

Tableau (pros: one of the best data exploration tools, free for open data; cons: somewhat costly ~$1,700)
Pentaho Reporting
Pivot (Microsoft) http://www.getpivot.com
Many-Eyes    http://many-eyes.com
Verifiable        http://verifiable.com
TimeSearcher    http://www.cs.umd.edu/hcil/timesearcher
Parvis        http://home.subnet.at/flo/mv/parvis
Improvise        http://www.cs.ou.edu/~weaver/improvise/

GGobi        http://ggobi.org
Interactive, brushing, etc.

[Figure: 1D density plot]

[Figure: parallel coordinates plot]
 
GGPlot2 (in R)    http://had.co.nz/ggplot2/

[Figure: theonion click-throughs broken down by date and day of the week]
 

Tree Network Tools

GraphViz        http://www.graphviz.org
NodeXL        http://www.codeplex.com/NodeXL
GUESS        http://graphexploration.cond.org/
Pajek        http://pajek.imfm.si/doku.php
TreeMap         http://www.cs.umd.edu/hcil/treemap
Workbench    http://nwb.slis.indiana.edu/
 

Programming Tools

processing.org        A popular graphics language
protovis.org        Visualization tools for JavaScript
flare.prefuse.org    Visualization tools for Flash
prefuse.org        Visualization tools for Java
modestmaps.com    Mapping tools for Flash/JavaScript

People / Blogs

Andrew (blogger)
Nathan (blogger)
Jeffrey Heer (visualization libraries)
Katy Borner (visualization of science)
 

Reference

Some of these recommendations were given by Jeffrey Heer (an expert in the area) at the MediaX 2009 workshop.

Various

smoothScatter produces a smoothed color density representation of the scatterplot, obtained through a kernel density estimate.
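
To illustrate the idea of a kernel density estimate behind such a plot, here is a rough Python sketch. It is not how R's smoothScatter is implemented; the function name `kde2d`, the Gaussian kernel, the single shared bandwidth and the toy points are all assumptions for illustration.

```python
import math


def kde2d(points, grid_x, grid_y, bandwidth):
    """Evaluate a 2D Gaussian kernel density estimate on a grid.

    Returns a list of rows (one per grid_y value); each row holds the
    estimated density at every grid_x value.
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    density = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = 0.0
            for px, py in points:
                d2 = (gx - px) ** 2 + (gy - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        density.append(row)
    return density


# Toy scatter (hypothetical): a tight cluster near (0, 0) and one far point.
points = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (3.0, 3.0)]
grid = [0.0, 1.0, 2.0, 3.0]
for row in kde2d(points, grid, grid, bandwidth=0.5):
    print(" ".join("%.3f" % value for value in row))
```

Mapping each grid value to a colour gives the smoothed density image; the grid cell nearest the cluster gets the highest value.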

 

Japan Mobile SNS presentation

You cannot install numpy on this volume. numpy requires System Python to install os x

Problem


You cannot install numpy on this volume. numpy requires System Python  to install os x

Solution

NumPy provides several installer files, one for each version of Python (e.g. 2.5, 2.6). Make sure you download the one that matches your installed Python.
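
A quick way to check which Python you are actually running before choosing a download is sketched below. The helper name `python_series` is hypothetical; the snippet only reports the interpreter's version and location and makes no claim about what the NumPy installer itself checks.

```python
import sys


def python_series(version_info=sys.version_info):
    """Return the 'major.minor' series, e.g. '2.6', of a Python version."""
    return "%d.%d" % (version_info[0], version_info[1])


# Shows the series to match against the installer name, and where this
# interpreter lives on disk.
print("Running Python %s from %s" % (python_series(), sys.executable))
```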

 

gwt google app engine gae structure client server package

For more information, see the following guide kindly provided by Google. Here is a partial copy:

 

Standard Directory and Package Layout

GWT projects are overlaid onto Java packages such that most of the configuration can be inferred from the classpath and the module definitions.

Guidelines

If you are not using the Command-line tools to generate your project files and directories, here are some guidelines to keep in mind when organizing your code and creating Java packages.

  1. Under the main project directory create the following directories:
    • src folder - contains production Java source
    • war folder - your web app; contains static resources as well as compiled output
    • test folder - (optional) JUnit test code would go here
  2. Within the src package, create a project root package and a client package.
  3. If you have server-side code, also create a server package to separate the client-side code (which is translated into JavaScript) from the server-side code (which is not).
  4. Within the project root package, place one or more module definitions.
  5. In the war directory, place any static resources (such as the host page, style sheets, or images).
  6. Within the client and server packages, you are free to organize your code into any subpackages you require.
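
As a rough illustration of the guidelines above, the following Python sketch lists (and optionally creates) the directories for a given project and root package. It is not part of Google's guide or the GWT tooling; the function names `gwt_layout` and `create_layout` are hypothetical.

```python
import os


def gwt_layout(project, root_package, with_server=True, with_tests=False):
    """Return the directories suggested by the guidelines, as '/' paths."""
    package_path = root_package.replace(".", "/")
    src_root = project + "/src/" + package_path
    dirs = [
        src_root,               # project root package (module definitions)
        src_root + "/client",   # client-side code, translated to JavaScript
        project + "/war",       # static resources and compiled output
    ]
    if with_server:
        dirs.append(src_root + "/server")   # server-side code
    if with_tests:
        dirs.append(project + "/test")      # optional JUnit tests
    return dirs


def create_layout(base_dir, dirs):
    """Create each directory under base_dir if it does not exist yet."""
    for d in dirs:
        path = os.path.join(base_dir, *d.split("/"))
        if not os.path.isdir(path):
            os.makedirs(path)


for d in gwt_layout("DynaTable", "com.google.gwt.sample.dynatable"):
    print(d)
```

Calling `create_layout(".", gwt_layout(...))` would create the skeleton in the current directory; the module XML file and host page still have to be added by hand.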

Example: GWT standard package layout

For example, all the files for the "DynaTable" sample are organized in a main project directory also called "DynaTable".

  • Java source files are in the directory: DynaTable/src/com/google/gwt/sample/dynatable
  • The module is defined in the XML file: DynaTable/src/com/google/gwt/sample/dynatable/DynaTable.gwt.xml
  • The project root package is: com.google.gwt.sample.dynatable
  • The logical module name is: com.google.gwt.sample.dynatable.DynaTable

The src directory

The src directory contains an application's Java source files, the module definition, and external resource files.

Package | File | Purpose
com.google.gwt.sample.dynatable | - | The project root package contains module XML files.
com.google.gwt.sample.dynatable | DynaTable.gwt.xml | Your application module. Inherits com.google.gwt.user.User and adds an entry point class, com.google.gwt.sample.dynatable.client.DynaTable.
com.google.gwt.sample.dynatable.public | - | Static resources that are loaded programmatically by GWT code. Files in the public directory are copied into the same directory as the GWT compiler output.
com.google.gwt.sample.dynatable.public | logo.gif | An image file available to the application code. You might load this file programmatically using this URL: GWT.getModuleBaseURL() + "logo.gif".
com.google.gwt.sample.dynatable.client | - | Client-side source files and subpackages.
com.google.gwt.sample.dynatable.client | DynaTable.java | Client-side Java source for the entry-point class.
com.google.gwt.sample.dynatable.client | SchoolCalendarService.java | An RPC service interface.
com.google.gwt.sample.dynatable.server | - | Server-side code and subpackages.
com.google.gwt.sample.dynatable.server | SchoolCalendarServiceImpl.java | Server-side Java source that implements the logic of the service.

The war directory

The war directory is the deployment image of your web application. It is in the standard expanded war format recognized by a variety of Java web servers, including Tomcat, Jetty, and other J2EE servlet containers. It contains a variety of resources:

  • Static content you provide, such as the host HTML page
  • GWT compiled output
  • Java class files and jar files for server-side code
  • A web.xml file that configures your web app and any servlets

A detailed description of the war format is beyond the scope of this document, but here are the basic pieces you will want to know about:

Directory | File | Purpose
DynaTable/war/ | DynaTable.html | A host HTML page that loads the DynaTable app.
DynaTable/war/ | DynaTable.css | A static style sheet that styles the DynaTable app.
DynaTable/www/dynatable/ | - | The DynaTable module directory where the GWT compiler writes output and where files on the public path are copied. NOTE: by default this directory would be the long, fully-qualified module name com.google.gwt.sample.dynatable.DynaTable. However, in our GWT module XML file we used the rename-to="dynatable" attribute to shorten it to a nice name.
DynaTable/www/dynatable/ | dynatable.nocache.js | The "selection script" for DynaTable. This is the script that must be loaded from the host HTML to load the GWT module into the page.
DynaTable/war/WEB-INF | - | All non-public resources live here; see the servlet specification for more detail.
DynaTable/war/WEB-INF | web.xml | Configures your web app and any servlets.
DynaTable/war/WEB-INF/classes | - | Java compiled class files live here to implement server-side functionality. If you're using an IDE, set the output directory to this folder.
DynaTable/war/WEB-INF/lib | - | Any library dependencies your server code needs go here.
DynaTable/war/WEB-INF/lib | gwt-servlet.jar | If you have any servlets using GWT RPC, you will need to place a copy of gwt-servlet.jar here.

The test directory

The test directory contains the source files for any JUnit tests.

Package | File | Purpose
com.google.gwt.sample.dynatable.client | - | Client-side test files and subpackages.
com.google.gwt.sample.dynatable.client | DynaTableTest.java | Test cases for the entry-point class.
com.google.gwt.sample.dynatable.server | - | Server-side test files and subpackages.
com.google.gwt.sample.dynatable.server | SchoolCalendarServiceImplTest.java | Test cases for server classes.

 

 

XML GWT AJAX XML JSON Google App Engine

Problem

Cross-site RPC seemed to work with JSON but not with XML.

Solution

Strip out whitespace characters (including new lines).

Keywords

XML GWT AJAX XML JSON Google App Engine python cross site web service
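
One way to strip the whitespace on the Python side before sending the XML is sketched below. The function name `strip_xml_whitespace` is hypothetical, and the approach assumes the whitespace to remove sits between tags or around the document; text inside elements is left alone.

```python
import re


def strip_xml_whitespace(xml_text):
    """Remove whitespace (including new lines) between tags.

    Leading and trailing whitespace around the document is also removed.
    Text inside elements is not touched.
    """
    return re.sub(r">\s+<", "><", xml_text.strip())


# Hypothetical payload, pretty-printed with new lines and indentation.
pretty = """
<people>
  <person>
    <name>Ada Lovelace</name>
  </person>
</people>
"""
print(strip_xml_whitespace(pretty))
```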

Google App Engine: Running Python and Java side by side

Task

Run both Java and Python within the same application.
(Note: this really only makes sense if you want to use common services such as the datastore, memcache, queues, etc. If not, just deploy them as separate applications, which doubles your quota, and communicate between them using web services.)

Solution

You can simply deploy them as different versions of the same application. Note that version names don't have to be numeric. For example, deploy your Java code as version "java", which will be served at http://java.latest.YourApp.appspot.com, and deploy your Python code as version "py", served at http://py.latest.YourApp.appspot.com.

The Java and Python versions can communicate with each other using JSON (more precisely JSONP, for cross-site requests): http://code.google.com/webtoolkit/tutorials/1.6/Xsite.html

Using GWT also makes this job somewhat easier.
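
On the Python side, a JSONP response is just the JSON payload wrapped in a call to a callback named by the client. The sketch below shows only that wrapping step; the function name `jsonp_response`, the callback name and the payload are hypothetical, and it assumes the standard `json` module is available (older Python runtimes may need simplejson instead).

```python
import json
import re

# Accept only simple JavaScript identifiers (dots allowed for namespaced
# callbacks) so the caller cannot inject arbitrary script.
_CALLBACK_RE = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$.]*\Z")


def jsonp_response(callback, payload):
    """Wrap a JSON-serialisable payload in a call to the named callback."""
    if not _CALLBACK_RE.match(callback):
        raise ValueError("unsafe callback name: %r" % (callback,))
    return "%s(%s);" % (callback, json.dumps(payload))


# The client would normally pass the callback name as a query parameter.
print(jsonp_response("handleResult", {"ok": True}))
```

The returned string would be sent as the response body with a JavaScript content type, so the browser runs the callback with the data as its argument.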

 

Keywords

gae app same both java python simultaneously java and python both java and python app id appid together google app engine gwt

 

GAE GWT OS X

Problem

WARNING: Failed startup of context com.google.apphosting.utils.jetty.DevAppEngineWebAppContext java.util.zip.ZipException: error in opening zip file
HTTP ERROR: 503

SERVICE_UNAVAILABLE
RequestURI=/T1.html

Powered by jetty://
http://localhost:8080/T1.html
Jul 3, 2009 10:58:46 AM com.google.apphosting.utils.jetty.JettyLogger warn
WARNING: Failed startup of context com.google.apphosting.utils.jetty.DevAppEngineWebAppContext@32df24{/,/Volumes/TRASCEND/docs/neil/Research/GroupFormation/code/GAE/t1/war}
java.util.zip.ZipException: error in opening zip file
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.<init>(ZipFile.java:203)
	at java.util.jar.JarFile.<init>(JarFile.java:132)
	at java.util.jar.JarFile.<init>(JarFile.java:97)
	at org.mortbay.jetty.webapp.TagLibConfiguration.configureWebApp(TagLibConfiguration.java:171)
	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1215)
	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
	at org.mortbay.jetty.Server.doStart(Server.java:217)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at com.google.appengine.tools.development.JettyContainerService.startContainer(JettyContainerService.java:147)
	at com.google.appengine.tools.development.AbstractContainerService.startup(AbstractContainerService.java:116)
	at com.google.appengine.tools.development.DevAppServerImpl.start(DevAppServerImpl.java:211)
	at com.google.appengine.tools.development.gwt.AppEngineLauncher.start(AppEngineLauncher.java:86)
	at com.google.gwt.dev.HostedMode.doStartUpServer(HostedMode.java:365)
	at com.google.gwt.dev.HostedModeBase.startUp(HostedModeBase.java:590)
	at com.google.gwt.dev.HostedModeBase.run(HostedModeBase.java:397)
	at com.google.gwt.dev.HostedMode.main(HostedMode.java:232)
The server is running at http://localhost:8080/
2009-07-03 19:58:46.975 java[3373:80f] [Java CocoaComponent compatibility mode]: Enabled
2009-07-03 19:58:46.976 java[3373:80f] [Java CocoaComponent compatibility mode]: Setting timeout for SWT to 0.100000
SCFinderPlugin(114): Unable to get bundle identifier.SCFinderPlugin(114): Unable to get bundle identifier.SCFinderPlugin(114): Unable to get bundle identifier.

Solution

It seems to be caused by the "._" (dot underscore) files that OS X creates when the project lives on a non-OS X partition.
I re-created the project on the Mac partition and, to my surprise, that fixed the problem
(so much for paying a premium for increased productivity on the Mac).
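
To check whether a project directory contains such files, a small Python sketch like the following can list them. The function name `find_dot_underscore` is hypothetical, and the script only reports the files; whether they are really the cause in a given setup remains an assumption.

```python
import os
import sys


def find_dot_underscore(root):
    """Return the sorted paths of all files under root starting with '._'."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("._"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python find_dot_underscore.py path/to/project/war
    for path in find_dot_underscore(sys.argv[1]):
        print(path)
```

A "._something.jar" sitting next to a real jar in war/WEB-INF/lib would be a natural suspect, since the stack trace above fails while opening a jar file.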

 
