Friday, November 22, 2013

First whisper from Spark

I attended a talk about Apache Spark, an in-memory computing framework. Spark is an effort from UC Berkeley for data analytics.
Search “The Berkeley Data Analytics Stack: Present and Future” and you will find a clearer description of the framework. In the framework, Spark serves as the computation base, and on top of it:
  • There is Shark, which provides a SQL-like environment; the comparable project is Hive (which runs on MapReduce).
  • There is GraphX, for graph-structured computation; a comparable project is Giraph.
  • There is a stream processing component, Spark Streaming.
  • There is a machine learning library, MLlib.
In the demonstration, Spark was quite impressive compared to Hive, especially after the first run: by then the data is loaded in memory and queries are quite fast.
One point worth making is that none of this comes for free. Somewhere in the code, you must call the cache() method on a dataset (an RDD) to indicate that you want to keep it in memory.
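Spark's cache() is lazy: it only marks a dataset to be kept in memory, and the data is materialized by the first action that uses it, which is consistent with queries being fast only after the first run. The sketch below is a toy, single-machine model in plain Python, not Spark's API. The class `ToyDataset`, its methods, and the `source_reads` counter are hypothetical names chosen only to show why an uncached dataset is recomputed from its source on every action while a cached one is read once. In real Spark the corresponding call is `cache()` on an RDD, in Scala or PySpark.

```python
class ToyDataset:
    """Toy, single-machine model of a lazily evaluated dataset.

    Hypothetical illustration only: this is NOT Spark's API, it just mimics
    the idea that transformations are lazy and cache() keeps results around.
    """

    def __init__(self, compute, stats):
        self._compute = compute    # zero-argument function that (re)builds the rows
        self._stats = stats        # dict shared by all derived datasets
        self._cache_on = False     # set by cache(); rows are kept after the first action
        self._rows = None          # materialized rows, filled only when caching is on

    @classmethod
    def from_source(cls, rows):
        stats = {"source_reads": 0}

        def compute():
            stats["source_reads"] += 1  # stands in for an expensive read from disk
            return list(rows)

        return cls(compute, stats)

    def map(self, fn):
        # Transformations only record how to build the new rows; nothing runs yet.
        return ToyDataset(lambda: [fn(r) for r in self._materialize()], self._stats)

    def filter(self, pred):
        return ToyDataset(lambda: [r for r in self._materialize() if pred(r)], self._stats)

    def cache(self):
        # Like cache() in Spark, this only marks the dataset; the next action fills it.
        self._cache_on = True
        return self

    def _materialize(self):
        if self._rows is not None:
            return self._rows          # cached: no recomputation, no source read
        rows = self._compute()         # uncached: rebuild from the source every time
        if self._cache_on:
            self._rows = rows
        return rows

    def count(self):
        # Actions are what actually trigger computation.
        return len(self._materialize())

    @property
    def source_reads(self):
        return self._stats["source_reads"]


logs = ToyDataset.from_source(["INFO ok", "ERROR disk", "ERROR net", "INFO ok"])

errors = logs.filter(lambda line: line.startswith("ERROR"))
errors.count()
errors.count()
print("source reads without cache:", logs.source_reads)  # 2: each action re-read the source

cached_errors = logs.filter(lambda line: line.startswith("ERROR")).cache()
cached_errors.count()
cached_errors.count()
print("source reads with cache:", logs.source_reads)  # 3: only the first cached action read it
```

The toy keeps Spark's split between lazy transformations (`filter`, `map`) and actions (`count`) that trigger work, and ignores partitioning, distribution across machines, and eviction when memory runs out.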

Wednesday, October 30, 2013

How to turn a wireless modem into a plain modem and use another router

Nowadays, modems often come with built-in routing and wireless. However, the built-in router is often poor, and sometimes we just want to use the device as a modem and use our own router/wireless router.

To turn a modem with routing and wireless capability into a simple modem:
1. disable wireless on it.
2. disable DHCP on it.
3. switch the modem into RFC 1483 bridged mode (or another bridge mode, depending on the modem).
4. connect the WAN port of your own router to the modem.
5. set up PPPoE (or whatever connection type your ISP uses) on the router.

Sunday, October 20, 2013

Some thoughts on Ding Lei's talk in Harvard Science Center Hall A, Oct 19th, 2013

It was great to walk over to the Harvard campus and listen to a talk about data science.

The slides did not present many detailed facts, only a couple of lines:
* business insights (audience)
* predictive analysis (uplift modeling)
* software engineering (Hadoop ecosystem)

He also emphasized that the success of a big data/data science project needs support from executives, since at the beginning we don't know what we want to build. Only beautiful reports can attract executives' support, and beautiful reports should be easy for executives to read.

He also mentioned that his team has two stacks: a business stack, where projects lasting weeks or months provide features for the business; and a research stack, which can use a trial-and-error approach to find the value of data science.

One thing whose importance I had not realized before: we need to find the business value of data science.

Saturday, June 22, 2013