IBM Distinguished Engineer solves Big Data Conjecture - Data Science Central


IBM Distinguished Engineer solves Big Data Conjecture - Data Science Central

Tom Johnson
http://www.datasciencecentral.com/profiles/blogs/ibm-distinguished-engineer-solves-big-data-conjecture

IBM Distinguished Engineer solves Big Data Conjecture

A mathematical problem related to big data was solved by Jean-Francois Puget, an engineer in the Solutions Analytics and Optimization group at IBM France. The problem was first posed on Data Science Central, and an award was offered to the first data scientist to solve it.

Bryan Gorman, Principal Physicist and Chief Scientist at the Johns Hopkins University Applied Physics Laboratory, made a significant breakthrough in July and won $500. Jean-Francois Puget solved the problem completely, independently of Bryan, and won a $1,000 award.

[Figure: example of a rare, special permutation investigated to prove the theorem]

The competition was organized and financed by Data Science Central. Participants from around the world submitted a number of interesting approaches. The mathematical question was posed by Vincent Granville, a leading data scientist and co-founder of Data Science Central. Granville initially proposed a solution based on large-scale Monte Carlo simulations, but it turned out to be wrong.

The problem consisted of finding an exact formula for a new type of correlation and goodness-of-fit metric, designed specifically for big data: one that generalizes Spearman's rank correlation coefficient and is especially robust for the unbounded, ordinal data found in large data sets. From a mathematical point of view, the new metric is based on L1 rather than L2 theory; in other words, it relies on absolute rather than squared differences. Using squares (or higher powers) is what makes traditional metrics such as R-squared notoriously sensitive to outliers, and why savvy statistical modelers avoid them. This is a critical issue because outliers are plentiful in big data and can render the conclusions of a statistical analysis invalid. The outlier problem is sometimes referred to as the curse of big data.....[more]
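
Since the excerpt breaks off before giving the formula, here is a minimal Python sketch of the underlying idea only. It contrasts the squared-difference metrics the article criticizes with the classical L1 rank statistic, Spearman's footrule. The footrule_coefficient helper and its max-distance rescaling are illustrative choices of mine, not Puget's actual formula; the example assumes only NumPy and SciPy.

# Sketch: L2-based metrics vs. an L1 rank statistic under one outlier.
# The footrule statistic is the classical L1 analogue of Spearman's
# coefficient; it is NOT the exact contest formula, which the excerpt
# above truncates before giving.
import numpy as np
from scipy.stats import rankdata, spearmanr

def footrule_coefficient(x, y):
    """Sum of absolute rank differences (Spearman's footrule), rescaled
    so that 1 means identical ordering and 0 means reversed ordering."""
    rx, ry = rankdata(x), rankdata(y)
    d = np.abs(rx - ry).sum()        # L1 (absolute) rank differences
    d_max = len(x) ** 2 // 2         # max footrule distance (Diaconis-Graham)
    return 1.0 - d / d_max

rng = np.random.default_rng(42)
x = rng.normal(size=10_000)
y = x + 0.1 * rng.normal(size=10_000)
y[0] = 1e9                           # a single extreme outlier

r = np.corrcoef(x, y)[0, 1]
rho, _ = spearmanr(x, y)
print(f"R-squared (L2 on raw values): {r**2:.4f}")  # wrecked by the outlier
print(f"Spearman rho (L2 on ranks):   {rho:.4f}")   # rank-based, stays high
print(f"Footrule (L1 on ranks):       {footrule_coefficient(x, y):.4f}")

Run as written, the R-squared should collapse to roughly zero under the single outlier while both rank statistics stay near 1. Calibrating such an L1 statistic exactly, rather than with the naive rescaling used here, is essentially the kind of question the contest posed.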


-tj




============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
to unsubscribe http://redfish.com/mailman/listinfo/friam_redfish.com

Re: IBM Distinguished Engineer solves Big Data Conjecture - Data Science Central

George Duncan
Ah, big data and big computing!

Interesting that this relates to what I did for my master's paper in statistics at the University of Chicago. But I was nowhere near so sophisticated and, of course, had no big computing back in 1964!

Off to Aunt Mabel's 100th tomorrow a.m.!

George Duncan
georgeduncanart.com
(505) 983-6895 
Represented by ViVO Contemporary
725 Canyon Road
Santa Fe, NM 87501
 
Dynamic application of matrix order and luminous chaos.



Re: IBM Distinguished Engineer solves Big Data Conjecture - Data Science Central

Tom Johnson
Well, have fun and give her our regards. You missed a good S.A.R. lecture tonight, but also a not-so-good dinner at that hotel across the street from the back side of the history museum (the one that is not the Inn of the Anasazi).

Travel safe,
T*D





--
==========================================
J. T. Johnson
Institute for Analytic Journalism   --   Santa Fe, NM USA
505.577.6482 (c)    505.473.9646 (h)
Twitter: jtjohnson
http://www.jtjohnson.com    [hidden email]
==========================================
