Subtle problem with BI


Jack Stafurik
These are issues I (and many others) have grappled with for many years. I
have strong opinions that deftly straddle both sides. So - I can't be wrong!
To address the points Mikhail raised, I'll use the context of using data to
predict sales.

1) "we assume that our data reflect adequately business issues (customer
behavior) "

The question here is what is "adequately", and what is "customer behavior".
Defining these precisely is very important to developing an accurate, useful
prediction system. Understanding what is "adequate" is tough. For the
client, it initially means "better than what I do now." Later, it evolves
into something like "Error = 5 - 10 %."  For sales prediction, this is an
impossible standard. So in the end, the client will be unhappy!
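To make that error target concrete: a figure like "Error = 5 - 10 %" is easiest to read as something like a mean absolute percentage error (MAPE) on predicted vs. actual sales. The metric choice and the numbers below are my own illustration, not anything from the client work described here:

```python
# Hypothetical illustration: reading "Error = 5 - 10 %" as MAPE.
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

weekly_sales = [120, 95, 140, 80]    # actual units sold (made-up numbers)
forecast     = [110, 100, 150, 70]   # a model's predictions (also made up)
print(round(mape(weekly_sales, forecast), 1))  # → 8.3
```

Even this toy forecast, which is never off by more than a dozen units, only just lands inside the 5-10% band, which hints at why the standard is so hard to hit in practice.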

Several problems cause this. First, customers are not homogeneous.
Different groups respond differently to the same stimuli, and the groupings
of similarly behaving customers you can develop for one product are not the
same as for another. That is, knowing how a customer responds to a Coke
promotion doesn't necessarily tell you how he or she will respond to a Tide
promotion.

Second, you don't always have the most important data you need.
Normally for sales, you will have price and volume data for the item of
interest and its competitors (identifying competitors is another problem ...).
But many important pieces of data that have major effects on sales (or stock
prices, inventory levels, etc.) are not what I call "observable" in the data
the client can give you. This "unobservable" data can include a major sale
on the item by the WalMart across the street from a store, a major snowstorm
that keeps people out of the stores, errors in the shelf price tag,
stockouts in the distribution chain, local population changes due to
holidays, etc. While this "unobservable" data can sometimes be obtained, it
takes a lot of work and is very expensive.

Third, even though you may have what you think is lots of data (typical
retail data sets hold tens of billions of transactions), it isn't enough! By
the time you develop a model you think has all the important
variables/features (e.g., price, time of day, day of week, day of month,
month of year, prices of major competitive items in the store, etc.), and
develop a reasonable number of values for each that lead to different
behavior, you find you have a very large multidimensional matrix in which
many of the elements have only a few (0 - 10) observations. Theoretically,
you need 20+ observations per element to get statistically valid results.

Fourth, the data you get is often "dirty", with e.g. price errors,
unidentified replacement products, and so on. We have found that anywhere
from 30 - 80% of the time required for an analysis/model-development task
goes into understanding and cleaning the data the client provides.
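The sparsity point (billions of transactions, yet a nearly empty matrix) can be sketched with a few lines of arithmetic. The feature list follows the one above, but every cardinality here, including the item and store counts, is a made-up assumption for illustration:

```python
# Sketch of the sparsity argument, with invented feature cardinalities.
# Even "tens of billions" of transactions spread thin once you cross a few
# plausible dimensions per item and store.
levels = {
    "item": 50_000,                  # SKUs (hypothetical)
    "store": 2_000,                  # locations (hypothetical)
    "price_band": 10,
    "hour_of_day": 24,
    "day_of_week": 7,
    "day_of_month": 31,
    "month": 12,
    "competitor_price_band": 10,
}

cells = 1
for n in levels.values():
    cells *= n                       # size of the multidimensional matrix

transactions = 20_000_000_000        # "tens of billions"
print(f"cells: {cells:,}")
print(f"avg observations per cell: {transactions / cells:.6f}")
# The average is a tiny fraction of one observation per cell, far below
# the 20+ observations per element needed for statistically valid results.
```

Real transactions are of course concentrated in a small subset of cells, but that only makes the long tail of near-empty cells worse, not better.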

There are of course other problems, but the ones above tend to be the most
significant.

2) "we update (patch) our data-collecting software very often."

I don't understand why this is a problem. Normally, data collection software
for business (e.g., Point of Sale cash register data) is pretty robust. I
assume here he means that as new types of data (e.g., new
variables/features) are discovered or developed and as dirty data is
cleaned, the models you develop will change. This should be done. The
process we use to develop statistical BI models is:
a) clean the data,
b) examine it to understand it as much as possible and identify important
features/variables,
c) talk to experts to develop "domain knowledge",
d) develop desired performance specifications with the client,
e) develop and test a model,
f) figure out why the results are so bad,
g) modify algorithms, add or subtract data types,
h) repeat until results are "good enough", the money runs out, the client
gets antsy, etc.

I think that changing your data structures and models is usually an
important and necessary part of developing a model that will meet your
client's accuracy requirements.

Nuff said.

Jack Stafurik

>
> Message: 1
> Date: Sat, 03 Mar 2007 11:23:20 -0500
> From: "Phil Henshaw" <sy at synapse9.com>
> Subject: Re: [FRIAM] Subtle problem with BI
> To: "'The Friday Morning Applied Complexity Coffee Group'"
> <friam at redfish.com>
> Message-ID: <000a01c75db0$426d38f0$2f01a8c0 at SavyII>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I don't quite understand the details, but it sounds like a kind of 'ah ha'
> observation of both natural systems in operation and the self-reference
> dilemma of theory.   My rule is to try never to change the definition of
> your measures.  It's sort of like maintaining software compatibility:
> if you arbitrarily change the structure of the data you collect, you
> can't compare the old and new system structures it reflects, nor how your
> old and new questions relate to each other.   It's such a huge
> temptation to change your measures to fit your constantly evolving
> questions, but basically..., don't do it.  :)
>
>
>
> Phil Henshaw
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 680 Ft. Washington Ave
> NY NY 10040
> tel: 212-795-4844
> e-mail: pfh at synapse9.com
> explorations: www.synapse9.com <http://www.synapse9.com/>
>
> -----Original Message-----
> From: friam-bounces at redfish.com [mailto:friam-bounces at redfish.com] On
> Behalf Of Mikhail Gorelkin
> Sent: Tuesday, February 27, 2007 5:06 PM
> To: FRIAM
> Subject: [FRIAM] Subtle problem with BI
>
>
>
> Hello all,
>
>
>
> It seems there is a subtle problem with BI (data mining, data
> visualization, etc.). Usually we assume that our data reflect adequately
> business issues (customer behavior), and at the same time we update
> (patch) our data-collecting software very often, which reflects the very
> fact of its (more or less) inadequacy! So, our data also have such
> inadequacy! But we never try to estimate it 1) to improve our software;
> 2) to make our business decisions more accurate. It looks like both our
> data-collecting software and BI are linked together, forming a business
> (and cybernetic!) model.
>
>
>
> Any comments?
>
>
>
> Mikhail
>




Subtle problem with BI

Mikhail Gorelkin
The situation is just the opposite :-) Today we develop simple, static
software systems that are well defined (without contradictions) and fully
specified from the beginning; let's say Newtonian-type systems. I mean
mainstream software development, particularly IT systems. But to better
reflect / simulate the nature of modern business (dynamic, chaotic in some
aspects, and constantly evolving) without over-simplifications and lots of
assumptions :-), we need to learn to build software systems with those same
characteristics. (I am afraid that today we just offer business some
technological models without estimating their adequacy for particular
cases. How about creating a theory-like business analysis? :-) ) These are
different models. Another level! Dynamic, self-adaptive ( without any
refactoring :-) ), self-evolving, and complex software. For example, the
global IT system of a big and globally distributed company, taken as a
whole ( including all transactions :-) ). As we grow, our perception of the
world, like a data-collecting system, changes constantly and... we cope
with this fact! And we make better and better decisions. So does each
evolving system...



--Mikhail


----- Original Message -----
From: "Jack Stafurik" <[hidden email]>
To: <friam at redfish.com>
Sent: Saturday, March 03, 2007 3:58 PM
Subject: Re: [FRIAM] Subtle problem with BI


> These are issues I (and many others) have grappled with for many years. I
> have strong opinions that deftly straddle both sides. So - I can't be
> wrong!
> To address the points Mikhail raised, I'll use the context of using data
> to
> predict sales.
>
> 1) "we assume that our data reflect adequately business issues (customer
> behavior) "
>
> The question here is what is "adequately", and what is "customer
> behavior".
> Defining these precisely is very important to developing an accurate,
> useful
> prediction system. Understanding what is "adequate" is tough. For the
> client, it initially means "better than what I do now." Later, it evolves
> into something like "Error = 5 - 10 %."  For sales prediction, this is an
> impossible standard. So in the end, the client will be unhappy!
>
> Several problems cause this. First, the customers are not homogeneous.
> Different groups behave differently to the same stimuli. And the groupings
> you can develop of similarly behaving customers for one product is not the
> same as for another. I.e., knowing how a customer responds to a Coke
> promotion doesn't necessarily tell you how he/she will respond to a Tide
> promotion. Second, you don't always have the most important data you need.
> Normally for sales, you will have price and volume data for the item of
> interest and competitors (identifying competitors is another problem ...).
> But many important data pieces that have major effects on sales (or stock
> prices, inventory levels, etc.) are not what I call "observable" in the
> data
> the client can give you. This "unobservable" data can include a major sale
> on the item by the WalMart across the street from a store, a major
> snowstorm
> that keeps people out of the stores, errors in the shelf price tag,
> stockouts in the distribution chain, local population changes due to
> holidays, etc. While sometimes this "unobservable" data can be gotten, it
> takes a lot of work and is very expensive. Third, even though you may have
> what you think is lots of data (typical retail data sets hold tens of
> billions of transactions), it isn't enough! By the time you develop a
> model
> you think has all the important variables/features (e.g., price, time of
> day, day of week, day of month, month of year, prices of major competitive
> items in store, etc.), and develop a reasonable number of values for each
> that lead to different behavior, you find you have a very large
> multidimensional matrix, which for many of the elements will have only a
> few
> (0 - 10) observations. Theoretically, you need 20+ observations per
> element
> to give you statistically valid results. Fourth, often the data you get is
> "dirty", with e.g. price errors, unidentified replacement products, and so
> on. We have found that anywhere from 30 - 80% of the time required to do
> an
> analysis/model development task is needed to understand and clean the data
> the client provides.
>
> There are of course other problems, but the ones above tend to be the most
> significant.
>
> 2) "we update (patch) our data-collecting software very often."
>
> I don't understand why this is a problem. Normally, data collection
> software
> for business (e.g., Point of Sale cash register data) is pretty robust. I
> assume here he means that as new types of data (e.g., new
> variables/features) are discovered or developed and as dirty data is
> cleaned, that the models you develop will change. This should be done. The
> process we use to develop statistical BI models is a) clean the data, b)
> examine it to understand it as much as possible and identify important
> features/variables, c) talk to experts to develop "domain knowledge", d)
> develop with the client desired performance specifications, e) develop and
> test a model, f) figure out why the results are so bad, g) modify
> algorithms, add or subtract data types, h) repeat until results are "good
> enough", money runs out, client gets antsy, etc.
>
> I think that changing your data structures and models is usually an
> important and necessary part of developing a model that will meet your
> client's accuracy requirements.
>
> Nuff said.
>
> Jack Stafurik
>
>>
>> Message: 1
>> Date: Sat, 03 Mar 2007 11:23:20 -0500
>> From: "Phil Henshaw" <sy at synapse9.com>
>> Subject: Re: [FRIAM] Subtle problem with BI
>> To: "'The Friday Morning Applied Complexity Coffee Group'"
>> <friam at redfish.com>
>> Message-ID: <000a01c75db0$426d38f0$2f01a8c0 at SavyII>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> I don't quite understand the details, but sounds link a kind of 'ah ha'
>> observation of both natural systems in operation and the self-reference
>> dilemma of theory.   My rule is try to never change the definition of
>> your measures.  It's sort of like maintaining software compatibility.
>> if you arbitrarily change the structure of the data you collect you
>> can't compare old and new system structures they reflect nor how your
>> old and new questions relate to each other.   It's such a huge
>> temptation to change your measures to fit your constantly evolving
>> questions, but basically..., don't do it.  :)
>>
>>
>>
>> Phil Henshaw                       ????.?? ? `?.????
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 680 Ft. Washington Ave
>> NY NY 10040
>> tel: 212-795-4844
>> e-mail: pfh at synapse9.com
>> explorations: www.synapse9.com <http://www.synapse9.com/>
>>
>> -----Original Message-----
>> From: friam-bounces at redfish.com [mailto:friam-bounces at redfish.com] On
>> Behalf Of Mikhail Gorelkin
>> Sent: Tuesday, February 27, 2007 5:06 PM
>> To: FRIAM
>> Subject: [FRIAM] Subtle problem with BI
>>
>>
>>
>> Hello all,
>>
>>
>>
>> It seems there is a subtle problem with BI (data mining, data
>> visualization, etc.). Usually we assume that our data reflect adequately
>> business issues (customer behavior), and in the same time we update
>> (patch) our data-collecting software very often, which reflects the very
>> fact of its (more or less) inadequacy! So, our data also have such
>> inadequacy! but we never try to estimate it 1) to improve our software;
>> 2) to make our business decision more accurate. It looks like both our
>> data-collecting software and BI are linked together forming a business
>> (and cybernetic!) model.
>>
>>
>>
>> Any comments?
>>
>>
>>
>> Mikhail
>>
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL:
>> http://redfish.com/pipermail/friam_redfish.com/attachments/20070303/eb14ee4a/attachment-0001.html
>>
>> ------------------------------
>>
>> _______________________________________________
>> Friam mailing list
>> Friam at redfish.com
>> http://redfish.com/mailman/listinfo/friam_redfish.com
>>
>>
>> End of Friam Digest, Vol 45, Issue 3
>> ************************************
>>
>>
>> --
>> No virus found in this incoming message.
>> Checked by AVG Free Edition.
>> Version: 7.5.446 / Virus Database: 268.18.6/709 - Release Date: 3/3/2007
>> 8:12 AM
>>
>>
>
>
> ============================================================
> FRIAM Applied Complexity Group listserv
> Meets Fridays 9a-11:30 at cafe at St. John's College
> lectures, archives, unsubscribe, maps at http://www.friam.org
>




Subtle problem with BI

Phil Henshaw-2
Modeling reality has all the difficulties both you and Jack mention, and
with hacks out there looking for plausible deniability for the many
mistakes we planners and predictors tend to make, we've got plenty! Yes, it
is still worth working harder and more carefully with prediction methods
that haven't been working, because you do sometimes turn up improvements.
One of the improvements people haven't quite realized we need, though, is
to learn how to read the autonomous agents in our environments as actually
autonomous.

That's where having a comprehensive method of identifying independent
behavioral systems in your data is useful...  It picks out where there
are new worlds of behavior you need to learn about.   Nature has been
irritatingly refusing to follow our rules for a long, long time now;
creating all manner of new autonomous, evolving actors to fool our models
of past behavior, and having us endlessly deny their existence, is one of
her favorite tricks!  :,)
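One crude way to picture "identifying independent behavioral systems in your data": flag the points where recent observations stop resembling the history, which is at least a hint that some new actor has entered. This is only my illustration of the general idea, not the specific method being described:

```python
# Hypothetical sketch: flag indices where a recent window's mean departs
# from the prior history by more than `threshold` standard deviations.
def regime_shift_points(series, window=4, threshold=2.0):
    """Indices where recent behavior stops matching prior behavior."""
    flags = []
    for i in range(window * 2, len(series)):
        prior = series[:i - window]
        recent = series[i - window:i]
        mu = sum(prior) / len(prior)
        var = sum((x - mu) ** 2 for x in prior) / len(prior)
        sd = var ** 0.5 or 1.0       # guard against a zero-variance history
        recent_mu = sum(recent) / len(recent)
        if abs(recent_mu - mu) > threshold * sd:
            flags.append(i)
    return flags

# Steady demand, then an unobserved actor (say, a competitor's big sale
# across the street) shifts it; the flags mark where the old model breaks.
sales = [100, 102, 98, 101, 99, 100, 103, 97, 60, 58, 62, 59]
print(regime_shift_points(sales))    # → [9, 10, 11]
```

A model trained only on the first eight points would keep predicting ~100 and "deny the existence" of the new regime; the flags are the cue to go learn what changed.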


Phil Henshaw
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
680 Ft. Washington Ave
NY NY 10040                      
tel: 212-795-4844                
e-mail: pfh at synapse9.com          
explorations: www.synapse9.com    


> -----Original Message-----
> From: friam-bounces at redfish.com
> [mailto:friam-bounces at redfish.com] On Behalf Of Mikhail Gorelkin
> Sent: Sunday, March 04, 2007 12:03 AM
> To: The Friday Morning Applied Complexity Coffee Group
> Subject: Re: [FRIAM] Subtle problem with BI