Friam

In case any body is interested


Douglas Roberts-2

In case any body is interested

One of my projects is funded by NIH, and it sponsored (read: paid for) a group of 15 of us software developer types from 10 different organizations across the country who are working on the project to get together last week in Las Vegas, NV to conduct a two-day hackathon. We split into three groups, and my group produced some rough, ugly, but working Python and R code.

The Python code conducts keyword searches on archived 1% Twitter API data, filtered to only those tweets that have valid geolocation data. The short piece of R code calls a Google map API and plots the data on a Google map in a browser, allowing the user to click on the geolocated map points to view the originator's tweet text.
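
For readers without the attachment, the kind of filter described above might be sketched as follows. This is not the attached TwitterSearch.py. It assumes the archive holds one JSON tweet per line, with a `text` field and a GeoJSON-style `coordinates` field ordered as [longitude, latitude]; the function names and the example keywords are hypothetical.

```python
import json
import re


def point_location(tweet):
    """Return (lat, lon) if the tweet has a usable point location, else None.

    Assumed layout: {"coordinates": {"coordinates": [lon, lat]}}.
    """
    geo = tweet.get("coordinates")
    if not isinstance(geo, dict):
        return None
    point = geo.get("coordinates")
    if not isinstance(point, (list, tuple)) or len(point) != 2:
        return None
    lon, lat = point
    if not all(isinstance(v, (int, float)) for v in (lon, lat)):
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


def search_tweets(lines, keywords):
    """Yield (lat, lon, text) for geolocated tweets whose text matches a keyword."""
    # Whole-word, case-insensitive match on any of the keywords.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip malformed records
        if not isinstance(tweet, dict):
            continue
        location = point_location(tweet)
        text = tweet.get("text") or ""
        if location and pattern.search(text):
            yield location[0], location[1], text
```

Passing an open file handle as `lines` would stream an archive without loading it all into memory, which matters for data of this size.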

Our next step will be to replace the R code with Python for calling the Google map API.

Here, it's ugly, but it's free. Don't say I never gave you anything.

--Doug

Doug Roberts
[hidden email]

http://parrot-farm.net/Second-Cousins

505-455-7333 - Office
505-672-8213 - Mobile

============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
to unsubscribe http://redfish.com/mailman/listinfo/friam_redfish.com

TwitterSearch.py (7K) Download Attachment

plotLatLongTwitterDJR.R (1008 bytes) Download Attachment

Steve Smith

Re: In case any body is interested

Doug -

So the point is to attempt "early detection" of an outbreak of something based on what people are tweeting?

( "influenza", "flu", "cold", "fever", "H1N1", "H3N2", "sneezing", "aching", "ache", "achy", "congested" )

It certainly sounds like there might be some utility to it, but I'm wondering what kinds of reasoning went into this? Is it based on any models of who tweets or what they are likely to tweet about?

Was it more of a demonstration or team-building exercise, or does someone expect to actually put it to use?

So, the data was pre-archived, but I presume a more useful version would work from more real-time data and probably would have a sliding time (exponential moving average?) window?
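
The exponential moving average mentioned above could be sketched roughly as follows. This is an illustration only: the function name, the `alpha` smoothing parameter, and the idea of feeding it per-day keyword counts are assumptions, not part of the hackathon code.

```python
def exponential_moving_average(counts, alpha=0.3):
    """Smooth a sequence of per-interval counts (e.g. matching tweets per day).

    alpha must be in (0, 1]; higher values weight recent intervals more heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    average = None
    for count in counts:
        if average is None:
            average = count  # seed the average with the first observation
        else:
            average = alpha * count + (1 - alpha) * average
        smoothed.append(average)
    return smoothed
```

Because each new value only needs the previous average, the same update works on a live stream as well as on archived data.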

Do you know about Norm Packard's (of Eudaimonic, Prediction, ProtoLife fame) latest venture called LuckySort? Its R interface is called TopicWatchr and seems to be doing something roughly similar (but without specific geolocation?). Their examples suggest that they are aiming this at the investment sector.

Our own Mick Thompson (well, SFX if not FRIAM) was working on related things before the startup Collecta went dark... I'm not sure if he's still in this game (or on this list?). I used Collecta when it was alive... it aggregated Twitter as well as some subset of blogs and maybe newsfeeds? For example, stuck in northbound traffic on I-25 near La Cienega one time, I was able to discover within seconds of stopping my vehicle that 3 other people had mentioned that they too were stuck. One of them was close enough to the front of the line to see that a fuel truck had been involved in an accident, so they weren't inclined to let anyone past it until the HazMat or Fire folks had determined there was no risk. On the other hand, a CB radio and/or a police scanner (old school) would have told me all that and more in time to take the La Cienega exit and the frontage road on into town with only a minor delay.


Douglas Roberts-2

Re: In case any body is interested

Steve,

I guess the point of this exercise is to determine whether there are search correlations that can be teased out of (the incredibly dirty) Twitter data and used to produce meaningful studies.

Other people say they've used Twitter data to produce more accurate and/or timely influenza outbreak predictions than, say, the CDC. But on the other hand, Google Flu Trends totally missed the boat this year.

Plus, I just like airing messy code.

--Doug

On Mon, Mar 4, 2013 at 12:26 PM, Steve Smith <[hidden email]> wrote:

Doug -

So the point is to attempt "early detection" of an outbreak of something based on what people are tweeting?

( "influenza", "flu", "cold", "fever", "H1N1", "H3N2", "sneezing", "aching", "ache", "achy", "congested" )


Steve Smith

Re: In case any body is interested

In reply to this post by Steve Smith

I twitched and accidentally <sent> a tad early...

I wanted to add links to Collecta and ask whether anyone knows what happened to Google's RealTime effort? It appears to be underground, supporting Google's other services like Trends?

I realize this gives a different granularity and I'm not clear on how *real time* Trends is these days...

http://www.google.com/trends/explore#q=%22influenza%22,%20%22flu%22,%20%22cold%22,%20%22fever%22,%20%22H1N1%22,%20%22H3N2%22,%20%22sneezing%22,%20%22aching%22,%20%22ache%22,%20%22achy%22,%20%22congested%22

It also seems as if subtracting news items might be important (for your purposes), since I assume you are looking for early detection of people having these symptoms rather than the echoes of trends in popular media (or an advertising push by NyQuil)?
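
One crude way to approximate "subtracting news items" might look like the sketch below. The heuristic is an assumption for illustration only: it treats any tweet that contains a link or starts as a retweet as a likely media echo, which will certainly misclassify some genuine reports. The function names and sample tweets are hypothetical.

```python
import re

# Heuristic markers of a likely "echo": a link, or a leading "RT @user".
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)
RETWEET_PATTERN = re.compile(r"^\s*RT\s+@\w+", re.IGNORECASE)


def looks_like_echo(text):
    """Return True if the tweet text contains a URL or begins as a retweet."""
    return bool(URL_PATTERN.search(text) or RETWEET_PATTERN.match(text))


def drop_likely_echoes(texts):
    """Keep only the tweets that do not look like news or retweet echoes."""
    return [text for text in texts if not looks_like_echo(text)]
```

A real study would need something better than this, such as comparing against known news headlines, but the filter shows where such a step would sit in the pipeline.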


Douglas Roberts-2

Re: In case any body is interested

Yep, although who's to say that an onslaught of Nyquil twittertizing does not signify the beginning of a flu outbreak?

--Doug

On Mon, Mar 4, 2013 at 7:40 PM, Steve Smith <[hidden email]> wrote:

It also seems as if subtracting news items might be important (for your purposes) since I assume you are looking for early detection of people having these symptoms rather than the echoes of trends in popular media (or an advertising push by NyQuil) ?

Tom Johnson

Re: In case any body is interested

In reply to this post by Douglas Roberts-2

Doug:

I'm probably not clear on your objective, but are you folks aware of the work the Guardian did a couple years back on mapping the riots in London in real time using Tweets?
See http://vielmetti.typepad.com/vacuum/2011/08/august-2011-london-riot-maps.html

=tom

On Mon, Mar 4, 2013 at 11:33 AM, Douglas Roberts <[hidden email]> wrote:

One of my projects is funded by NIH, and it sponsored (read: paid for) a group of 15 of us software developer types from 10 different organizations across the country who are working on the project to get together last week in Las Vegas, NV to conduct a two-day hackathon. We split into three groups, and my group produced some rough, ugly, but working Python and R code.


==========================================
J. T. Johnson
Institute for Analytic Journalism -- Santa Fe, NM USA
505.577.6482(c) 505.473.9646(h)
Twitter: jtjohnson

http://www.jtjohnson.com [hidden email]
==========================================

Douglas Roberts-2

Re: In case any body is interested

I had heard about that study, Tom.

Our objective on this NIH-funded project is to conduct research on what infectious disease-related studies could be conducted using Twitter data. Plenty of opportunities exist, a few of which (influenza outbreak detection, for example) have already been demonstrated.

Along the way we get to do some computer science, because this is truly "big data", and innovative solutions to operating on it are necessary.

--Doug

On Mon, Mar 4, 2013 at 8:02 PM, Tom Johnson <[hidden email]> wrote:

Doug:

I'm probably not clear on your objective, but are you folks aware of the work the Guardian did a couple years back on mapping the riots in London in real time using Tweets?
See http://vielmetti.typepad.com/vacuum/2011/08/august-2011-london-riot-maps.html

=tom


Steve Smith

Re: In case any body is interested

In reply to this post by Douglas Roberts-2

Doug -

Yep, although who's to say that an onslaught of Nyquil twittertizing does not signify the beginning of a flu outbreak?

Righto! That is the point... how to distinguish a rash of NyQuil fanbois extolling its virtues in 140 chars or less, as they subdue early symptoms of a resurging 1918 Influenza outbreak, from a catchy jingle P&G/Vicks/NyQuil's latest admen thought up?

Again, I'm not sure of the context of your hackathon (by its name, it seems more like teambuilding/demonstration than seriously attacking a known/well-vetted problem?) but maybe there was talk of precedent... of more serious studies in how to find useful correlations?

I'd expect careful study would reveal a phased rollout of terms... that people mutter (twitter?) about different things at the onset of their own symptoms, or of those around them, than they do as it becomes a full-fledged experience (e.g. Achy, Sniffly, Congested vs. "Sick Day"). And of course there might be an abrupt *rise* or *fall* in tweet frequencies from the same people as they take a day off from work and/or switch from DayQuil to NyQuil and try to sleep it off.

Also, the seriousness of symptoms might be reflected in Google Trends, of people doing "research" as opposed to just mumble-tweeting about it.

- Steve


Douglas Roberts-2

Re: In case any body is interested

A "rash" of infectious disease tweets? Not going there.

The goal of our hackathon, in the spirit of true hackathoning (http://en.wikipedia.org/wiki/Hackathon), was to "intensively collaborate" and pound out some working code of interest regarding our client's programmatic areas. It was a good exercise, in that we had 15 people, pizza, and beer, and after two days produced some working code. It was certainly more fun than going into the office...

--Doug

On Mon, Mar 4, 2013 at 8:17 PM, Steve Smith <[hidden email]> wrote:

Doug -

Yep, although who's to say that an onslaught of Nyquil twittertizing does not signify the beginning of a flu outbreak?

Righto! That is the point... how to distinguish a rash of NyQuil fanbois extolling its virtues in 140 chars or less, as they subdue early symptoms of a resurging 1918 Influenza outbreak, from a catchy jingle P&G/Vicks/NyQuil's latest admen thought up?

Again, I'm not sure of the context of your hackathon (by its name, it seems more like teambuilding/demonstration than seriously attacking a known/well-vetted problem?) but maybe there was talk of precedent... of more serious studies in how to find useful correlations?


Owen Densmore

Re: In case any body is interested

In reply to this post by Douglas Roberts-2

Who said Twitter has a low S/N?

-- Owen