Estimation and curve fitting

(@kent-mcmillan)
Posts: 11419
Topic starter
 

Here's a sort of interesting problem in curve-fitting and trend estimation. A company that measures web traffic reports the following ranks for BeerLeg.com:

Traffic Rank in US:

07/30/10 - 56,601
07/29/10 - 59,554
07/28/10 - 62,675
07/26/10 - 69,640

What would you estimate will be the traffic rank of Beer Leg five days out on 08/04/10?

 
Posted : July 30, 2010 5:41 am
(@deleted-user)
Posts: 8349
Registered
 

What do they mean by web traffic?
log-ons and hits etc.
there are only about 400 members here and most of the time it seems like only 10-15% or so are on-line including guests.

I don't understand these big numbers.

I log on from two computers at a time (home & away), so I think that I probably log on 6+ times a day if I do.
But as to your "problem", I guesstimate about 53,999, even though I have no idea of what I am guessing about.

 
Posted : July 30, 2010 6:08 am
(@angelo-fiorenza)
Posts: 219
 

Check out "web traffic" on wiki.....I had to.

I really have no practical knowledge about this sort of thing, but the explanation cleared things up a bit.

There are other factors besides you and me hitting the bookmark and signing in.

 
Posted : July 30, 2010 6:16 am
(@kent-mcmillan)
Posts: 11419
Topic starter
 

> But as to your "problem", I guesstimate about 53,999, even though I have no idea of what I am guessing about.

From the data, it looks as if tomorrow's rank ought to be about 53,999 or less. Look at the differences from day to day and the rate of change of those differences.
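
To make that concrete, here is a minimal Python sketch of the differencing described above. It assumes the four ranks from the opening post and divides each change by the number of days between readings, since there is no figure for 07/27. The variable and function names are illustrative only.

```python
from datetime import date

# Ranks quoted in the opening post (a lower rank means more traffic).
observations = [
    (date(2010, 7, 26), 69640),
    (date(2010, 7, 28), 62675),
    (date(2010, 7, 29), 59554),
    (date(2010, 7, 30), 56601),
]

def daily_rates(obs):
    """Average change in rank per day between consecutive readings."""
    rates = []
    for (d0, r0), (d1, r1) in zip(obs, obs[1:]):
        rates.append((r1 - r0) / (d1 - d0).days)
    return rates

rates = daily_rates(observations)
print(rates)  # [-3482.5, -3121.0, -2953.0]
```

The per-day drop shrinks from roughly 3,480 to 2,950, which is the slowing trend the curve fits later in the thread try to capture.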

 
Posted : July 30, 2010 6:18 am
(@deleted-user)
Posts: 8349
Registered
 

on the old board the statistics would dramatically drop on weekends for usage.
I used that variable in my guess.
Hole would post a lot on weekends but I haven't seen him here unless he is Holy Cow. 🙂

I don't post from home very much anymore.
central A/C seems to have quit here at the office so I may be out of here soon even though I have a lot of work to do on a plat.
or I guess it is time to find the fan.

 
Posted : July 30, 2010 6:31 am
Wendell
(@wendell)
Posts: 5782
Admin
 

> on the old board the statistics would dramatically drop on weekends for usage.
> I used that variable in my guess.

During the weekends, we generally see about 2/3 of the pageviews we see during the week.

 
Posted : July 30, 2010 6:37 am
(@kent-mcmillan)
Posts: 11419
Topic starter
 

BTW my estimate for 08/04/10 is a US traffic rank of 44,050 based solely upon fitting a curve to the data.

 
Posted : July 30, 2010 6:37 am
Wendell
(@wendell)
Posts: 5782
Admin
 

Those numbers are traffic rank. They show how we compare to all other sites on the internet. So, there are only about 50-something,000 sites that are more visited than us. Pretty good for a bunch of surveyors.

For a while, we were progressing by 10,000-15,000 per day, but as we get closer to our ultimate settling spot, the daily change is shrinking accordingly. We apparently have a little ways to go before we find our exact niche.

 
Posted : July 30, 2010 6:39 am
(@deleted-user)
Posts: 8349
Registered
 

Relevancy of the BLM Manual

I wish you the best. If it can generate $$ for you then I wish all the best.
But do you have to keep Noddles up all night
at the computer "hitting" the site. 🙂

got the front door opened ..will see how that works..
my old fan needs a greasing it seems

 
Posted : July 30, 2010 6:45 am
Wendell
(@wendell)
Posts: 5782
Admin
 

I should further point out that it really has nothing to do with the number of registered users. It has everything to do with how many people visit, registered or not. Another factor is the time each visitor/user spends on the site. Our average right now is about 11 minutes per visit which, in my book, is amazing.

Further, visitors can only see surveying-related topics, so it's pretty safe to say that the overall audience is surveyors (for the most part).

As time goes on and Google continues to index us, we may still do some climbing up the list. It takes time for the real goodness behind Google to kick in. Plus I didn't think of the idea to shut off non-surveying topics for visitors until recently -- the idea is to have Google only index the surveying topics so that we'll only attract surveyors, those in closely-related fields or those just interested in surveying.

I don't think that we want to attract every Tom, Dick and Harry in the world unless they happen to somehow be interested in surveying. 🙂

 
Posted : July 30, 2010 6:51 am
(@noodles)
Posts: 5912
 

Relevancy of the BLM Manual

> I wish you the best. If it can generate $$ for you then I wish all the best.
> But do you have to keep Noddles up all night
> at the computer "hitting" the site. 🙂

Who is this "Noddles" that you speak of?? 😉 Tis a new user?? 😛

 
Posted : July 30, 2010 7:21 am
(@deleted-user)
Posts: 8349
Registered
 

Relevancy of the BLM Manual

> Who is this "Noddles" that you speak of?? 😉 Tis a new user?? 😛

I thought it was your evil twin.
🙂

 
Posted : July 30, 2010 7:55 am
(@butch)
Posts: 446
Registered
 

linear regression says 40,023 for 8/04
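
The post does not say how the regression was set up. As a sketch, assuming an ordinary least-squares line through the four ranks in the opening post with the day of July as the x value (so 08/04 is day 35), plain Python gives a figure that rounds to the posted 40,023. The names below are illustrative.

```python
days = [26, 28, 29, 30]                 # day of July for each reading
ranks = [69640, 62675, 59554, 56601]    # US traffic rank on those days

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = m*x + k."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

slope, intercept = fit_line(days, ranks)
aug4_linear = slope * 35 + intercept    # 08/04 treated as "July 35"
print(round(slope, 1), round(aug4_linear))  # -3273.3 40023
```

A straight line ignores the slowing noted earlier in the thread, so it is likely to land lower than a curved fit.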

 
Posted : July 30, 2010 10:17 am
(@noodles)
Posts: 5912
 

Relevancy of the BLM Manual

> I thought it was your evil twin.
> 🙂

Oh no...the world is not big enough for a good :angel: and a bad :angel:. Plus Wendell would wind up pulling all of his hair out trying to decipher me/us. Haha!! 😛

 
Posted : July 30, 2010 10:48 am
(@doug-bruce)
Posts: 72
 

As Kent observed, the rate of decrease is consistently slowing (although it's such a small set of data, it's hard to say whether or not that's a meaningful trend). I ran with that assumption, though, and instead of linear regression I modeled the data with a second-degree polynomial, which might be okay for short-term predictions.

It turned out to be a surprisingly good fit to the four given points. Here's the equation of the best-fit parabola, assuming that the rankings are i.i.d.:

[tex]r=108.886d^2-9355.40d+239271.1[/tex]

where [tex]r[/tex] is the ranking and [tex]d[/tex] is the day of the month of July.

So how well does this function fit the four given data points?

July 26: actual rank = 69640, computed = 69638.0, difference = 2.0
July 28: actual rank = 62675, computed = 62686.9, difference = -11.9
July 29: actual rank = 59554, computed = 59538.1, difference = 15.9
July 30: actual rank = 56601, computed = 56607.0, difference = -6.0

Pretty good. (!)

So where will we be on August 4? In my convention, that's July 35, so

[tex]rank=108.886(35)^2-9355.40(35)+239271.1[/tex]
rank = 45218

This is not much different from Kent's prediction, above, but differs substantially from the linear regression model.
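
For anyone who wants to reproduce the fit, the sketch below solves the least-squares normal equations for a parabola in plain Python. It assumes an unweighted fit to the four ranks from the opening post, which may not be exactly how the coefficients above were obtained, and it uses exact fractions to sidestep round-off in the poorly scaled sums. Function and variable names are illustrative.

```python
from fractions import Fraction

days = [26, 28, 29, 30]                 # day of July
ranks = [69640, 62675, 59554, 56601]    # US traffic rank

def fit_quadratic(xs, ys):
    """Least-squares r = a*d^2 + b*d + c via the normal equations (exact arithmetic)."""
    # Power sums S[k] = sum(x^k) and moment sums T[k] = sum(x^k * y).
    S = [sum(Fraction(x) ** k for x in xs) for k in range(5)]
    T = [sum(Fraction(x) ** k * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system in the unknowns (a, b, c).
    M = [
        [S[4], S[3], S[2], T[2]],
        [S[3], S[2], S[1], T[1]],
        [S[2], S[1], S[0], T[0]],
    ]
    # Gauss-Jordan elimination; the system is positive definite, so pivots are nonzero.
    for i in range(3):
        pivot = M[i][i]
        M[i] = [v / pivot for v in M[i]]
        for j in range(3):
            if j != i:
                factor = M[j][i]
                M[j] = [vj - factor * vi for vj, vi in zip(M[j], M[i])]
    return tuple(row[3] for row in M)

a, b, c = fit_quadratic(days, ranks)

def predict(d):
    """Rank predicted by the fitted parabola for day-of-July d."""
    return a * d * d + b * d + c

print(round(float(a), 3), round(float(b), 2), round(float(c), 1))
print(round(float(predict(35))))        # 08/04 as "July 35"
```

Under those assumptions the coefficients round to the ones posted above, and the 08/04 value rounds to 45218.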

We'll see.

- Doug

 
Posted : July 31, 2010 3:44 am
Wendell
(@wendell)
Posts: 5782
Admin
 

Allow me to further complicate matters:

July 31: 53,893

 
Posted : July 31, 2010 5:26 am
(@kent-mcmillan)
Posts: 11419
Topic starter
 

> the rate of decrease is consistently slowing (although it's such a small set of data, it's hard to say whether or not that's a meaningful trend). I ran with that assumption, though, and instead of linear regression I modeled the data with a second-degree polynomial, which might be okay for short-term predictions.

Naturally, the real prize is for predicting at what level the web traffic rank will approximately stabilize. That has real implications for the value of advertising space, for example.

 
Posted : July 31, 2010 6:35 am
(@kent-mcmillan)
Posts: 11419
Topic starter
 

> Allow me to further complicate matters:
>
> July 31: 53,893

That is exactly the value predicted by the polynomial function Doug gives above. :>
It is an amazingly consistent data set so far. The model Doug derived above predicts that the web traffic rank will stabilize at around 38,300 in two weeks.
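
One way to arrive at a figure like that, offered as an assumption about the method and not a statement of it, is to take the minimum of Doug's parabola. The sketch below uses his rounded coefficients and the "day of July" convention; the names are illustrative.

```python
from datetime import date, timedelta

# Rounded coefficients of r = a*d^2 + b*d + c from Doug's post.
a, b, c = 108.886, -9355.40, 239271.1

d_min = -b / (2 * a)               # day of July where the parabola bottoms out
r_min = c - b * b / (4 * a)        # rank at that minimum
bottom_date = date(2010, 7, 1) + timedelta(days=round(d_min) - 1)

print(round(d_min, 2), round(r_min), bottom_date)  # 42.96 38319 2010-08-12
```

A parabola turns back upward past its minimum, so this is the level where the model bottoms out, not a true plateau.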

 
Posted : July 31, 2010 6:38 am
Wendell
(@wendell)
Posts: 5782
Admin
 

> That is exactly the value predicted by the polynomial function Doug gives above. :>

Oh! LOL

I obviously didn't run it to see. +o(

 
Posted : July 31, 2010 6:54 am
(@butch)
Posts: 446
Registered
 

Beats Sat morning cartoons - Used an exponential function to fit to the curve:

f(x)=a*e^(b*x) + c

a = 57603.4987; b = -0.0694; c = 15898.013

Day (x)   Observed   Calc'd   % diff
1         69640      69640     0.0006%
3         62675      62677     0.0037%
4         59554      59542    -0.0205%
5         56601      56616     0.0272%
6         53893      53887    -0.0109%
10        ?          44681     ?

* 7/26 = day 1 and so on to 8/04 = day 10
** calc'd values rounded to nearest integer
*** above not worth much more than a cup of coffee
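
As a quick check, the posted model can be evaluated directly. This sketch only plugs the quoted coefficients into f(x) = a*e^(b*x) + c; it does not redo the fit, which would need a nonlinear solver. The names are illustrative.

```python
import math

# Coefficients as posted; x counts days with 7/26 = day 1.
coef_a, coef_b, coef_c = 57603.4987, -0.0694, 15898.013

def model(x):
    """Posted exponential model f(x) = a*e^(b*x) + c."""
    return coef_a * math.exp(coef_b * x) + coef_c

observed = {1: 69640, 3: 62675, 4: 59554, 5: 56601, 6: 53893}
for x, rank in observed.items():
    calc = model(x)
    # Percent difference taken as (calculated - observed) / observed.
    print(x, rank, round(calc), f"{(calc - rank) / rank:+.4%}")

print(10, round(model(10)))        # 8/04 = day 10
```

Because b is quoted to only four decimals, the recomputed values can differ from the table by a few units (day 10 comes out near 44,675 where the table has 44,681). Since b is negative, the exponential term decays toward zero, so this model levels off at c, about 15,900, well below the minimum of the parabola.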

 
Posted : July 31, 2010 7:43 am