
How to Open Huge Text File

19 Posts
12 Users
(@jeff-wright)
Posts: 80
Registered
Topic starter
 

Hi,

I have a huge data set that I'm trying to use for a recreational project that my dad and I are working on. The data file that I have is a large text file (comma-separated values) that contains latitude, longitude, and depth in each row. The problem is that there are about 10 million rows of data! The file is too large for Excel to open fully. I can open it in WordPad, but it runs so slow that it is essentially nonfunctional.

Those 10 million rows are bathymetry data, collected in 1996, of the entire Santa Monica Bay. I'm only interested in a small subset of the data. Specifically, I only want the points that lie between 33.95 and 33.98 degrees latitude by 118.49 and 118.52 degrees longitude. Once I get that subset of data, I'm going to convert it to the UTM grid (zone 11) so I can make rough comparisons to nautical charts. Then I'm going to use my handheld GPS to find some of the spots represented in the data and scuba dive in those spots.

My question is: how can I extract that data from a file that is so big that I can't open it with Excel?

Here's a link to the data from the USGS: http://geopubs.wr.usgs.gov/open-file/of02-162/data/bathy.zip
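
For anyone finding this thread later, here is a rough sketch of the streaming approach in Python (not one of the tools suggested in the replies below). It assumes each line reads longitude,latitude,depth with west longitudes written as negative numbers, so check the first few lines of the file and swap the columns if yours differ; the file names here are just placeholders:

# Rough sketch: stream the big CSV row by row and keep only the points
# inside the lat/lon box, so the 10 million rows are never held in memory.
import csv

LAT_MIN, LAT_MAX = 33.95, 33.98
LON_MIN, LON_MAX = -118.52, -118.49   # west longitudes are negative

kept = 0
with open("bathy.csv", newline="") as src, open("subset.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        lon, lat = float(row[0]), float(row[1])
        if LON_MIN < lon < LON_MAX and LAT_MIN < lat < LAT_MAX:
            writer.writerow(row)
            kept += 1
print(kept, "points kept")

The UTM zone 11 step could then be run on just the small subset, for example with the pyproj package (this assumes WGS84, which may not match the 1996 survey datum, so treat it as approximate):

from pyproj import Transformer
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32611", always_xy=True)  # WGS84 -> UTM zone 11N
easting, northing = to_utm.transform(lon, lat)  # note the lon, lat argument order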

Thanks,
Jeff

 
Posted : January 11, 2012 7:29 pm
(@farsites)
Posts: 268
Registered
 

I found this site to be quite useful. Opened some rather large LIDAR files following their advice.

 
Posted : January 11, 2012 7:54 pm
(@loyal)
Posts: 3735
Registered
 

Got it (I think).

About 1MB unzipped, 193k zipped.

Where do I send it?

Loyal

 
Posted : January 11, 2012 8:02 pm
(@paulplatano)
Posts: 297
Registered
 

If you had dBase or FoxPro, you could read a comma-delimited file. Ten million records, I am not sure. With one command you could make a new file with the smaller subset. Then you can sort the subset file by lat or long. Import to ACAD and maybe your COGO.
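
For anyone without dBase or FoxPro, the same subset-then-sort idea can be sketched in Python (assumes a small subset file has already been pulled out, here called subset.csv, with longitude, latitude, depth columns; names are just for illustration):

# Sketch: sort an already-extracted subset by latitude before importing elsewhere.
import csv

with open("subset.csv", newline="") as f:
    points = [(float(lon), float(lat), float(d)) for lon, lat, d in csv.reader(f)]
points.sort(key=lambda p: p[1])   # sort by latitude; use p[0] to sort by longitude
with open("subset_sorted.csv", "w", newline="") as f:
    csv.writer(f).writerows(points)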

 
Posted : January 11, 2012 8:07 pm
 BigE
(@bige)
Posts: 2694
Registered
 

Go look for Notepad++.
I would be surprised if it can't handle it.

 
Posted : January 11, 2012 8:15 pm
(@jeff-wright)
Posts: 80
Registered
Topic starter
 

Thanks Loyal! The email address in my profile is current.
I'll also play around with some of the suggestions from the other posters. Thanks for all of the replies!

Jeff

 
Posted : January 11, 2012 8:30 pm
(@loyal)
Posts: 3735
Registered
 

On its way, Jeff.

34,566 points (if I did it right).

I didn't reply directly to your post, so you should still be able to edit your email address out if you want to.

Loyal

 
Posted : January 11, 2012 8:40 pm
(@amdomag)
Posts: 650
Registered
 

What Excel version are you using? I haven't tried a file of that size, but Excel 2010 has a much higher row limit than 2003 (about 1,048,576 rows versus 65,536).

 
Posted : January 11, 2012 9:31 pm
(@amdomag)
Posts: 650
Registered
 

If Excel 2010 can't help you, then you should consult a database guy in your community. Database engines can handle huge datasets with ease. Using SQL, your desired data can be retrieved in no time.
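
A lightweight way to try that without setting up a database server would be something like the sketch below, using Python's built-in sqlite3 module (the file, table, and column names are made up for illustration):

# Sketch: load the CSV into SQLite once, then pull any subset with plain SQL.
import csv, sqlite3

con = sqlite3.connect("bathy.db")
con.execute("CREATE TABLE IF NOT EXISTS bathy (lon REAL, lat REAL, depth REAL)")
with open("bathy.csv", newline="") as f:
    con.executemany("INSERT INTO bathy VALUES (?, ?, ?)", csv.reader(f))
con.commit()

rows = con.execute(
    "SELECT lon, lat, depth FROM bathy"
    " WHERE lat BETWEEN 33.95 AND 33.98 AND lon BETWEEN -118.52 AND -118.49"
).fetchall()
print(len(rows), "points in the box")

Once the data is in the table, any other box or depth range is just another SELECT, which is the main advantage over re-filtering the text file each time.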

 
Posted : January 11, 2012 9:38 pm
(@christ-lambrecht)
Posts: 1394
Registered
 

Would be interesting to know how you did it ...
I was thinking of importing it into Access and filtering on Lat/Lon?

Chr.

 
Posted : January 11, 2012 9:56 pm
(@paulplatano)
Posts: 297
Registered
 

If you've got SQL, it would handle your file. Right on, AM!

 
Posted : January 11, 2012 10:47 pm
(@jimcox)
Posts: 1951
 

There is a neat little programming language called Awk; Gawk (GNU Awk) is a free version.

It's designed to process text files just like this one.

 
Posted : January 12, 2012 12:49 am
(@loyal)
Posts: 3735
Registered
 

Christ

The file was a simple three-field, comma-delimited ASCII text file containing 294,435,351 bytes (see below).


-118.63956,34.02068,-34.272
-118.63941,34.02068,-34.272
-118.63925,34.02068,-34.272
-118.63909,34.02068,-34.272
-118.63830,34.02068,-34.207
-118.63814,34.02068,-34.207
-118.63798,34.02068,-34.207
-118.63687,34.02068,-34.376
-118.63671,34.02068,-34.376

I just wrote a quick BASIC program on my DOS machine and compiled it (took about a minute):


OPEN "i", #1, "bathy.csv"
OPEN "o", #2, "Jeff.csv"

1 IF EOF(1) THEN 4
INPUT #1, Lon#, Lat#, D
IF Lon# > -118.52 AND Lon# < -118.49 THEN 2 ELSE 1
2 IF Lat# > 33.95 AND Lat# < 33.98 THEN 3 ELSE 1

3 WRITE #2, Lon#, Lat#, D
IF EOF(1) THEN 4
n = n + 1
GOTO 1

4 PRINT n: CLEAR : SYSTEM

Moved the executable to this Windoz Internet machine and ran it (about another minute), and BINGO.

It actually took less time to write/compile the program and run it, than it did to download and unzip the original file!

Loyal

 
Posted : January 12, 2012 6:18 am
(@dshearon)
Posts: 195
Registered
 

It does. Notepad++ is awesome software.

 
Posted : January 12, 2012 6:55 am
(@christ-lambrecht)
Posts: 1394
Registered
 

Wow,

Genius way to solve a memory problem on our Windows machines: do it on an old DOS machine with QBasic!

Chr.

 
Posted : January 12, 2012 7:17 am
(@jeff-wright)
Posts: 80
Registered
Topic starter
 

I'm using the 2007 version of Excel.

 
Posted : January 12, 2012 6:04 pm
(@bill93)
Posts: 9834
 

I like Loyal's solution.

I've used Awk and it is indeed powerful. It can do the same processing on a bunch of files in one command. But it has a super-condensed syntax that will take you a few tries to get right even if you are a programmer in other languages.

 
Posted : January 13, 2012 8:06 am
(@davidalee)
Posts: 1121
Registered
 

DOS

We use these batch programs for many things; they are very handy. We use them for renaming pics, editing GPS data, and many other things as well. One of my favorites is one that runs through a RINEX file and will strip out certain time periods, delete satellites, strip cycle slips, whatever you tell it to do.

 
Posted : January 13, 2012 8:55 am
(@deleted-user)
Posts: 8349
Registered
 

DOS

I could use some help from one of you; if you are willing, please email me at my profile email. It's a small programming challenge that I am working on. Thanks.
Cliff

 
Posted : January 13, 2012 9:06 am