Hi,
I have a huge data set that I'm trying to use for a recreational project that my dad and I are working on. The data file that I have is a large text file (comma separated values) that contains latitude, longitude, and depth in each row. The problem is that there are about 10 million rows of data! The file is too large for excel to open fully. I can open it in wordpad, but it runs so slow that it is essentially nonfunctional.
Those 10 million rows are bathymetry data, collected in 1996, of the entire Santa Monica Bay. I'm only interested in a small subset of the data. Specifically, I only want the points that lie between 33.95deg and 33.98deg latitude by 118.49deg and 118.52deg longitude. Once I get that subset of data, I'm going to convert it to UTM grid (zone 11) so I can make rough comparisons to nautical charts. Then, I'm going to use my handheld gps to find some of the spots represented in the data and scuba dive in those spots.
My question is: how can I extract that data from a file that is so big that I can't open it with exel?
Here's a link to the data from the USGS, http://geopubs.wr.usgs.gov/open-file/of02-162/data/bathy.zip
Thanks,
Jeff
I found this site to be quite useful. Opened soem rather large LIDAR files foillowing their advice
Got it (I think).
About 1MB unzipped, 193k zipped.
Where do I send it?
Loyal
If you had dbase or Foxpro, you could read a comma delimited.
Ten million records, I am not sure. With one command you
could make a new file with the smaller subset. Then you can
sort by lat or long with the subset file. Import to ACAD
and your maybe your cogo.
Go look for NotePad++
I would be suprised if it can't handle it.
Thanks Loyal! The email address in my profile is current.
I'll also play around with some of the suggestions from the other posters. Thanks for all of the replies!
Jeff
On its way Jeff
34,566 points (if I did it right).
I didn't reply directly to your post, so you should still be able to edit your email address out if you want to.
Loyal
What Excel version are you using? I haven't tried a file of that size but what I know is that Excel 2010 has greater data capacity than 2003.
If Excel 2010 can't help you then you must consult a database guy in your community. Database engines can handle huge amount of datasets with ease. Using SQL, your desired data can be retrieved in no time.
Would be interesting to know how you did it ...
I was thinking of importing in Access and filtering on Lat/Lon ?
Chr.
If you got SQL, it would handle your file. Right on, AM!
There is a neat little programming language called Awk - Gawk is a free public domain version
Its designed to process text files just like this
Christ
The file was a simple three-field comma delimited ASCII text file containing 294,435,351 bytes (see below).
-118.63956,34.02068,-34.272
-118.63941,34.02068,-34.272
-118.63925,34.02068,-34.272
-118.63909,34.02068,-34.272
-118.63830,34.02068,-34.207
-118.63814,34.02068,-34.207
-118.63798,34.02068,-34.207
-118.63687,34.02068,-34.376
-118.63671,34.02068,-34.376
I just wrote a quick BASIC program on my DOS machine and compiled it (took about a minute):
OPEN "i", #1, "bathy.csv"
OPEN "o", #2, "Jeff.csv"
1 IF EOF(1) THEN 4
INPUT #1, Lon#, Lat#, D
IF Lon# > -118.52 AND Lon# < -118.49 THEN 2 ELSE 1
2 IF Lat# > 33.95 AND Lat# < 33.98 THEN 3 ELSE 1
3 WRITE #2, Lon#, Lat#, D
IF EOF(1) THEN 4
n = n + 1
GOTO 1
4 PRINT n: CLEAR : SYSTEM
Moved the executable to this Windoz Internet machine and ran it (about another minute), and BINGO.
It actually took less time to write/compile the program and run it, than it did to download and unzip the original file!
Loyal
It does. Notepad ++ is awesome software.
Christ
Whow,
genius way to solve a memory problem on our Windows MB machines and do it on an old Dos machine with Q basic!
Chr.
I'm using the 2007 version of excel.
I like Loyal's solution.
I've used Awk and it is indeed powerful. It can do the same processing on a bunch of files in one command. But it has a super-condensed syntax that will take you a few tries to get right even if you are a programmer in other languages.
DOS
We use these batch programs for many things, they are very handy. We use them for renaming pics, editing GPS data and many other things as well. One of my favorites is one that runs through a rinex file and will strip out certain time periods, delete sat's, strip cycle slips, whatever you tell it to do.
DOS
I could use some help from one of you, if you are willing please email me at my profile email. Its a small programming challenge that I am working on. Thanks.
Cliff