Geek Culture / Converting .doc to .txt

Killswitch
22
Years of Service
User Offline
Joined: 2nd Oct 2002
Location: School damnit!! Let me go!! PLEASE!!!
Posted: 20th Jun 2004 14:49 Edited at: 20th Jun 2004 14:49
I'm rather annoyed with .docs at the minute. First of all, I can't read strings from them, because the file contains tons of other data (so all I get is random amounts of, well, crap), and I can't get the text within the file into a .txt, which would cut out all the unnecessary rubbish.

I'm looking for a simple way to convert a .doc to a .txt from within my program (as it would be a major downer for a user to have to convert all their .docs to .txts manually), or a way to sift through the needless amounts of nothing to get the actual words within the document.
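
The "sift through the junk" idea can be roughed out as a strings-style filter: keep only runs of printable characters and throw away the rest. This is a minimal sketch in Python (not the language of this thread), and `extract_text_runs` and `min_len` are made-up names for illustration. It assumes the document text is stored either as plain single-byte characters or as UTF-16LE, and it knows nothing about the real .doc structure, so expect stray fragments (font names, field codes) mixed in with the words.

```python
import re
from typing import List


def extract_text_runs(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable text found in raw bytes, in file order.

    Looks for two kinds of run, each at least `min_len` characters long:
    plain ASCII bytes, and ASCII characters stored as UTF-16LE
    (each character followed by a zero byte).
    """
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    runs = []
    for m in ascii_pat.finditer(data):
        runs.append((m.start(), m.group().decode("ascii")))
    for m in utf16_pat.finditer(data):
        runs.append((m.start(), m.group().decode("utf-16-le")))

    runs.sort()  # order by byte offset so the text reads in file order
    return [text for _, text in runs]


def extract_text_from_file(path: str, min_len: int = 4) -> List[str]:
    """Read a file as raw bytes and return its printable runs."""
    with open(path, "rb") as f:
        return extract_text_runs(f.read(), min_len)
```

Raising `min_len` cuts down on the noise at the cost of losing very short words that sit on their own.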

Google has been rather poor on the subject; the nearest match I got was a .doc to .pdf converter that didn't actually tell you how it did it.

~I see one problem with your reasoning: The fact is that is a chicken~
David T
Retired Moderator
22
Years of Service
User Offline
Joined: 27th Aug 2002
Location: England
Posted: 20th Jun 2004 14:51 Edited at: 20th Jun 2004 14:52
Hmm - I *may* be able to write a quick VB program to do the job - that OK?

Two strings walk into a bar. I'll have a pint says the first$%ASLDJ09920D"$"$D. Excuse my friend says the second, he isn't null terminated.
ReD_eYe
21
Years of Service
User Offline
Joined: 9th Mar 2003
Location: United Kingdom
Posted: 20th Jun 2004 14:54
If you could open Word, copy the text to the clipboard, close Word, then open Notepad, paste the text into Notepad and save it, that might work. You can do quite a bit of that from within DBP or with external .dll's, I think.

David T
Retired Moderator
22
Years of Service
User Offline
Joined: 27th Aug 2002
Location: England
Posted: 20th Jun 2004 14:54
Lol, having tried it, it seems I can't do it in VB. I can do RTF but not Word.

IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 20th Jun 2004 16:03
Try googling for catdoc
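
catdoc is a command-line tool that prints the text of a .doc file to standard output, so one option is to shell out to it from the program. A minimal sketch in Python, assuming catdoc is installed and on the PATH; `doc_to_text` is a hypothetical helper name, and the UTF-8 decoding is an assumption, since catdoc's output encoding depends on how it is configured.

```python
import shutil
import subprocess


def doc_to_text(doc_path: str, tool: str = "catdoc") -> str:
    """Run an external converter on doc_path and return what it prints.

    Raises FileNotFoundError if the tool is not on the PATH, and
    subprocess.CalledProcessError if the tool exits with an error.
    """
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} is not installed or not on PATH")

    result = subprocess.run([tool, doc_path], capture_output=True, check=True)
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

The returned string can then be written to a .txt file or used directly, which avoids the intermediate file altogether.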

*** Coming soon - Network Plug-in - Check my site for info ***
For free Plug-ins, source and the Interface library for Visual C++ 6, .NET and now for Dev-C++ http://www.matrix1.demon.co.uk
Killswitch
22
Years of Service
User Offline
Joined: 2nd Oct 2002
Location: School damnit!! Let me go!! PLEASE!!!
Posted: 20th Jun 2004 17:17
Thanks David T, for trying, and thanks IanM for pointing me to catdoc - there is something out there that can do this!!

So there must be a way to read plain text from a .doc...This is going to bug me for a looonnnggg time I can tell...

Mentor
22
Years of Service
User Offline
Joined: 27th Aug 2002
Location: United Kingdom
Posted: 20th Jun 2004 18:41
http://www.wotsit.org/search.asp?s=text

Try Wotsit's; they have references on most file formats. Knowing the format, you can then work out how to strip the extraneous data from the file and create a plain text file. Cheers.
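
As a first step in that direction: binary Word files from the Word 97 era are stored inside an OLE2 compound file container, which always starts with the same 8-byte signature. A minimal Python sketch that checks for it before any parsing is attempted; `looks_like_ole2` and `file_is_ole2` are made-up names. Other Office files (.xls, .ppt) use the same container, so a match only means "possibly a .doc", and locating the text streams inside the container is a much bigger job that this does not attempt.

```python
# Signature found at offset 0 of every OLE2 compound file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(header: bytes) -> bool:
    """True if the given bytes start with the OLE2 compound file signature."""
    return header[:8] == OLE2_MAGIC


def file_is_ole2(path: str) -> bool:
    """Read the first 8 bytes of a file and check them against the signature."""
    with open(path, "rb") as f:
        return looks_like_ole2(f.read(8))
```

A file that fails this check (an RTF file renamed to .doc, for example) can be rejected or handled separately before any format-specific code runs.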

Mentor.

PC1: P4 hyperthreading 3ghz, 1gig mem, 2x160gig hd`s, Nvidia FX5900 gfx, 6 way surround sound, PC2: AMD 1.2ghz, 512mb ram, FX5200 ultra gfx, stereo 16 bit soundblaster.
Killswitch
22
Years of Service
User Offline
Joined: 2nd Oct 2002
Location: School damnit!! Let me go!! PLEASE!!!
Posted: 20th Jun 2004 19:25 Edited at: 20th Jun 2004 19:30
Mentor, the whole site seems to be down. The only way I can get to the pages is by going to the cached page after I search for each page, but I can't download the format file because it's not there!!

Hopefully it's just down for maintenance!

SonicBoom
21
Years of Service
User Offline
Joined: 26th Nov 2002
Location:
Posted: 20th Jun 2004 20:14
Sounds like what you want to do is to Automate Word?

You need to open up a Word .Doc, read the text from it, close off the .doc and then do something with the text?

I've done this in VB using OLE Automation. The key thing is that only Word can understand a Word document, and while Mentor has a valid idea, in practice I think you'll have trouble, as MS don't give their document file structure away lightly.

If you can do this in VB6 (search Google for "OLE Automation Word") then I believe there are plug-ins around now so you can write a VB6 DLL to do the work and interface with that from DBPro.

If you need dacodez then lemme know, but I assure you that once you automate Word it's no great shakes to grab the text from a document.
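
For a rough idea of what automating Word looks like, here is a minimal sketch in Python rather than VB6, so it is not the code offered above. It assumes Windows, an installed copy of Word, and the third-party pywin32 package; `read_word_text` and `normalize_word_text` are hypothetical helper names. The calls used (`Documents.Open`, `Content.Text`, `Close`, `Quit`) are part of Word's automation object model.

```python
import os


def normalize_word_text(text: str) -> str:
    """Word marks paragraph ends with a carriage return; use newlines instead."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def read_word_text(doc_path: str) -> str:
    """Open a document in Word via OLE Automation and return its plain text."""
    import win32com.client  # from pywin32; imported here as it is Windows only

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Positional arguments: FileName, ConfirmConversions, ReadOnly
        doc = word.Documents.Open(os.path.abspath(doc_path), False, True)
        try:
            return normalize_word_text(doc.Content.Text)
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

Writing the returned string to a file gives the .txt. The catch is the one this approach always has: it only works on machines where Word is installed.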
Toby Quan
21
Years of Service
User Offline
Joined: 16th Oct 2003
Location: U S A
Posted: 21st Jun 2004 19:03 Edited at: 21st Jun 2004 19:03
I have done this before in VB. You must include Office in the References section. I use Word 97, but it will work in any version. Here is the code:

Pincho Paxton
21
Years of Service
User Offline
Joined: 8th Dec 2002
Location:
Posted: 21st Jun 2004 20:41
Word works in VB, but my version is 95.

Mentor
22
Years of Service
User Offline
Joined: 27th Aug 2002
Location: United Kingdom
Posted: 21st Jun 2004 22:06
@Killswitch Sheela: maybe you have a popup blocker running; they have popup ads on the site. The files are there, I just started to download one. I will download them and mail them to the addy in your sig (they are zip files... OK?). If they are not what you want, then tell me what version of Word .doc files you want (there's a lot).

While SonicBoom has a valid idea, it may be too much to expect customers to have the latest version of Word installed just to run your app. Plus they keep altering the formats to retain proprietary control of the Word .doc format, so that users are continuously forced to upgrade if they want to read .doc files from newer systems/installs properly.

Mentor.

Killswitch
22
Years of Service
User Offline
Joined: 2nd Oct 2002
Location: School damnit!! Let me go!! PLEASE!!!
Posted: 21st Jun 2004 23:20
Thanks for the .zips Mentor! I think this is going to take a long time to read through...but hey...

