Can't upload PDF files

Hello,

I have file storage set up to use "Database (MySQL)" and I can upload JPGs, but I can't upload PDF files.

Is there a setting somewhere that will allow me to upload PDFs or any file type?

Cheers!
Gary

I have no idea what is going on. We store PDFs without problems.

What happens if you use a different browser?
Is PDF in capital or lower case?
Most probably it's something with MIME types, as discussed on more than one occasion here.

Thanks for your reply.

I've done various tests this morning to see what might be causing it. PDFs of 1 MB and over don't attach. After adding the message and attaching the PDF (which doesn't attach and doesn't display an error), I need to click Attach, which takes me to another screen with a drop-down where I can select the PDF that is over 1 MB; please see the attached image.

Thanks,
Gary

Attachment: upload-issue.jpg (138.34 KB)

I uploaded a 3.17 MB ZIP file and the same problem happens. It seems that if a file is bigger than 1 MB it doesn't automatically attach to the message.

Any ideas :(

Maybe you will find some answers in these posts:

http://www.projectpier.org/node/1618
http://www.projectpier.org/node/2398
http://www.projectpier.org/node/2333

Specifically your MySQL settings seem to be important.

Hi phpfreak,

Thanks again for your reply.

It's weird: the files are sitting in my /tmp folder, and attaching one manually (using the screen in the screenshot above) makes it appear next to the message, but when you click on the link it produces this error: "Requested revision does not exists in the database".

It doesn't appear to be an FTP issue, because the files are sitting there.

Does anyone else have this issue: not being able to attach a file over 1 MB to a message (even though the file still uploads via FTP)?

Yes

Again, try changing your my.cnf

http://www.projectpier.org/node/2398#comment-6133

This guy also had a 1 MB upload limit.

Hi mate,

I asked my webhost to make the changes mentioned in that message but it didn't make any difference.

I can visit www.domain.com/tmp/ and it displays the files that have been uploaded. I don't think this is right, as it gives everyone access to all my uploads.

I can go directly to the /tmp/ folder and access the file, but going via ProjectPier displays the error.

I did some testing on my hosted server:

1) File system storage

a) upload a 2MB file: no problem
b) upload a 1MB file: no problem
c) upload a 500KB file: no problem

2) Database storage

a) upload a 2MB file: Failed to import file '/home/sharec/public_html/pp088/tmp/7-ZipPortable_9.20_Rev_2.paf.exe' to the file repository (unique file id: fd7684529b5d6cef607d63376b50dc6ae4023cfa)
b) upload a 1MB file: Failed to import file '/home/sharec/public_html/pp088/tmp/Riot.zip' to the file repository (unique file id: a25a6e76db0b741ec4f30d6bbae79db37024c28a)
c) upload a 500KB file: no problem

I checked PHP variables:

max_execution_time 30 sec
max_input_time 60 sec
post_max_size 30M
upload_max_filesize 30M

I checked MySQL variables:

max_allowed_packet 1048576

Trying to set packet size from PHP:

DB::execute('SET SESSION max_allowed_packet=16777216'); // 16 MB
Query failed with message 'SESSION variable max_allowed_packet is read-only. Use SET GLOBAL max_allowed_packet to assign the value'
DB::execute('SET GLOBAL max_allowed_packet=16777216'); // 16 MB
Query failed with message 'Access denied; you need the SUPER privilege for this operation'

So, I am in the same situation as you.

Now we check the value of max_allowed_packet on your server.
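The check is presumably just a query along these lines (run it from phpMyAdmin or the mysql command line):

SHOW VARIABLES LIKE 'max_allowed_packet';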

You ran the query and this is the result:

max_allowed_packet 1048576

So, we are in the same situation.

You need to ask your hosting provider to increase the packet size in my.cnf.
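For reference, the change in my.cnf would look something like this; the 16M value is just an example, the hoster decides the actual limit:

[mysqld]
max_allowed_packet = 16M

After the MySQL server is restarted, the SHOW VARIABLES query above should report the new value.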

Last weekend I changed the upload and download code and the way files are stored in the database.

PP 0.8.8 will now chop files up into 500 KB parts and put each part in the database. During download the file is delivered part by part. This has three direct advantages:

1) PP is no longer limited by the MySQL parameter max_allowed_packet (usually 1 MB)
2) downloading files is much more memory-friendly on the server side
3) the browser can now show download progress, giving the end user feedback on progress and duration

Basically, files are streamed down.
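To give an idea of the approach, here is a simplified sketch. This is not the actual PP code; the file_repo_parts table and column names are made up for illustration:

<?php
// Sketch of chunked storage: split a file into 500 KB parts and insert each
// part as its own row, so no single INSERT comes near max_allowed_packet.
// Table/column names (file_repo_parts, file_id, part_no, data) are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=projectpier', 'user', 'pass');

function store_file_in_parts(PDO $pdo, $fileId, $path, $partSize = 512000) {
  $stmt = $pdo->prepare(
    'INSERT INTO file_repo_parts (file_id, part_no, data) VALUES (?, ?, ?)');
  $in = fopen($path, 'rb');
  $partNo = 0;
  while (!feof($in)) {
    $chunk = fread($in, $partSize);
    if ($chunk === false || $chunk === '') break;
    $stmt->execute(array($fileId, $partNo++, $chunk));
  }
  fclose($in);
}

function stream_file_from_parts(PDO $pdo, $fileId) {
  // Deliver the file part by part: memory use stays around one part size
  // and the browser sees steady download progress.
  $stmt = $pdo->prepare(
    'SELECT data FROM file_repo_parts WHERE file_id = ? ORDER BY part_no');
  $stmt->execute(array($fileId));
  while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo $row['data'];
    flush();
  }
}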

In some future version I am planning to support complete streaming (i.e. the ability to download a certain part of a file, which comes in handy when playing music or video ;-)).

This change also makes me think about dropping support for the filesystem in some future version. With files in the database, backup becomes very simple: just back up the database and all is backed up. No more problems keeping the filesystem and database in sync (e.g. when someone accidentally deletes files on the filesystem, or when either the database or the filesystem runs out of space but not the other). Adding more servers to handle the load is now easy, as all user data is stored in the database.

After this I was only limited by PHP limits (upload_max_filesize, etc.). So, on top of all this I developed a small Java program that runs from the command line and can upload large files to PP, bypassing all the limits imposed by PHP. The next step would be to put some user interface on top of it. For the end user it would work as follows: they drag and drop a file into the PP drop box ;-) and the file gets uploaded. In its simplest form, the drop box would present a dialog asking which project to put the file in (showing a dropdown with all projects).

Sounds great... whenever 0.8.8 has a beta we can help you test it on our installation.

Storing files in the database might sound like a nifty idea, but I can assure you that it certainly is not scalable. It will work well at the beginning for sure, but as you add more binary data to the db it will get more and more difficult to back up the database, move the database backups around, etc.

Ideally, you want to be able to store the database on local disks and store files on a NAS (or similar). That way your files are protected and it's easy enough to rsync a backup into the cloud (or some other place) on a scheduled basis. It's also easy to dump the db and rsync a copy of that somewhere else as well for backup.

At the very least, storage of files in the db should be an option. And if it's presented as an option it should also be undoable. That is, you need to provide a way to migrate between the two methods of file storage. This is exactly what happened in the case of the excellent open source ticketing system OTRS. They started with file storage in the db. Then eventually there were systems in place that were quite busy and the db storage method presented all of the problems I mentioned above PLUS it adversely impacted the performance of the db. So the OTRS crew added file system storage and then a way to migrate between the two systems.

So my word of advice to you is: think about how your decision will impact organizations using PP in the long term. In the real world no one wants to have to migrate between different project systems, because you end up losing legacy data and have to retrain everyone.

Finally, you should be able to implement a way to present file upload/download progress using either flash or javascript (jquery, etc.). Doing it 500KB at a time isn't exactly providing accurate progress compared to 1KB at a time.

I welcome the feedback. I will take your words into consideration.

At the very least, storage of files in the db should be an option.
Let's keep both options in then :-)

And if it's presented as an option it should also be undoable.
Doable, just not done before. I basically have no idea how to convert one type of file storage into the other within the time constraints PHP usually gets with hosting providers (like 10 seconds).
In general there is a request on the 0.9 wish list to export data at the project level. I think that is the best way to save data.
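As for converting from one storage type to the other: if it were attempted, it would probably have to run in small batches across many requests, keeping track of progress in between. A very rough sketch (the helper names are hypothetical):

<?php
// Hypothetical sketch: convert file system storage to database storage in
// small batches, so each request finishes within the PHP time limit.
$batchSize = 5;
$files = get_unmigrated_files($batchSize);               // hypothetical helper
foreach ($files as $file) {
  store_file_in_parts($pdo, $file['id'], $file['path']); // see earlier sketch
  mark_file_as_migrated($file['id']);                    // hypothetical helper
}
if (count($files) === $batchSize) {
  echo 'Not done yet, reload to migrate the next batch.';
}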

Finally, you should be able to implement a way to present file upload/download progress using either flash or javascript (jquery, etc.). Doing it 500KB at a time isn't exactly providing accurate progress compared to 1KB at a time.
I played around with several JavaScript uploaders but I could not get them stable across browsers. I finally decided not to put any more time into those solutions. To overcome the MySQL packet limit I changed the current code to chop a file up into blocks of 500 KB after the regular upload process ends. This way it is at least possible to get the file into the database and download it in pieces.
Tests with the Java upload in my home environment showed that the highest upload speed (Kbit/s) was with blocks of 200 KB. Uploading the same file in blocks of 100 KB took more time, as did blocks of 300 KB.
1 KB blocks feel too small to me. During the download of the 5 MB test file I could see the progress in Chrome clearly (the circle filled up nicely). We have to keep in mind that most connections have more bandwidth for downloading and less for uploading. I think I noticed that in my tests: 200 KB was the fastest for uploading, while 500 KB was even faster for downloading.

I did a quick Google search on storing files in a database and the problems involved. I could not find a lot, but did find these:

http://blogs.hibernatingrhinos.com/tags/ravendb
http://www.meetup.com/google-nyc-tech-talks/events/17180173/

At work we do store attachments in the database, and although this is in the 1-2 terabyte range, we do not have any issues. This is a DB2 database complex on z/OS (mainframe). I will check with the DBAs how they do that. I know there is stuff going on like online reorganization.

but as you add more binary data to the db it will get more and more difficult to backup the database, move the database backups around
After all this, I reflected on the first issue you mentioned. The point is that adding files will bring you to a large database faster, but the same issue holds when you use PP for a long time. I need to spend time on backup strategies:

http://sarahdba.blogspot.com/2009/08/best-mysql-backup-strategy.html

http://www.mydumper.org/

http://www.percona.com/software/percona-xtrabackup/
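For what it's worth, a plain mysqldump along these lines is probably the simplest starting point (database name and user are placeholders):

mysqldump --single-transaction --quick --user=pp_user --password \
  projectpier | gzip > projectpier-backup.sql.gz

The --quick option makes mysqldump stream rows instead of buffering whole tables in memory, which matters once large file parts live in the database.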

Mydumper looks great.

Great to see some action on the file storage options. I use the file system (not the DB) as I may need to upload very large files (GB size). I'm interested in storing files in the DB but don't really know the benefits of using a DB.

Here are some other suggestions:

1) upload multiple files in one go

2) put time limits on storage times for certain folders (e.g. some folders could be unlimited whereas others have a 24 hr time limit)

Cheers

Joe

Hey Joe,

re 1) How do you get GB size files into PP? Usually PHP times out during upload. Did you change these settings and if so, to what values?

re 2) Interesting. Got a feature request that seems to match this: Ability to put an expiry date on any object.

1) I guess the answer is I haven't yet; the biggest I have uploaded and downloaded so far is about 500 MB. I use the file system and have set php.ini with the settings below (found on this forum).

;;;;;;;;;;;;;;;;;;;
; Resource Limits ;
;;;;;;;;;;;;;;;;;;;

max_execution_time = 300
max_input_time = 300 ; Maximum amount of time each script may spend parsing request data
;max_input_nesting_level = 64 ; Maximum input variable nesting level
memory_limit = 128M ; Maximum amount of memory a script may consume (128MB)

; Maximum size of POST data that PHP will accept.
post_max_size = 2001M

; Maximum allowed size for uploaded files.
upload_max_filesize = 2000M

2) would be great. Ideally, when a folder is set up it would give you the option to set a time limit after which files are deleted, maybe displaying a warning on the folder and a time limit on the files themselves. This is useful for large files that do not need to be kept in the project.

I checked with the DBAs at work:

1) Using the regular tools, making a backup of 100 GB takes about 15 minutes.

2) For our 1.5 TB database they use a different approach. They copy the physical files that contain the database. It takes about 25 minutes, and thus 25 minutes of unavailability.

The type of data stored (documents, binaries) does not matter to them. The only remark is that currently the attachment table has 254 partitions and each partition can store 2 GB of data. If the total data size were to grow beyond that, a redesign of the table with more partitions would be required.

This is for DB2 v9 for z/OS. Checking the documentation of MySQL:


E.7.2. Limits on Number of Databases and Tables
MySQL has no limit on the number of databases. The underlying file system may have a limit on the number of directories.
MySQL has no limit on the number of tables. The underlying file system may have a limit on the number of files that represent tables. Individual storage engines may impose engine-specific constraints. InnoDB permits up to 4 billion tables.

But here it gets interesting:

E.7.3. Limits on Table Size
The effective maximum table size for MySQL databases is usually determined by operating system constraints on file sizes, not by MySQL internal limits. The following table lists some examples of operating system file-size limits. This is only a rough guide and is not intended to be definitive. For the most up-to-date information, be sure to check the documentation specific to your operating system.
Operating System File-size Limit
Win32 w/ FAT/FAT32 2GB/4GB
Win32 w/ NTFS 2TB (possibly larger)
Linux 2.2-Intel 32-bit 2GB (LFS: 4GB)
Linux 2.4+ (using ext3 file system) 4TB
Solaris 9/10 16TB
MacOS X w/ HFS+ 2TB
NetWare w/NSS file system 8TB
Windows users, please note that FAT and VFAT (FAT32) are not considered suitable for production use with MySQL. Use NTFS instead.

In the end it depends on how the database is mapped to the file system.

My suggestion for the design would be the ability to archive closed projects at a certain point (together with the auto-delete), to bring them out of the active database and, if needed, compress the database in use (I am not aware whether that is needed with SQL). This way you can sort out closed projects; it would of course be good to have some tool to still look into this archived data if necessary. I am using MySQLDumper for my backups, and as a typical end user I feel that this is really easy.

I think breaking up the file is a clever solution. As long as everything goes as planned and the file is reconstructed properly.

But on the one hand, you state that the restrictions imposed on hosted systems would make it tough to implement the functionality to migrate from filesystem storage to db storage...and then on the other hand you use performance metrics from a tuned enterprise system, supported by dedicated staff, to justify the architecture decisions you are making.

So who is this functionality directed at? The enterprise user with dedicated sys admins, replicated database, and robust hardware, or the budget user that needs to install on a $6/mth Godaddy account?

Through experience I can tell you that storing large binary data in mysql is a bad idea.

I think storing files in the database is a bad idea, and is reason enough for me to not consider using PP.

Backups of the database do not become "really simple" when you get into the multi-gigabyte range. I uploaded four 30-45 minute video presentations into ProjectPier and I now have a database of a size that would normally take me years to reach if I were just storing metadata and not binary data.

Most databases "out of the box" are tuned for typical situations; storing binary files for a content management system in the DB is an atypical scenario, and one that requires special tuning/management of your database, which in turn requires special knowledge of that particular DB that only experienced DBAs possess. If it's not done right at the very start (which it likely won't be for your average ProjectPier user), it's a nightmare. I don't know of any serious CMS that stores its binary file data in the DB.

http://research.microsoft.com/pubs/64525/tr-2006-45.pdf

These are MySQL specific, but are the tip of the iceberg with regard to issues with BLOB storage:
http://mysqlinsights.blogspot.com/2009/01/mysql-blobs-and-memory-allocation.html

This guy is an expert on MySQL... and he recommends "nay on the large blob storage":
http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/comment-page-1/#comm...

I read it, and I think it depends on what you want to read into it. I see arguments for using the database.

What I read is that storing blobs is a bad idea. I agree completely with that. Before 0.8.6, PP handled large files badly in both file system and database storage. For download, it would read the complete file into memory and then send it to the browser. For 0.8.6 I changed the download code to deliver a file from the file system in chunks of 1 MB. That worked nicely.

For database storage this was not possible, because the complete file was stored in a blob. Retrieving the file from the database would allocate too much memory, so downloading is an issue. To be able to download a file from the database in chunks, I needed the file to be in the database in chunks. That is where I started avoiding BLOB storage.

You see, I do not need BLOBs. I just store the file in small parts in the database. There is no functional need to have the file in one piece on the server. You could argue that I mimic a file system this way: a file system stores a large file in blocks of 4 or 8 KB on disk. I am doing the same in the database.
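For illustration, the parts table could look roughly like this; the names and column types are just an example, not the actual PP schema, but the point is that each row holds only one small chunk:

CREATE TABLE file_repo_parts (
  file_id VARCHAR(40) NOT NULL,  -- the unique file id
  part_no INT NOT NULL,          -- 0, 1, 2, ... in upload order
  data MEDIUMBLOB NOT NULL,      -- one chunk of at most 500 KB
  PRIMARY KEY (file_id, part_no)
);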

I think storing files in the database is a bad idea, and is reason enough for me to not consider using PP.
That's okay. PP is also about exploring new paths. Staying on the safe path gives us the same old results. If a safe path is what you are looking for, then PP may not be the choice for you. But you are most welcome to join me in exploring new solutions and giving constructive criticism.

I believe you are getting constructive criticism. But you obviously don't want to accept any of it because you've spent time coming up with a solution that is clever, but not practical, and would rather move forward with the clever solution just to be different.

Whether you store a file as one large contiguous BLOB or a series of smaller ones is irrelevant, because they all point back to the same problem: you're storing binary data in the database, which will affect the performance and maintainability of the database. Add to that the possibility that if just one of your file "parts" becomes corrupted, the rest of them all become irrelevant. At the very least, storing a file as one BLOB lets you calculate a checksum on it, which you can then handily store in the database as part of the file metadata. I guess you could do the same for file parts, but look at the greater effort involved. Still, at the end of the day, storing the file on a RAID server is about 1000x more practical because it's designed specifically for storing files and implements a layer of redundancy that is more difficult to replicate within a database architecture. Plus, it is so much easier to routinely rsync your file data out to some place else (either in the cloud or another file server in the same facility).

I applaud you for spending time on moving PP forward. But stubbornly forging ahead and implementing technical solutions that have been proven (in the past, by other projects) to be bad ideas isn't going to speed adoption any time soon. I only popped in here because we once forked PP internally to test a proof of concept and I wanted to see how things are going. That was a few years ago. What we concluded then was that the level of effort to add features that are desperately needed at the business level was just too big (there are too many necessary features missing), so we went with another solution. Sadly, the other popular fork of ActiveCollab 0.7.1 has gone commercial (which is okay, I guess), but they haven't implemented a very usable structure for managing large lists of clients and projects.

IMHO, storing file metadata in the db and then storing the file itself on the file system is the way to go, and it's easily implemented and extended. This is the model that most architectures use. But I've already harped on that. And like that other poster mentioned: deploying a system that stores files in a database would never, ever happen in this office. We've gone down that road before.

If you really want to move this project forward, then it makes more sense to implement useful features like time tracking, extensive reporting, a robust API, and a well-rounded ACL. Pay attention to usability. For example, this system only works well for a small team that doesn't have a lot of players involved; as soon as the number of projects, tasks, and team members starts to increase rapidly, the whole thing becomes unwieldy. Case in point: the projects dropdown. A dropdown with a project list doesn't make sense when you have a lot of projects underway. These are some of the features that PP lacks, and that makes it a non-starter for most businesses.

Of course, ActiveCollab already has all of these other things... and yet (AFAIK) they've seen no need to store files in the database. On the flipside, they've also chosen to be incredibly secretive about future developments, and their lead developer (the same guy that created the code base for PP) stubbornly insists on implementing features *he* deems important while ignoring those that his customers (or prospective customers) are requesting out of necessity. Weird. Clearly there's still a lot of room for someone coming out with a really awesome self-hosted project management system.

and would rather move forward with the clever solution just to be different.
You have understood me right. I do not want to spend time in the box 'why it cannot be done' but spend more time in the box 'how can it be done'.

Clearly there's still a lot of room for someone coming out with a really awesome self-hosted project management system.
Ask yourself why it isn't happening.

Case in point: the projects dropdown. A dropdown with a project list doesn't make sense when you have a lot of projects underway.
Correct. No idea how to make it better. Using an A-Z index is not good for non-Western users. Showing top projects only does not solve the issue when there are many top projects. Adding an autocomplete is more difficult. Probably most recently used is best.

I am not in argument mode :-) Just exploring and using your remarks to come up with a good solution.

On the one hand we have the situation where we have limited hosting accounts and want to provide a solution: that is file system storage. On the other hand we want to support enterprise-level situations: that is where either file system storage or database storage comes into play. But both storage methods should behave the same (e.g. streaming).

Note 1: There is no official script yet to move from one type of storage to another. One post in the forum here has a solution to move from db to fs.

Note 2: You sensed right, my personal inclination is database storage. I consider file systems archaic :-)

This is about understanding technical limitations and the scalability of this particular solution based on the environment it is most likely to be deployed in. I'd argue that it is far less likely that a well-funded enterprise IT department is going to look at implementing PP versus a small team at an unfunded startup. Why? Because enterprise management will insist on a fully supported commercial solution in order to minimize risk.

It's silly to consider file systems archaic. A database server maintains its data in a filesystem, after all. The issue here is specifically the binary data associated with a file. There's no reason to store it in the database itself. Perhaps I should have explained that? You want to store the metadata in the database but then point to the binary data (the file) on the filesystem. You would never use the filesystem, for instance, to generate a list of files... you would just dip into the db for that.

Hmm

Maybe you think it is silly but I would not make it sound like a general statement. Or should I take that remark personally?

I am not the only one in the world that thinks like this:
http://c2.com/cgi/wiki?FileSystemAlternatives
http://blog.druva.com/2009/01/25/file-systems-vs-databases/

One of them contains statements like this:

As of August 2009, typical desktop systems come with HDDs in the range of 1TB storage. The usage of that storage has increased - but not quite proportionally - for larger multi-media files, higher-quality imagery, and so on. At the 10GB mark, it was getting difficult to find all this data. With two orders magnitude more storage, well... let's just say that nowadays people can't browse all the images and documents they have. It's often easier to look up a PDF on Google and download it again than it is to find it on local disk! We need something other than the FileSystem. Main memory should be a cache of the HDD, and perhaps the HDD itself should, for a large part, be a 'cache' of the network resources one wishes to access, as well as serving as redundant storage for network resources on behalf of other computers. (Only a relatively small fraction is needed for tracking active processes!) This would suggest favoring programming-language that supports principles that support automatic caching of network resources, such as FunctionalReactiveProgramming. (Reactive LogicProgramming is also very promising.)

Anyway, I understood your arguments and will go ahead as planned. I should be able to learn from the mistakes made by others. I am looking forward to solving this issue using replication and other solutions.

Just to bring in the idea in case it's not been thought of: I would suggest that it is better to split projects if they grow above a certain complexity. So could the size problem maybe be solved by implementing an approach where PP can point to different databases at startup (instead of multiple installations)? This way you could keep the size always within a manageable limit. As the database structure would be similar, I would imagine that you could then transfer data (like clients) quite easily from one database to the other if needed.

I did not think of this. It is a nice idea. Use multiple databases to host multiple projects. Keep the shared data in one global database. Shared data would be clients, users, permissions, databases(!) and projects. At project creation time an administrator could assign a database for storing project-related data. Administrators could decide to host all the projects in one database (the current ProjectPier solution, extreme solution 1), give every project its own database (extreme solution 2), or group projects into databases (e.g. all from one client, or all small ones together). Database backup would be very simple then. With some additional code for copying projects between databases, administrators would be in control of the database size.

Very very nice idea. I want to implement it :-)
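A very rough sketch of how the mapping could work (the table and function names are hypothetical):

<?php
// Hypothetical sketch: the global database maps each project to the database
// that stores its data; project code obtains its connection through a factory.
function get_project_connection(PDO $global, $projectId) {
  $stmt = $global->prepare(
    'SELECT db_host, db_name, db_user, db_pass
       FROM project_databases WHERE project_id = ?');
  $stmt->execute(array($projectId));
  $cfg = $stmt->fetch(PDO::FETCH_ASSOC);
  return new PDO(
    'mysql:host=' . $cfg['db_host'] . ';dbname=' . $cfg['db_name'],
    $cfg['db_user'], $cfg['db_pass']);
}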

I have spent most of my life outside of SQL, so files appeal to me right off the cuff, for obvious reasons. That said, multiple DBs seem quite reasonable after a moment's thought.

-- Let's say you have 1000 projects. Are there issues with 1 DB per project? How about on a hosted system?
-- Wouldn't it make archival and/or project deletion easy?
-- Being able to work with hosted systems with large files would be sweet. Think of the lower internal overhead on bandwidth and server space!

I have been out of programming for 15 years, but I was originally programming database applications. I think a database architecture is very useful, and as noted by the main developer, file systems also split files into clusters somewhere.
So maybe being able to split projects into different databases if the application grows beyond a certain limit would be a useful solution.
For users like me, finally, it's important to know which system to use, because it might be a one-time decision, either filesystem or database, which might be hard to change later on.